CN115631193A - Workpiece defect detection method and device based on attention mechanism and storage medium - Google Patents

Workpiece defect detection method and device based on attention mechanism and storage medium

Info

Publication number
CN115631193A
CN115631193A
Authority
CN
China
Prior art keywords
feature
classification
regression
module
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211552964.8A
Other languages
Chinese (zh)
Inventor
李朋超
杨庆泰
籍吉川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jushi Intelligent Technology Co ltd
Original Assignee
Beijing Jushi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jushi Intelligent Technology Co ltd filed Critical Beijing Jushi Intelligent Technology Co ltd
Priority to CN202211552964.8A priority Critical patent/CN115631193A/en
Publication of CN115631193A publication Critical patent/CN115631193A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004 Industrial image inspection (under G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06V 10/40 Extraction of image or video features
    • G06V 10/764 Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/766 Recognition using pattern recognition or machine learning: regression, e.g. by projecting features on hyperplanes
    • G06V 10/806 Fusion, i.e. combining data at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82 Recognition using pattern recognition or machine learning: neural networks
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30164 Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a workpiece defect detection method and device based on an attention mechanism, and a storage medium, applied to the technical field of workpiece surface defect detection. The method comprises the following steps: inputting an image to be detected into a feature extraction network framework to obtain a feature map of the image to be detected; inputting the feature map into a region generation module to obtain classification features and regression features of the feature map; and extracting these features through an introduced attention mechanism, which reduces the amount of computation while attending to the effective features related to the defect target, thereby suppressing complex background interference and improving defect detection performance. The feature-extracted classification and regression features are then input into a double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features. The double-head structure realizes the classification and regression tasks better, avoiding the prior-art problem that a single fully connected layer or convolution layer cannot make the regression and classification tasks optimal at the same time.

Description

Workpiece defect detection method and device based on attention mechanism and storage medium
Technical Field
The invention relates to the technical field of workpiece surface defect detection, in particular to a workpiece defect detection method and device based on an attention mechanism and a storage medium.
Background
Algorithms for detecting defect targets on workpiece surfaces generally fall into two classes: one-stage algorithms (represented by YOLO) and two-stage algorithms (represented by Faster R-CNN). A two-stage algorithm usually needs to complete both a regression task and a classification task, and many classical defect target detection methods complete both tasks by appending either a fully connected layer or a convolution layer at the end of the network. However, for a defect detection model, the fully connected layer is more favorable for classifying defect categories, while the convolution layer is more favorable for regressing defect bounding boxes; adopting only one of the two means that neither task is completed optimally, so defect detection accuracy is low. In addition, workpiece surface defects vary widely in scale and suffer from severe complex-background interference, so existing workpiece surface defect detection is easily disturbed by the background.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a workpiece defect detection method, apparatus, and storage medium based on an attention mechanism, so as to solve the prior-art problem that choosing only a fully connected layer or only a convolution layer leaves the regression and classification tasks sub-optimal and thus lowers defect detection accuracy, and at the same time to address the wide scale variation and severe complex-background interference of workpiece surface defects, which make existing workpiece surface defect detection susceptible to background interference.
According to a first aspect of the embodiments of the present invention, there is provided a method for detecting defects of a workpiece based on an attention mechanism, including:
inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
respectively extracting the classification characteristic and the regression characteristic through an attention stacking module;
inputting the classification features and the regression features after feature extraction into a double-head detection module, processing the classification features by the double-head detection module through a full connection layer to obtain classification confidence, and performing pooling operation and translation zooming operation on the regression features to obtain boundary coordinates of the regression features;
and fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
Preferably, the inputting of the feature map into the region generation module further includes:
recording the obtained feature map of the image to be detected as feature F1, up-sampling feature F1 to obtain feature F2, up-sampling feature F2 to obtain feature F3, and so on, recording the feature obtained by the final up-sampling step, once a preset number of up-sampling steps is reached, as feature FN;
fusing feature F1 with feature F2 and inputting the result into the region generation module, fusing feature F2 with feature F3 and inputting the result into the region generation module, and so on until feature FN-1 is fused with feature FN and input into the region generation module; feature FN is also input into the region generation module on its own.
Preferably,
inputting the characteristic diagram into an area generation module, wherein the step of screening the anchor frame of the characteristic diagram by the area generation module to obtain the suggestion frame comprises the following steps:
the region generation module obtains, through a 1×1 convolution on the feature map, a prediction score and a predicted offset for the defect in each anchor frame, screens out a preset number of anchor frames according to the prediction scores and predicted offsets, and takes the screened anchor frames as suggestion frames;
the method comprises the following steps of screening the suggestion frame through a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map, and using the boundary coordinate as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting as positive samples the suggestion frames whose intersection-over-union with the defect image is greater than 0.5, and as negative samples those whose intersection-over-union is less than 0.3;
selecting a preset number of suggestion boxes at a positive-to-negative sample ratio of 1:3;
and taking the selected suggestion frame as a second feature map, taking the second feature map as the classification feature of the feature map, and extracting the boundary coordinate of the second feature map as the regression feature of the feature map.
Preferably,
the step of respectively extracting the classification characteristic and the regression characteristic through the attention stacking module comprises the following steps:
the attention stacking module is formed by stacking a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification feature or regression feature by its key memory unit to obtain an attention matrix, then normalizes the rows of the attention matrix with a softmax function and the columns with an L1 norm to obtain a normalized attention matrix;
the attention module multiplies the normalized attention matrix by its value memory unit and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
taking the classification reconstruction feature or regression reconstruction feature as the input of the next attention module, and repeating the above until the last attention module outputs the classification reconstruction feature or regression reconstruction feature.
Preferably,
the step of processing the feature-extracted classification features through fully connected layers to obtain the classification confidence comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into a one-dimensional feature vector and computes the classification confidence of the one-dimensional feature vector using two fully connected layers.
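The two-layer classification head described above can be sketched as follows; this is a minimal NumPy illustration, in which the layer widths, weight shapes, and five-class output are illustrative assumptions rather than values taken from the patent:

```python
import numpy as np

def classification_head(roi_feat, w1, b1, w2, b2):
    # Stretch the RoI feature into a one-dimensional vector, then apply
    # two fully connected layers; a softmax turns the scores into
    # per-class classification confidences.
    x = roi_feat.reshape(-1)
    h = np.maximum(w1 @ x + b1, 0.0)            # FC layer 1 + ReLU
    scores = w2 @ h + b2                        # FC layer 2
    e = np.exp(scores - scores.max())           # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
roi = rng.standard_normal((4, 3, 3))            # toy (C, H, W) RoI feature
w1, b1 = rng.standard_normal((16, 36)), np.zeros(16)
w2, b2 = rng.standard_normal((5, 16)), np.zeros(5)   # 5 defect classes (assumed)
conf = classification_head(roi, w1, b1, w2, b2)      # confidences sum to 1
```

In a trained network the weights would be learned; here random weights merely demonstrate the flatten, two-FC, softmax pipeline.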
Preferably,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing pooling operation and translation zooming operation on the regression feature, and the method comprises the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module first filters the regression reconstruction features through four consecutive bottleneck layers;
the filtered regression reconstruction features are reduced in dimension by an average pooling operation, and a translation operation and a scale-scaling operation are then applied to the bounding boxes of the dimension-reduced regression reconstruction features;
the coordinates of the bounding box after the translation and scale-scaling operations are combined to give the boundary coordinates.
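The translation and scale-scaling operations on the bounding box can be illustrated with the standard R-CNN box-delta parameterization; the patent does not give its exact formulas, so this NumPy sketch is an assumption for illustration only:

```python
import numpy as np

def apply_deltas(box, dx, dy, dw, dh):
    # box is (x1, y1, x2, y2). The center is translated by (dx*w, dy*h)
    # and the width/height are scaled by exp(dw), exp(dh); the corner
    # coordinates are then recombined into the boundary coordinates.
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    cx, cy = cx + dx * w, cy + dy * h              # translation operation
    w, h = w * np.exp(dw), h * np.exp(dh)          # scale-scaling operation
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# shift a 10x10 box right by 10% of its width and double its width
refined = apply_deltas((0.0, 0.0, 10.0, 10.0), 0.1, 0.0, np.log(2.0), 0.0)
```

Using log-space deltas for width and height keeps the scaled box sizes positive, which is why this parameterization is the conventional choice for regression heads.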
According to a second aspect of the embodiments of the present invention, there is provided an attention-based workpiece defect detecting apparatus, comprising:
the feature map extraction module: used for inputting an image to be detected of a workpiece into the feature extraction network to obtain a feature map of the image to be detected;
a feature classification module: used for inputting the feature map into the region generation module, which selects anchor frames of the feature map to obtain suggestion frames; the suggestion frames are screened through a preset intersection-over-union to obtain a second feature map, the second feature map is taken as the classification feature of the feature map, and the boundary coordinates of the second feature map are extracted as the regression feature of the feature map;
a feature extraction module: the system is used for extracting the classification characteristic and the regression characteristic respectively through an attention stacking module;
a prediction module: used for inputting the feature-extracted classification features and regression features into the double-head detection module, which processes the classification features through fully connected layers to obtain the classification confidence and performs pooling and translation-scaling operations on the regression features to obtain the boundary coordinates of the regression features;
an output module: and the method is used for fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
According to a third aspect of embodiments of the present invention, there is provided a storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
according to the method, the image to be detected is input into the feature extraction network framework, the feature map of the image to be detected is obtained, the feature map is input into the region generation module, the classification feature and the regression feature of the feature map are obtained, feature extraction is carried out by introducing an attention mechanism, the calculation amount is reduced, meanwhile, the effective feature related to a defect target is concerned, so that the complex background interference is restrained, the detection performance of the defect target is improved, finally, the classification feature and the regression feature after feature extraction are input into the double-head detection module, the classification confidence of the classification feature and the boundary coordinate of the regression feature are obtained, a double-head structure is adopted, classification and regression tasks are better realized, and the problem that the regression and classification tasks cannot simultaneously reach the optimal due to the fact that a single full connection layer or convolution layer is adopted in the prior art is avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram illustrating a method for attention-based workpiece defect detection in accordance with an exemplary embodiment;
FIG. 2 is a block diagram of an overall scheme according to another exemplary embodiment;
FIG. 3 is a schematic diagram of an attention module shown in accordance with another exemplary embodiment;
FIG. 4 is a schematic diagram of a dual head configuration shown in accordance with another exemplary embodiment;
FIG. 5 is a system diagram illustrating an attention-based workpiece flaw detection arrangement in accordance with another exemplary embodiment;
in the drawings: the method comprises the following steps of 1-a feature map extraction module, 2-a feature classification module, 3-a feature extraction module, 4-a prediction module and 5-an output module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Example one
Fig. 1 is a flowchart illustrating a workpiece defect detection method based on an attention mechanism according to an exemplary embodiment, the method including:
s1, inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
s2, inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
s3, respectively extracting the classification features and the regression features through an attention stacking module;
s4, inputting the classification features and the regression features after feature extraction into a double-head detection module, processing the classification features through a full connection layer by the double-head detection module to obtain classification confidence, and performing pooling operation and translation zooming operation on the regression features to obtain boundary coordinates of the regression features;
s5, fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected;
it can be understood that, as shown in fig. 2, the present application uses a ResNet-50 backbone to obtain the feature map of the image to be detected. The feature map is then input into the region generation module (RPN), which produces the classification features and regression features of the feature map. These features (the RoIs) each pass through a plurality of attention modules (namely the attention stacking module described above) for feature extraction, and are then input into the double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features; the classification confidence and boundary coordinates are fused to output the defect detection result of the image to be detected. By introducing the attention mechanism (attention stacking module) for feature extraction, the amount of computation is reduced while the effective features related to the defect target are attended to, so complex background interference is suppressed and defect detection performance is improved. Finally, the double-head structure realizes the classification and regression tasks better, avoiding the prior-art problem that a single fully connected layer or convolution layer cannot make the regression and classification tasks optimal at the same time.
Preferably, the inputting the feature map into the region generating module further includes:
recording the obtained feature map of the image to be detected as feature F1, up-sampling feature F1 to obtain feature F2, up-sampling feature F2 to obtain feature F3, and so on, recording the feature obtained by the final up-sampling step, once a preset number of up-sampling steps is reached, as feature FN;
fusing feature F1 with feature F2 and inputting the result into the region generation module, and so on until feature FN-1 is fused with feature FN and input into the region generation module; feature FN is also input into the region generation module on its own;
it can be understood that, as shown in fig. 2, the present application uses a feature pyramid network (FPN) to fuse the up-sampled high-semantic features with the shallow positioning-detail features. The specific process is as follows: the obtained feature map of the image to be detected is recorded as feature F1; feature F1 is up-sampled to obtain feature F2, feature F2 is up-sampled to obtain feature F3, feature F3 is up-sampled to obtain feature F4, and the up-sampling process is repeated until the preset number of up-sampling steps is met, yielding feature FN. Feature F1 and feature F2 are fused and input into the region generation module (RPN), feature F2 and feature F3 are fused and input into the RPN, and so on until feature FN-1 and feature FN are fused and input into the RPN. It is worth mentioning that the last-layer feature FN has no corresponding up-sampled feature and is input into the region generation module (RPN) separately.
Preferably,
inputting the feature map into an area generation module, wherein the area generation module performs screening on an anchor frame of the feature map to obtain a suggestion frame, and the method comprises the following steps:
the region generation module obtains, through a 1×1 convolution on the feature map, a prediction score and a predicted offset for the defect in each anchor frame, screens out a preset number of anchor frames according to the prediction scores and predicted offsets, and takes the screened anchor frames as suggestion frames;
the method comprises the following steps of screening the suggestion frame through a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map, and using the boundary coordinate as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting as positive samples the suggestion frames whose intersection-over-union with the defect image is greater than 0.5, and as negative samples those whose intersection-over-union is less than 0.3;
selecting a preset number of suggestion boxes, ordered from high to low, at a positive-to-negative sample ratio of 1:3;
taking the selected suggestion frame as a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
it can be understood that each pixel in the feature map corresponds to a fixed number (generally 9) of anchor frames (Anchors), which together essentially cover all objects that may appear in the original image. A 1×1 convolution is then applied to the feature map to predict, for each Anchor, a score for whether it contains an object to be detected, together with a predicted offset; a specified number of Anchors (2000 during training, 256 during testing) are screened out according to the prediction scores and offsets, and their positions are preliminarily adjusted to obtain suggestion boxes (Proposals). During training, because there are too many Proposals, they are evaluated by Intersection over Union (IoU) with the ground truth: Proposals with IoU greater than 0.5 are taken as positive samples and those with IoU less than 0.3 as negative samples, and 256 Proposals with higher IoU values are selected at a positive-to-negative ratio of 1:3 to form the second feature map. The second feature map is used directly as the classification feature, and its boundary coordinates are extracted as the regression feature; during testing this screening step is skipped and the Proposals are used directly as the RoIs. Finally, the RoI features are pooled to a fixed size by RoI Pooling.
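The IoU computation and positive/negative screening described above can be sketched in plain Python; the box format and helper names are illustrative, not from the patent:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection over union.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_proposals(proposals, gt_box, pos_thr=0.5, neg_thr=0.3):
    # IoU > 0.5 with the ground-truth defect box -> positive (1),
    # IoU < 0.3 -> negative (0), anything in between is ignored (-1).
    return [1 if iou(p, gt_box) > pos_thr
            else (0 if iou(p, gt_box) < neg_thr else -1)
            for p in proposals]

gt = (0, 0, 10, 10)
props = [(0, 0, 10, 10), (0, 0, 5, 10), (20, 20, 30, 30), (0, 0, 8, 5)]
labels = label_proposals(props, gt)   # [1, -1, 0, -1]
```

A sampler would then draw the preset number of labeled proposals at the 1:3 positive-to-negative ratio.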
Preferably,
the step of respectively extracting the classification features and the regression features through the attention stacking module comprises the following steps:
the attention stacking module is formed by stacking a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification feature or regression feature by its key memory unit to obtain an attention matrix, then normalizes the rows of the attention matrix with a softmax function and the columns with an L1 norm to obtain a normalized attention matrix;
the attention module multiplies the normalized attention matrix by its value memory unit and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
taking the classification reconstruction characteristics or the regression reconstruction characteristics as the input of the next attention module, and repeating the steps until the last attention module outputs the classification reconstruction characteristics or the regression reconstruction characteristics;
it is to be understood that the attention stacking module is a stack of N attention modules, and the architecture of a single attention module is shown in FIG. 3; the classification or regression features are collectively referred to here as the RoI feature

$X \in \mathbb{R}^{s \times c \times h \times w}$

where X represents the RoI feature, s is the number of RoIs in one picture, c is the number of RoI channels, and h and w are the RoI height and width respectively; X can be reshaped into

$X \in \mathbb{R}^{s \times d}, \quad d = c \cdot h \cdot w$

Two external storage units are defined in each attention module: a key value storage unit $M_k \in \mathbb{R}^{m \times d}$ and a numerical value storage unit $M_v \in \mathbb{R}^{m \times d}$. First, the attention matrix A between the RoI and $M_k$ is calculated as:

$A = \mathrm{Norm}\!\left(X M_k^{\top}\right) \in \mathbb{R}^{s \times m}$

where $\mathrm{Norm}(\cdot)$ is double normalization over the rows and columns of the matrix: the rows of A are normalized with the softmax function, and the columns of A are normalized with the L1 norm. The product of the attention matrix A and $M_v$ is then added to the original RoI feature to obtain the reconstructed feature:

$X' = A M_v + X$

where $X'$ is the reconstructed feature. Stacking a plurality of such attention modules extracts the RoI features more fully, so that the model expression is better;
in the above, the softmax function is used to realize multi-class classification: briefly, it maps the output neurons to real numbers in (0, 1) whose sum is normalized to 1, so that the multi-class probabilities also sum to exactly 1. The L1 norm is an important tool in machine learning; for example, learning a support vector machine amounts to minimizing a cost function, and adding an L1-norm term to that cost function makes the learned result sparse, which facilitates feature extraction.
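The double normalization and residual reconstruction described above can be sketched with NumPy as follows. This is a minimal illustration under assumptions: the memory size `m` and the flattened feature layout follow the reshaped form $X \in \mathbb{R}^{s \times d}$, but the function names and dimensions are hypothetical:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_module(X, M_k, M_v):
    """One attention module.
    X:   (s, d) flattened RoI features (d = c*h*w).
    M_k: (m, d) external key value storage unit.
    M_v: (m, d) external numerical value storage unit."""
    A = X @ M_k.T                                          # (s, m) attention matrix
    A = softmax(A, axis=1)                                 # softmax over rows
    A = A / (np.abs(A).sum(axis=0, keepdims=True) + 1e-9)  # L1 norm over columns
    return A @ M_v + X                                     # reconstruction + residual

def attention_stack(X, modules):
    """Stack of N attention modules; each entry is an (M_k, M_v) pair."""
    for M_k, M_v in modules:
        X = attention_module(X, M_k, M_v)
    return X
```

With a zero value memory the module degenerates to the identity (pure residual), which makes the residual connection easy to verify in isolation.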
Preferably,
the step of processing the classification features through a full connection layer to obtain a classification confidence coefficient comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into one-dimensional feature vectors, and calculates the classification confidence of the one-dimensional feature vectors by utilizing two full-connection layers;
it can be understood that the structure of the double-head detection module is as shown in FIG. 4; after the reconstruction of the RoI features is completed, the reconstructed RoI features are input into the double-head detection module; for the classification branch, the classification reconstruction features are first stretched into one-dimensional feature vectors, and the classification confidence is then calculated through two Fc (fully connected) layers.
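The flatten-then-two-FC-layers classification branch can be sketched as follows. The hidden width, weight names, and ReLU activation are assumptions for illustration, not the patent's actual dimensions:

```python
import numpy as np

def classification_head(roi_feats, W1, b1, W2, b2):
    """Classification branch of the double-head detection module.
    roi_feats: (s, c, h, w) classification reconstruction features.
    Stretch each RoI into a 1-D vector, apply two fully connected
    layers, and return per-class confidences via softmax."""
    s = roi_feats.shape[0]
    x = roi_feats.reshape(s, -1)          # stretch to 1-D feature vectors
    x = np.maximum(0.0, x @ W1 + b1)      # Fc layer 1 (+ assumed ReLU)
    logits = x @ W2 + b2                  # Fc layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # classification confidences
```

Each output row sums to 1, matching the softmax normalization discussed earlier.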
Preferably,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing a pooling operation and a translation and scale scaling operation on the regression feature, comprising the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module firstly filters regression reconstruction characteristics through four continuous bottleneck layers;
carrying out dimensionality reduction on the filtered regression reconstruction characteristics by adopting average pooling operation, and then respectively carrying out translation operation and scale scaling operation on the bounding boxes of the regression reconstruction characteristics subjected to dimensionality reduction;
combining the coordinates of the boundary box after the translation operation and the scale scaling operation, namely the coordinates of the boundary;
it can be understood that, as shown in FIG. 4, for the regression reconstruction features, the amount of computation is first reduced with 4 BottleNeck layers and the dimensionality is reduced by average pooling, after which the translation and scaling values of the bounding box of the regression reconstruction features are predicted; the specific process of translation and scaling is as follows:
the bounding box of the RoI feature is generally denoted as

$P = (P_x, P_y, P_w, P_h)$

where $(P_x, P_y)$ is the box center and $P_w$, $P_h$ are its width and height. First a translation is performed:

$\hat{G}_x = P_w\, d_x(P) + P_x, \qquad \hat{G}_y = P_h\, d_y(P) + P_y$

where $d_x(P)$ and $d_y(P)$ are the translation prediction values. Then a scale scaling is performed:

$\hat{G}_w = P_w \exp\!\left(d_w(P)\right), \qquad \hat{G}_h = P_h \exp\!\left(d_h(P)\right)$

where $d_w(P)$ and $d_h(P)$ are the scale scaling prediction values. Here $d_x$, $d_y$, $d_w$ and $d_h$ are the parameters to be trained, based on the difference between the coordinates on the feature map and the actual coordinates of the target. The finally obtained

$\hat{G} = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h)$

is the boundary coordinate predicted by the regression branch; integrating the results of the regression branch and the classification branch achieves the purpose of detecting the defect target.
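The translation and scale scaling steps above are the standard bounding-box regression transform; a sketch under the assumption that boxes are given as (center-x, center-y, width, height):

```python
import numpy as np

def apply_deltas(boxes, deltas):
    """Apply predicted regression values to RoI bounding boxes.
    boxes:  (n, 4) as (Px, Py, Pw, Ph) — center coordinates plus size.
    deltas: (n, 4) predicted (dx, dy, dw, dh).
    Translation: Gx = Pw*dx + Px,  Gy = Ph*dy + Py
    Scaling:     Gw = Pw*exp(dw),  Gh = Ph*exp(dh)"""
    x, y, w, h = np.asarray(boxes, float).T
    dx, dy, dw, dh = np.asarray(deltas, float).T
    return np.stack([w * dx + x, h * dy + y,
                     w * np.exp(dw), h * np.exp(dh)], axis=1)
```

With zero deltas the transform is the identity, which is why the regression branch only needs to learn small corrections to the proposals.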
Example two
FIG. 5 is a structural diagram illustrating an attention-mechanism-based workpiece defect detection apparatus according to another exemplary embodiment, the apparatus including:
feature map extraction module 1: used for inputting the image to be detected of the workpiece into the feature extraction network to obtain a feature map of the image to be detected;
feature classification module 2: used for inputting the feature map into the region generation module; the region generation module screens the anchor frames of the feature map to obtain suggestion frames, screens the suggestion frames by a preset intersection-over-union threshold to obtain a second feature map, takes the second feature map as the classification feature of the feature map, and extracts the boundary coordinates of the second feature map as the regression feature of the feature map;
feature extraction module 3: used for performing feature extraction on the classification feature and the regression feature respectively through the attention stacking module;
prediction module 4: used for inputting the extracted classification and regression features into the double-head detection module; the double-head detection module processes the classification feature through a fully connected layer to obtain a classification confidence, and performs a pooling operation and a translation and scale scaling operation on the regression feature to obtain the boundary coordinates of the regression feature;
output module 5: used for fusing the classification confidence and the boundary coordinates to obtain the defect detection result of the image to be detected;
it can be understood that the feature map extraction module 1 inputs the image to be detected into the feature extraction network to obtain a feature map of the image to be detected; the feature classification module 2 inputs the feature map into the region generation module to obtain the classification features and regression features of the feature map; the feature extraction module 3 performs feature extraction on the classification features and the regression features respectively through the attention stacking module; the prediction module 4 inputs the extracted classification and regression features into the double-head detection module and obtains the classification confidence of the classification features and the boundary coordinates of the regression features; and the output module 5 fuses the classification confidence and the boundary coordinates to output the defect detection result of the image to be detected. In this application, an attention mechanism (the attention stacking module) is introduced for feature extraction, which reduces the amount of computation while attending to the effective features related to the defect target, so that complex background interference is suppressed and the detection performance for defect targets is improved. Finally, the extracted classification and regression features are input into the double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features; the double-head structure realizes the classification and regression tasks better, avoiding the problem in the prior art that, with a single fully connected layer or a single convolution layer, the regression and classification tasks cannot both reach the optimum simultaneously.
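The five modules above compose into a simple pipeline. A structural sketch with hypothetical stand-in callables (none of these stubs are the patent's actual implementations; they only show how the modules hand data to each other):

```python
class AttentionDefectDetector:
    """Composes the five modules of FIG. 5; each component is any callable."""

    def __init__(self, backbone, region_gen, cls_attn, reg_attn, double_head):
        self.backbone = backbone        # module 1: feature map extraction
        self.region_gen = region_gen    # module 2: proposals -> cls/reg features
        self.cls_attn = cls_attn        # module 3: attention stack (classification)
        self.reg_attn = reg_attn        #           attention stack (regression)
        self.double_head = double_head  # module 4: confidences + boundary coords

    def detect(self, image):
        fmap = self.backbone(image)
        cls_feat, reg_feat = self.region_gen(fmap)
        scores, boxes = self.double_head(self.cls_attn(cls_feat),
                                         self.reg_attn(reg_feat))
        return list(zip(scores, boxes))  # module 5: fused detection result
```

Keeping the components as injected callables mirrors the device claims, where each module is an independent unit.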
Example three:
the present embodiment provides a storage medium storing a computer program that, when executed by a master controller, implements the steps of the above method;
it will be appreciated that the storage medium referred to above may be a read-only memory, a magnetic or optical disk, or the like.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar contents in other embodiments may be referred to for the contents which are not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. An attention-mechanism-based workpiece defect detection method, characterized by comprising the following steps:
inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
respectively extracting the classification characteristic and the regression characteristic through an attention stacking module;
inputting the classification features and regression features after feature extraction into a double-head detection module, processing the classification features by the double-head detection module through a full connection layer to obtain classification confidence, and performing pooling operation and translation scaling operation on the regression features to obtain boundary coordinates of the regression features;
and fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
2. The method of claim 1, wherein inputting the feature map into a region generation module further comprises:
the obtained feature map of the image to be detected is taken as a feature F1; the feature F1 is up-sampled to obtain a feature F2; the feature F2 is up-sampled to obtain a feature F3; and the steps are repeated until the feature obtained by the up-sampling that reaches a preset number of up-sampling times is recorded as the feature FN;
the features F1 and F2 are fused and input into the region generation module; the features F2 and F3 are fused and input into the region generation module; and so on until the features FN and FN-1 are fused and input into the region generation module; the feature FN is also input separately into the region generation module.
3. The method of claim 2,
inputting the feature map into an area generation module, wherein the area generation module performs screening on an anchor frame of the feature map to obtain a suggestion frame, and the method comprises the following steps:
the region generation module acquires the prediction score and the prediction deviant of the defect in each anchor frame of the feature map on the feature map through 1 x 1 convolution, screens out anchor frames with a preset number of thresholds according to the prediction score and the prediction deviant, and takes the screened anchor frames as suggestion frames;
screening the suggestion frame through a preset intersection-parallel ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting boundary coordinates of the second feature map, and taking the boundary coordinates as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting a sample with the intersection ratio of the suggestion frame to the defect image being more than 0.5 as a positive sample, and selecting a sample with the intersection ratio of the suggestion frame to the defect image being less than 0.3 as a negative sample;
selecting a preset number of suggestion boxes according to a ratio of positive samples to negative samples of 1:3;
and taking the selected suggestion frame as a second feature map, taking the second feature map as the classification feature of the feature map, and extracting the boundary coordinate of the second feature map as the regression feature of the feature map.
4. The method of claim 1,
the step of respectively extracting the classification features and the regression features through the attention stacking module comprises the following steps:
the attention stacking module is composed of a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification characteristic or the regression characteristic by a key value storage unit of the attention module to obtain an attention matrix, then standardizes the line of the attention matrix through a softmax function, and standardizes the column of the attention matrix through an L1 norm to obtain a standard attention matrix;
the attention module multiplies the standard attention matrix by a numerical value storage unit of the attention module and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
and (4) taking the classification reconstruction features or the regression reconstruction features as the input of the next attention module, and repeating the steps until the last attention module outputs the classification reconstruction features or the regression reconstruction features.
5. The method of claim 4,
the step of processing the classification features through a full connection layer to obtain classification confidence degrees by using the classification features subjected to feature extraction comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into one-dimensional feature vectors, and the classification confidence of the one-dimensional feature vectors is calculated by utilizing two full-connection layers.
6. The method of claim 4,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing a pooling operation and a translation and scale scaling operation on the regression feature, comprising the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module firstly filters regression reconstruction characteristics through four continuous bottleneck layers;
carrying out dimensionality reduction on the filtered regression reconstruction characteristics by adopting average pooling operation, and then respectively carrying out translation operation and scale scaling operation on the bounding boxes of the regression reconstruction characteristics subjected to dimensionality reduction;
and combining the coordinates of the boundary box after the translation operation and the scale scaling operation, namely the coordinates of the boundary.
7. An attention-mechanism-based workpiece defect detection apparatus, characterized in that the apparatus comprises:
the feature map extraction module: used for inputting the image to be detected of the workpiece into the feature extraction network to obtain a feature map of the image to be detected;
a feature classification module: used for inputting the feature map into the region generation module; the region generation module screens the anchor frames of the feature map to obtain suggestion frames, screens the suggestion frames by a preset intersection-over-union threshold to obtain a second feature map, takes the second feature map as the classification feature of the feature map, and extracts the boundary coordinates of the second feature map as the regression feature of the feature map;
a feature extraction module: used for performing feature extraction on the classification feature and the regression feature respectively through the attention stacking module;
a prediction module: used for inputting the extracted classification and regression features into the double-head detection module; the double-head detection module processes the classification feature through a fully connected layer to obtain a classification confidence, and performs a pooling operation and a translation and scale scaling operation on the regression feature to obtain the boundary coordinates of the regression feature;
an output module: used for fusing the classification confidence and the boundary coordinates to obtain the defect detection result of the image to be detected.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a master controller, implements the steps of the attention mechanism-based workpiece defect detection method according to any one of claims 1-6.
CN202211552964.8A 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium Pending CN115631193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211552964.8A CN115631193A (en) 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium


Publications (1)

Publication Number Publication Date
CN115631193A true CN115631193A (en) 2023-01-20

Family

ID=84909703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211552964.8A Pending CN115631193A (en) 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium

Country Status (1)

Country Link
CN (1) CN115631193A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105413A (en) * 2019-12-31 2020-05-05 哈尔滨工程大学 Intelligent spark plug appearance defect detection system
CN113160139A (en) * 2021-03-24 2021-07-23 华南理工大学 Attention-based steel plate surface defect detection method of Faster R-CNN network
CN113420729A (en) * 2021-08-23 2021-09-21 城云科技(中国)有限公司 Multi-scale target detection method, model, electronic equipment and application thereof
WO2022036953A1 (en) * 2020-08-19 2022-02-24 上海商汤智能科技有限公司 Defect detection method and related apparatus, device, storage medium, and computer program product
WO2022160170A1 (en) * 2021-01-28 2022-08-04 东莞职业技术学院 Method and apparatus for detecting metal surface defects


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Xianbao et al.: "Target detection method based on improved Faster RCNN" *
WANG Xiang et al.: "Fault diagnosis algorithm for deep neural networks based on a multi-attention mechanism", Journal of Zhejiang Sci-Tech University (Natural Sciences Edition) *

Similar Documents

Publication Publication Date Title
CN109902677B (en) Vehicle detection method based on deep learning
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN110189255B (en) Face detection method based on two-stage detection
CN111445478B (en) Automatic intracranial aneurysm region detection system and detection method for CTA image
Xu et al. Scale-aware feature pyramid architecture for marine object detection
CN108830326B (en) Automatic segmentation method and device for MRI (magnetic resonance imaging) image
CN111260055B (en) Model training method based on three-dimensional image recognition, storage medium and device
WO2023070447A1 (en) Model training method, image processing method, computing processing device, and non-transitory computer readable medium
US20230343078A1 (en) Automated defect classification and detection
CN111353544B (en) Improved Mixed Pooling-YOLOV 3-based target detection method
Ahmadi et al. Context-aware saliency detection for image retargeting using convolutional neural networks
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN112465759A (en) Convolutional neural network-based aeroengine blade defect detection method
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN117854072B (en) Automatic labeling method for industrial visual defects
CN114022718B (en) Digestive system pathological image recognition method, system and computer storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
Lopez Droguett et al. Semantic segmentation model for crack images from concrete bridges for mobile devices
CN113221731B (en) Multi-scale remote sensing image target detection method and system
CN114782311A (en) Improved multi-scale defect target detection method and system based on CenterNet
Heinrich et al. Demystifying the black box: A classification scheme for interpretation and visualization of deep intelligent systems
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
CN115439718A (en) Industrial detection method, system and storage medium combining supervised learning and feature matching technology
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
JP2024507637A (en) Method and apparatus for grading images of collectibles using image segmentation and image analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230120

RJ01 Rejection of invention patent application after publication