CN115631193A - Workpiece defect detection method and device based on attention mechanism and storage medium - Google Patents

Workpiece defect detection method and device based on attention mechanism and storage medium

Info

Publication number
CN115631193A
CN115631193A
Authority
CN
China
Prior art keywords
feature
classification
regression
module
feature map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211552964.8A
Other languages
Chinese (zh)
Inventor
李朋超
杨庆泰
籍吉川
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jushi Intelligent Technology Co ltd
Original Assignee
Beijing Jushi Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jushi Intelligent Technology Co ltd filed Critical Beijing Jushi Intelligent Technology Co ltd
Priority to CN202211552964.8A priority Critical patent/CN115631193A/en
Publication of CN115631193A publication Critical patent/CN115631193A/en
Pending legal-status Critical Current

Classifications

    • G06T 7/0004 Industrial image inspection (under G06T 7/00 Image analysis; G06T 7/0002 Inspection of images, e.g. flaw detection)
    • G06V 10/40 Extraction of image or video features
    • G06V 10/764 Recognition using pattern recognition or machine learning: classification, e.g. of video objects
    • G06V 10/766 Recognition using pattern recognition or machine learning: regression, e.g. by projecting features on hyperplanes
    • G06V 10/806 Fusion, i.e. combining data at the sensor, preprocessing, feature extraction or classification level, of extracted features
    • G06V 10/82 Recognition using pattern recognition or machine learning: neural networks
    • G06T 2207/10004 Still image; Photographic image
    • G06T 2207/20081 Training; Learning
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30108 Industrial image inspection
    • G06T 2207/30164 Workpiece; Machine component

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a workpiece defect detection method and device based on an attention mechanism, and a storage medium, applied to the technical field of workpiece surface defect detection. The method comprises the following steps: inputting an image to be detected into a feature extraction network framework to obtain a feature map of the image to be detected; inputting the feature map into a region generation module to obtain classification features and regression features of the feature map; and extracting these features through an introduced attention mechanism, which reduces the amount of computation while attending to the effective features related to the defect target, thereby suppressing complex background interference and improving defect detection performance. The feature-extracted classification and regression features are then input into a double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features. The double-head structure realizes the classification and regression tasks better, avoiding the prior-art problem that a single fully connected layer or convolution layer cannot make the regression and classification tasks optimal at the same time.

Description

Workpiece defect detection method and device based on attention mechanism and storage medium
Technical Field
The invention relates to the technical field of workpiece surface defect detection, in particular to a workpiece defect detection method and device based on an attention mechanism and a storage medium.
Background
Algorithms for detecting defect targets on workpiece surfaces generally fall into two classes: one-stage algorithms (represented by YOLO) and two-stage algorithms (represented by Faster R-CNN). A two-stage algorithm usually needs to complete both a regression task and a classification task, and many classical defect target detection methods complete both tasks by appending either a fully connected layer or a convolution layer at the end of the network. However, for a defect detection model, the fully connected layer is more favorable for classifying defect categories, while the convolution layer is more favorable for regressing defect bounding boxes; adopting only one of the two means that neither task is completed optimally, so defect detection accuracy is low. In addition, workpiece surface defects vary widely in scale and suffer from severe complex-background interference, so existing workpiece surface defect detection is easily disturbed by the background.
Disclosure of Invention
In view of the above, an object of the present invention is to provide a workpiece defect detection method, apparatus, and storage medium based on an attention mechanism, so as to solve the prior-art problem that choosing only a fully connected layer or only a convolution layer leaves the regression and classification tasks sub-optimal and thus lowers defect detection accuracy, and at the same time to address the wide scale variation and severe complex-background interference of workpiece surface defects, which make existing workpiece surface defect detection susceptible to background interference.
According to a first aspect of the embodiments of the present invention, there is provided a method for detecting defects of a workpiece based on an attention mechanism, including:
inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
respectively extracting the classification characteristic and the regression characteristic through an attention stacking module;
inputting the classification features and the regression features after feature extraction into a double-head detection module, processing the classification features by the double-head detection module through a full connection layer to obtain classification confidence, and performing pooling operation and translation zooming operation on the regression features to obtain boundary coordinates of the regression features;
and fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
Preferably, the inputting of the feature map into the region generation module further includes:
recording the obtained feature map of the image to be detected as feature F1, up-sampling feature F1 to obtain feature F2, up-sampling feature F2 to obtain feature F3, and so on, recording the feature obtained by the final up-sampling step, once a preset number of up-sampling steps is reached, as feature FN;
fusing feature F1 with feature F2 and inputting the result into the region generation module, fusing feature F2 with feature F3 and inputting the result into the region generation module, and so on until feature FN-1 is fused with feature FN and input into the region generation module; feature FN is also input into the region generation module on its own.
Preferably,
inputting the characteristic diagram into an area generation module, wherein the step of screening the anchor frame of the characteristic diagram by the area generation module to obtain the suggestion frame comprises the following steps:
the region generation module obtains, through a 1×1 convolution on the feature map, a prediction score and a predicted offset for the defect in each anchor frame, screens out a preset number of anchor frames according to the prediction scores and predicted offsets, and takes the screened anchor frames as suggestion frames;
the method comprises the following steps of screening the suggestion frame through a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map, and using the boundary coordinate as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting as positive samples the suggestion frames whose intersection-over-union with the defect image is greater than 0.5, and as negative samples those whose intersection-over-union is less than 0.3;
selecting a preset number of suggestion boxes at a positive-to-negative sample ratio of 1:3;
and taking the selected suggestion frame as a second feature map, taking the second feature map as the classification feature of the feature map, and extracting the boundary coordinate of the second feature map as the regression feature of the feature map.
Preferably,
the step of respectively extracting the classification characteristic and the regression characteristic through the attention stacking module comprises the following steps:
the attention stacking module is formed by stacking a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification feature or regression feature by its key memory unit to obtain an attention matrix, then normalizes the rows of the attention matrix with a softmax function and the columns with an L1 norm to obtain a normalized attention matrix;
the attention module multiplies the normalized attention matrix by its value memory unit and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
taking the classification reconstruction feature or regression reconstruction feature as the input of the next attention module, and repeating the above until the last attention module outputs the classification reconstruction feature or regression reconstruction feature.
Preferably,
the step of processing the feature-extracted classification features through fully connected layers to obtain the classification confidence comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into a one-dimensional feature vector and computes the classification confidence of the one-dimensional feature vector using two fully connected layers.
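The two-layer classification head described above can be sketched as follows; this is a minimal NumPy illustration, in which the layer widths, weight shapes, and five-class output are illustrative assumptions rather than values taken from the patent:

```python
import numpy as np

def classification_head(roi_feat, w1, b1, w2, b2):
    # Stretch the RoI feature into a one-dimensional vector, then apply
    # two fully connected layers; a softmax turns the scores into
    # per-class classification confidences.
    x = roi_feat.reshape(-1)
    h = np.maximum(w1 @ x + b1, 0.0)            # FC layer 1 + ReLU
    scores = w2 @ h + b2                        # FC layer 2
    e = np.exp(scores - scores.max())           # numerically stable softmax
    return e / e.sum()

rng = np.random.default_rng(1)
roi = rng.standard_normal((4, 3, 3))            # toy (C, H, W) RoI feature
w1, b1 = rng.standard_normal((16, 36)), np.zeros(16)
w2, b2 = rng.standard_normal((5, 16)), np.zeros(5)   # 5 defect classes (assumed)
conf = classification_head(roi, w1, b1, w2, b2)      # confidences sum to 1
```

In a trained network the weights would be learned; here random weights merely demonstrate the flatten, two-FC, softmax pipeline.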
Preferably,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing pooling operation and translation zooming operation on the regression feature, and the method comprises the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module first filters the regression reconstruction features through four consecutive bottleneck layers;
the filtered regression reconstruction features are reduced in dimension by an average pooling operation, and a translation operation and a scale-scaling operation are then applied to the bounding boxes of the dimension-reduced regression reconstruction features;
the coordinates of the bounding box after the translation and scale-scaling operations are combined to give the boundary coordinates.
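The translation and scale-scaling operations on the bounding box can be illustrated with the standard R-CNN box-delta parameterization; the patent does not give its exact formulas, so this NumPy sketch is an assumption for illustration only:

```python
import numpy as np

def apply_deltas(box, dx, dy, dw, dh):
    # box is (x1, y1, x2, y2). The center is translated by (dx*w, dy*h)
    # and the width/height are scaled by exp(dw), exp(dh); the corner
    # coordinates are then recombined into the boundary coordinates.
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h
    cx, cy = cx + dx * w, cy + dy * h              # translation operation
    w, h = w * np.exp(dw), h * np.exp(dh)          # scale-scaling operation
    return (cx - 0.5 * w, cy - 0.5 * h, cx + 0.5 * w, cy + 0.5 * h)

# shift a 10x10 box right by 10% of its width and double its width
refined = apply_deltas((0.0, 0.0, 10.0, 10.0), 0.1, 0.0, np.log(2.0), 0.0)
```

Using log-space deltas for width and height keeps the scaled box sizes positive, which is why this parameterization is the conventional choice for regression heads.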
According to a second aspect of the embodiments of the present invention, there is provided an attention-based workpiece defect detecting apparatus, comprising:
the feature map extraction module: used for inputting an image to be detected of a workpiece into the feature extraction network to obtain a feature map of the image to be detected;
a feature classification module: used for inputting the feature map into the region generation module, which selects anchor frames of the feature map to obtain suggestion frames; the suggestion frames are screened through a preset intersection-over-union to obtain a second feature map, the second feature map is taken as the classification feature of the feature map, and the boundary coordinates of the second feature map are extracted as the regression feature of the feature map;
a feature extraction module: the system is used for extracting the classification characteristic and the regression characteristic respectively through an attention stacking module;
a prediction module: used for inputting the feature-extracted classification features and regression features into the double-head detection module, which processes the classification features through fully connected layers to obtain the classification confidence and performs pooling and translation-scaling operations on the regression features to obtain the boundary coordinates of the regression features;
an output module: and the method is used for fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
According to a third aspect of embodiments of the present invention, there is provided a storage medium storing a computer program which, when executed by a processor, implements the steps of the above-described method.
The technical scheme provided by the embodiment of the invention can have the following beneficial effects:
according to the method, the image to be detected is input into the feature extraction network framework, the feature map of the image to be detected is obtained, the feature map is input into the region generation module, the classification feature and the regression feature of the feature map are obtained, feature extraction is carried out by introducing an attention mechanism, the calculation amount is reduced, meanwhile, the effective feature related to a defect target is concerned, so that the complex background interference is restrained, the detection performance of the defect target is improved, finally, the classification feature and the regression feature after feature extraction are input into the double-head detection module, the classification confidence of the classification feature and the boundary coordinate of the regression feature are obtained, a double-head structure is adopted, classification and regression tasks are better realized, and the problem that the regression and classification tasks cannot simultaneously reach the optimal due to the fact that a single full connection layer or convolution layer is adopted in the prior art is avoided.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
FIG. 1 is a schematic flow diagram illustrating a method for attention-based workpiece defect detection in accordance with an exemplary embodiment;
FIG. 2 is a block diagram of an overall scheme according to another exemplary embodiment;
FIG. 3 is a schematic diagram of an attention module shown in accordance with another exemplary embodiment;
FIG. 4 is a schematic diagram of a dual head configuration shown in accordance with another exemplary embodiment;
FIG. 5 is a system diagram illustrating an attention-based workpiece flaw detection arrangement in accordance with another exemplary embodiment;
in the drawings: the method comprises the following steps of 1-a feature map extraction module, 2-a feature classification module, 3-a feature extraction module, 4-a prediction module and 5-an output module.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Example one
Fig. 1 is a flowchart illustrating a workpiece defect detection method based on an attention mechanism according to an exemplary embodiment, the method including:
s1, inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
s2, inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
s3, respectively extracting the classification features and the regression features through an attention stacking module;
s4, inputting the classification features and the regression features after feature extraction into a double-head detection module, processing the classification features through a full connection layer by the double-head detection module to obtain classification confidence, and performing pooling operation and translation zooming operation on the regression features to obtain boundary coordinates of the regression features;
s5, fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected;
it can be understood that, as shown in fig. 2, the present application uses a ResNet-50 backbone to obtain the feature map of the image to be detected. The feature map is then input into the region generation module (RPN), which produces the classification features and regression features of the feature map. These features (the RoIs) each pass through a plurality of attention modules (namely the attention stacking module described above) for feature extraction, and are then input into the double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features; the classification confidence and boundary coordinates are fused to output the defect detection result of the image to be detected. By introducing the attention mechanism (attention stacking module) for feature extraction, the amount of computation is reduced while the effective features related to the defect target are attended to, so complex background interference is suppressed and defect detection performance is improved. Finally, the double-head structure realizes the classification and regression tasks better, avoiding the prior-art problem that a single fully connected layer or convolution layer cannot make the regression and classification tasks optimal at the same time.
Preferably, the inputting the feature map into the region generating module further includes:
recording the obtained feature map of the image to be detected as feature F1, up-sampling feature F1 to obtain feature F2, up-sampling feature F2 to obtain feature F3, and so on, recording the feature obtained by the final up-sampling step, once a preset number of up-sampling steps is reached, as feature FN;
fusing feature F1 with feature F2 and inputting the result into the region generation module, and so on until feature FN-1 is fused with feature FN and input into the region generation module; feature FN is also input into the region generation module on its own;
it can be understood that, as shown in fig. 2, the present application uses a feature pyramid network (FPN) to fuse the up-sampled high-semantic features with the shallow positioning-detail features. The specific process is as follows: the obtained feature map of the image to be detected is recorded as feature F1; feature F1 is up-sampled to obtain feature F2, feature F2 is up-sampled to obtain feature F3, feature F3 is up-sampled to obtain feature F4, and the up-sampling process is repeated until the preset number of up-sampling steps is met, yielding feature FN. Feature F1 and feature F2 are fused and input into the region generation module (RPN), feature F2 and feature F3 are fused and input into the RPN, and so on until feature FN-1 and feature FN are fused and input into the RPN. It is worth mentioning that the last-layer feature FN has no corresponding up-sampled feature and is input into the region generation module (RPN) separately.
Preferably,
inputting the feature map into an area generation module, wherein the area generation module performs screening on an anchor frame of the feature map to obtain a suggestion frame, and the method comprises the following steps:
the region generation module obtains, through a 1×1 convolution on the feature map, a prediction score and a predicted offset for the defect in each anchor frame, screens out a preset number of anchor frames according to the prediction scores and predicted offsets, and takes the screened anchor frames as suggestion frames;
the method comprises the following steps of screening the suggestion frame through a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map, and using the boundary coordinate as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting as positive samples the suggestion frames whose intersection-over-union with the defect image is greater than 0.5, and as negative samples those whose intersection-over-union is less than 0.3;
selecting a preset number of suggestion boxes, ordered from high to low, at a positive-to-negative sample ratio of 1:3;
taking the selected suggestion frame as a second feature map, taking the second feature map as a classification feature of the feature map, extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
it can be understood that each pixel in the feature map corresponds to a fixed number (generally 9) of anchor frames (Anchors), which together essentially cover all objects that may appear in the original image. A 1×1 convolution is then applied to the feature map to predict, for each Anchor, a score for whether it contains an object to be detected, together with a predicted offset; a specified number of Anchors (2000 during training, 256 during testing) are screened out according to the prediction scores and offsets, and their positions are preliminarily adjusted to obtain suggestion boxes (Proposals). During training, because there are too many Proposals, they are evaluated by Intersection over Union (IoU) with the ground truth: Proposals with IoU greater than 0.5 are taken as positive samples and those with IoU less than 0.3 as negative samples, and 256 Proposals with higher IoU values are selected at a positive-to-negative ratio of 1:3 to form the second feature map. The second feature map is used directly as the classification feature, and its boundary coordinates are extracted as the regression feature; during testing this screening step is skipped and the Proposals are used directly as the RoIs. Finally, the RoI features are pooled to a fixed size by RoI Pooling.
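The IoU computation and positive/negative screening described above can be sketched in plain Python; the box format and helper names are illustrative, not from the patent:

```python
def iou(box_a, box_b):
    # Boxes are (x1, y1, x2, y2); returns intersection over union.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def label_proposals(proposals, gt_box, pos_thr=0.5, neg_thr=0.3):
    # IoU > 0.5 with the ground-truth defect box -> positive (1),
    # IoU < 0.3 -> negative (0), anything in between is ignored (-1).
    return [1 if iou(p, gt_box) > pos_thr
            else (0 if iou(p, gt_box) < neg_thr else -1)
            for p in proposals]

gt = (0, 0, 10, 10)
props = [(0, 0, 10, 10), (0, 0, 5, 10), (20, 20, 30, 30), (0, 0, 8, 5)]
labels = label_proposals(props, gt)   # [1, -1, 0, -1]
```

A sampler would then draw the preset number of labeled proposals at the 1:3 positive-to-negative ratio.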
Preferably,
the step of respectively extracting the classification features and the regression features through the attention stacking module comprises the following steps:
the attention stacking module is formed by stacking a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification feature or regression feature by its key memory unit to obtain an attention matrix, then normalizes the rows of the attention matrix with a softmax function and the columns with an L1 norm to obtain a normalized attention matrix;
the attention module multiplies the normalized attention matrix by its value memory unit and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
taking the classification reconstruction characteristics or the regression reconstruction characteristics as the input of the next attention module, and repeating the steps until the last attention module outputs the classification reconstruction characteristics or the regression reconstruction characteristics;
it is to be understood that the attention stacking module is a stack of N attention modules, and the architecture of a single attention module is shown in FIG. 3; the classification or regression features are collectively referred to here as the RoI feature

$X \in \mathbb{R}^{s \times c \times h \times w}$

where X represents the RoI feature, s is the number of RoIs in one picture, c is the number of RoI channels, and h and w are the RoI height and width respectively; X can be reshaped into

$X \in \mathbb{R}^{s \times d}, \quad d = c \cdot h \cdot w$

Two external storage units are defined in each attention module: a key value storage unit $M_k \in \mathbb{R}^{m \times d}$ and a numerical value storage unit $M_v \in \mathbb{R}^{m \times d}$. First, the attention matrix A between the RoI and $M_k$ is calculated as:

$A = \mathrm{Norm}\!\left(X M_k^{\top}\right) \in \mathbb{R}^{s \times m}$

where $\mathrm{Norm}(\cdot)$ is double normalization over the rows and columns of the matrix: the rows of A are normalized with the softmax function, and the columns of A are normalized with the L1 norm. The product of the attention matrix A and $M_v$ is then added to the original RoI feature to obtain the reconstructed feature:

$X' = A M_v + X$

where $X'$ is the reconstructed feature. Stacking a plurality of such attention modules extracts the RoI features more fully, so that the model expression is better;
in the above, the softmax function is used to realize multi-class classification: briefly, it maps the output neurons to real numbers in (0, 1) whose sum is normalized to 1, so that the multi-class probabilities also sum to exactly 1. The L1 norm is an important tool in machine learning; for example, learning a support vector machine amounts to minimizing a cost function, and adding an L1-norm term to that cost function makes the learned result sparse, which facilitates feature extraction.
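The double normalization and residual reconstruction described above can be sketched with NumPy as follows. This is a minimal illustration under assumptions: the memory size `m` and the flattened feature layout follow the reshaped form $X \in \mathbb{R}^{s \times d}$, but the function names and dimensions are hypothetical:

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along the given axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_module(X, M_k, M_v):
    """One attention module.
    X:   (s, d) flattened RoI features (d = c*h*w).
    M_k: (m, d) external key value storage unit.
    M_v: (m, d) external numerical value storage unit."""
    A = X @ M_k.T                                          # (s, m) attention matrix
    A = softmax(A, axis=1)                                 # softmax over rows
    A = A / (np.abs(A).sum(axis=0, keepdims=True) + 1e-9)  # L1 norm over columns
    return A @ M_v + X                                     # reconstruction + residual

def attention_stack(X, modules):
    """Stack of N attention modules; each entry is an (M_k, M_v) pair."""
    for M_k, M_v in modules:
        X = attention_module(X, M_k, M_v)
    return X
```

With a zero value memory the module degenerates to the identity (pure residual), which makes the residual connection easy to verify in isolation.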
Preferably,
the step of processing the classification features through a full connection layer to obtain a classification confidence coefficient comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into one-dimensional feature vectors, and calculates the classification confidence of the one-dimensional feature vectors by utilizing two full-connection layers;
it can be understood that the structure of the double-head detection module is as shown in FIG. 4; after the reconstruction of the RoI features is completed, the reconstructed RoI features are input into the double-head detection module; for the classification branch, the classification reconstruction features are first stretched into one-dimensional feature vectors, and the classification confidence is then calculated through two Fc (fully connected) layers.
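The flatten-then-two-FC-layers classification branch can be sketched as follows. The hidden width, weight names, and ReLU activation are assumptions for illustration, not the patent's actual dimensions:

```python
import numpy as np

def classification_head(roi_feats, W1, b1, W2, b2):
    """Classification branch of the double-head detection module.
    roi_feats: (s, c, h, w) classification reconstruction features.
    Stretch each RoI into a 1-D vector, apply two fully connected
    layers, and return per-class confidences via softmax."""
    s = roi_feats.shape[0]
    x = roi_feats.reshape(s, -1)          # stretch to 1-D feature vectors
    x = np.maximum(0.0, x @ W1 + b1)      # Fc layer 1 (+ assumed ReLU)
    logits = x @ W2 + b2                  # Fc layer 2
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)  # classification confidences
```

Each output row sums to 1, matching the softmax normalization discussed earlier.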
Preferably,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing a pooling operation and a translation and scale scaling operation on the regression feature, comprising the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module firstly filters regression reconstruction characteristics through four continuous bottleneck layers;
carrying out dimensionality reduction on the filtered regression reconstruction characteristics by adopting average pooling operation, and then respectively carrying out translation operation and scale scaling operation on the bounding boxes of the regression reconstruction characteristics subjected to dimensionality reduction;
combining the coordinates of the boundary box after the translation operation and the scale scaling operation, namely the coordinates of the boundary;
it can be understood that, as shown in FIG. 4, for the regression reconstruction features, the amount of computation is first reduced with 4 BottleNeck layers and the dimensionality is reduced by average pooling, after which the translation and scaling values of the bounding box of the regression reconstruction features are predicted; the specific process of translation and scaling is as follows:
the bounding box of the RoI feature is generally denoted as

$P = (P_x, P_y, P_w, P_h)$

where $(P_x, P_y)$ is the box center and $P_w$, $P_h$ are its width and height. First a translation is performed:

$\hat{G}_x = P_w\, d_x(P) + P_x, \qquad \hat{G}_y = P_h\, d_y(P) + P_y$

where $d_x(P)$ and $d_y(P)$ are the translation prediction values. Then a scale scaling is performed:

$\hat{G}_w = P_w \exp\!\left(d_w(P)\right), \qquad \hat{G}_h = P_h \exp\!\left(d_h(P)\right)$

where $d_w(P)$ and $d_h(P)$ are the scale scaling prediction values. Here $d_x$, $d_y$, $d_w$ and $d_h$ are the parameters to be trained, based on the difference between the coordinates on the feature map and the actual coordinates of the target. The finally obtained

$\hat{G} = (\hat{G}_x, \hat{G}_y, \hat{G}_w, \hat{G}_h)$

is the boundary coordinate predicted by the regression branch; integrating the results of the regression branch and the classification branch achieves the purpose of detecting the defect target.
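The translation and scale scaling steps above are the standard bounding-box regression transform; a sketch under the assumption that boxes are given as (center-x, center-y, width, height):

```python
import numpy as np

def apply_deltas(boxes, deltas):
    """Apply predicted regression values to RoI bounding boxes.
    boxes:  (n, 4) as (Px, Py, Pw, Ph) — center coordinates plus size.
    deltas: (n, 4) predicted (dx, dy, dw, dh).
    Translation: Gx = Pw*dx + Px,  Gy = Ph*dy + Py
    Scaling:     Gw = Pw*exp(dw),  Gh = Ph*exp(dh)"""
    x, y, w, h = np.asarray(boxes, float).T
    dx, dy, dw, dh = np.asarray(deltas, float).T
    return np.stack([w * dx + x, h * dy + y,
                     w * np.exp(dw), h * np.exp(dh)], axis=1)
```

With zero deltas the transform is the identity, which is why the regression branch only needs to learn small corrections to the proposals.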
Example two
FIG. 5 is a structural diagram illustrating an attention-mechanism-based workpiece defect detection apparatus according to another exemplary embodiment, the apparatus including:
feature map extraction module 1: used for inputting the image to be detected of the workpiece into the feature extraction network to obtain a feature map of the image to be detected;
feature classification module 2: used for inputting the feature map into the region generation module; the region generation module screens the anchor frames of the feature map to obtain suggestion frames, screens the suggestion frames by a preset intersection-over-union threshold to obtain a second feature map, takes the second feature map as the classification feature of the feature map, and extracts the boundary coordinates of the second feature map as the regression feature of the feature map;
feature extraction module 3: used for performing feature extraction on the classification feature and the regression feature respectively through the attention stacking module;
prediction module 4: used for inputting the extracted classification and regression features into the double-head detection module; the double-head detection module processes the classification feature through a fully connected layer to obtain a classification confidence, and performs a pooling operation and a translation and scale scaling operation on the regression feature to obtain the boundary coordinates of the regression feature;
output module 5: used for fusing the classification confidence and the boundary coordinates to obtain the defect detection result of the image to be detected;
it can be understood that the feature map extraction module 1 inputs the image to be detected into the feature extraction network to obtain a feature map of the image to be detected; the feature classification module 2 inputs the feature map into the region generation module to obtain the classification features and regression features of the feature map; the feature extraction module 3 performs feature extraction on the classification features and the regression features respectively through the attention stacking module; the prediction module 4 inputs the extracted classification and regression features into the double-head detection module and obtains the classification confidence of the classification features and the boundary coordinates of the regression features; and the output module 5 fuses the classification confidence and the boundary coordinates to output the defect detection result of the image to be detected. In this application, an attention mechanism (the attention stacking module) is introduced for feature extraction, which reduces the amount of computation while attending to the effective features related to the defect target, so that complex background interference is suppressed and the detection performance for defect targets is improved. Finally, the extracted classification and regression features are input into the double-head detection module to obtain the classification confidence of the classification features and the boundary coordinates of the regression features; the double-head structure realizes the classification and regression tasks better, avoiding the problem in the prior art that, with a single fully connected layer or a single convolution layer, the regression and classification tasks cannot both reach the optimum simultaneously.
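The five modules above compose into a simple pipeline. A structural sketch with hypothetical stand-in callables (none of these stubs are the patent's actual implementations; they only show how the modules hand data to each other):

```python
class AttentionDefectDetector:
    """Composes the five modules of FIG. 5; each component is any callable."""

    def __init__(self, backbone, region_gen, cls_attn, reg_attn, double_head):
        self.backbone = backbone        # module 1: feature map extraction
        self.region_gen = region_gen    # module 2: proposals -> cls/reg features
        self.cls_attn = cls_attn        # module 3: attention stack (classification)
        self.reg_attn = reg_attn        #           attention stack (regression)
        self.double_head = double_head  # module 4: confidences + boundary coords

    def detect(self, image):
        fmap = self.backbone(image)
        cls_feat, reg_feat = self.region_gen(fmap)
        scores, boxes = self.double_head(self.cls_attn(cls_feat),
                                         self.reg_attn(reg_feat))
        return list(zip(scores, boxes))  # module 5: fused detection result
```

Keeping the components as injected callables mirrors the device claims, where each module is an independent unit.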
Example three:
the present embodiment provides a storage medium storing a computer program that, when executed by a master controller, implements the steps of the above method;
it will be appreciated that the storage medium referred to above may be a read-only memory, a magnetic or optical disk, or the like.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar contents in other embodiments may be referred to for the contents which are not described in detail in some embodiments.
It should be noted that the terms "first," "second," and the like in the description of the present invention are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present invention, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and alternate implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It should be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, references to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are exemplary and not to be construed as limiting the present invention, and that changes, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (8)

1. An attention-mechanism-based workpiece defect detection method, characterized by comprising the following steps:
inputting an image to be detected of a workpiece into a feature extraction network to obtain a feature map of the image to be detected;
inputting the feature map into a region generation module, selecting an anchor frame of the feature map by the region generation module to obtain a suggestion frame, screening the suggestion frame by a preset intersection ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, and extracting a boundary coordinate of the second feature map as a regression feature of the feature map;
respectively extracting the classification characteristic and the regression characteristic through an attention stacking module;
inputting the classification features and regression features after feature extraction into a double-head detection module, processing the classification features by the double-head detection module through a full connection layer to obtain classification confidence, and performing pooling operation and translation scaling operation on the regression features to obtain boundary coordinates of the regression features;
and fusing the classification confidence coefficient and the boundary coordinate to obtain a defect detection result of the image to be detected.
2. The method of claim 1, wherein inputting the feature map into a region generation module further comprises:
the obtained feature map of the image to be detected is taken as a feature F1; the feature F1 is up-sampled to obtain a feature F2; the feature F2 is up-sampled to obtain a feature F3; and the steps are repeated until the feature obtained by the up-sampling that reaches a preset number of up-sampling times is recorded as the feature FN;
the features F1 and F2 are fused and input into the region generation module; the features F2 and F3 are fused and input into the region generation module; and so on until the features FN and FN-1 are fused and input into the region generation module; the feature FN is also input separately into the region generation module.
3. The method of claim 2,
inputting the feature map into an area generation module, wherein the area generation module performs screening on an anchor frame of the feature map to obtain a suggestion frame, and the method comprises the following steps:
the region generation module acquires the prediction score and the prediction deviant of the defect in each anchor frame of the feature map on the feature map through 1 x 1 convolution, screens out anchor frames with a preset number of thresholds according to the prediction score and the prediction deviant, and takes the screened anchor frames as suggestion frames;
screening the suggestion frame through a preset intersection-parallel ratio to obtain a second feature map, taking the second feature map as a classification feature of the feature map, extracting boundary coordinates of the second feature map, and taking the boundary coordinates as a regression feature of the feature map, wherein the step of screening the suggestion frame comprises the following steps:
selecting a sample with the intersection ratio of the suggestion frame to the defect image being more than 0.5 as a positive sample, and selecting a sample with the intersection ratio of the suggestion frame to the defect image being less than 0.3 as a negative sample;
selecting a preset number of suggestion boxes according to a ratio of positive samples to negative samples of 1:3;
and taking the selected suggestion frame as a second feature map, taking the second feature map as the classification feature of the feature map, and extracting the boundary coordinate of the second feature map as the regression feature of the feature map.
4. The method of claim 1,
the step of respectively extracting the classification features and the regression features through the attention stacking module comprises the following steps:
the attention stacking module is composed of a plurality of attention modules;
the classification or regression features are input into the lowest level attention module of the attention stacking module,
the attention module multiplies the classification characteristic or the regression characteristic by a key value storage unit of the attention module to obtain an attention matrix, then standardizes the line of the attention matrix through a softmax function, and standardizes the column of the attention matrix through an L1 norm to obtain a standard attention matrix;
the attention module multiplies the standard attention matrix by a numerical value storage unit of the attention module and adds the input classification feature or regression feature to obtain a classification reconstruction feature or a regression reconstruction feature;
and (4) taking the classification reconstruction features or the regression reconstruction features as the input of the next attention module, and repeating the steps until the last attention module outputs the classification reconstruction features or the regression reconstruction features.
5. The method of claim 4,
the step of processing the classification features through a full connection layer to obtain classification confidence degrees by using the classification features subjected to feature extraction comprises the following steps:
inputting the classification reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module stretches the classification reconstruction features into one-dimensional feature vectors, and the classification confidence of the one-dimensional feature vectors is calculated by utilizing two full-connection layers.
6. The method of claim 4,
the regression feature is input into the double-head detection module, and the double-head detection module obtains the boundary coordinates of the regression feature by performing a pooling operation and a translation and scale scaling operation on the regression feature, comprising the following steps:
inputting the regression reconstruction features output by the attention stacking module into a double-head detection module;
the double-head detection module firstly filters regression reconstruction characteristics through four continuous bottleneck layers;
carrying out dimensionality reduction on the filtered regression reconstruction characteristics by adopting average pooling operation, and then respectively carrying out translation operation and scale scaling operation on the bounding boxes of the regression reconstruction characteristics subjected to dimensionality reduction;
and combining the coordinates of the boundary box after the translation operation and the scale scaling operation, namely the coordinates of the boundary.
7. An attention-mechanism-based workpiece defect detection apparatus, characterized in that the apparatus comprises:
the feature map extraction module: used for inputting the image to be detected of the workpiece into the feature extraction network to obtain a feature map of the image to be detected;
a feature classification module: used for inputting the feature map into the region generation module; the region generation module screens the anchor frames of the feature map to obtain suggestion frames, screens the suggestion frames by a preset intersection-over-union threshold to obtain a second feature map, takes the second feature map as the classification feature of the feature map, and extracts the boundary coordinates of the second feature map as the regression feature of the feature map;
a feature extraction module: used for performing feature extraction on the classification feature and the regression feature respectively through the attention stacking module;
a prediction module: used for inputting the extracted classification and regression features into the double-head detection module; the double-head detection module processes the classification feature through a fully connected layer to obtain a classification confidence, and performs a pooling operation and a translation and scale scaling operation on the regression feature to obtain the boundary coordinates of the regression feature;
an output module: used for fusing the classification confidence and the boundary coordinates to obtain the defect detection result of the image to be detected.
8. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a master controller, implements the steps of the attention mechanism-based workpiece defect detection method according to any one of claims 1-6.
CN202211552964.8A 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium Pending CN115631193A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211552964.8A CN115631193A (en) 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium


Publications (1)

Publication Number Publication Date
CN115631193A true CN115631193A (en) 2023-01-20

Family

ID=84909703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211552964.8A Pending CN115631193A (en) 2022-12-06 2022-12-06 Workpiece defect detection method and device based on attention mechanism and storage medium

Country Status (1)

Country Link
CN (1) CN115631193A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111105413A (en) * 2019-12-31 2020-05-05 哈尔滨工程大学 Intelligent spark plug appearance defect detection system
CN113160139A (en) * 2021-03-24 2021-07-23 华南理工大学 Attention-based steel plate surface defect detection method of Faster R-CNN network
CN113420729A (en) * 2021-08-23 2021-09-21 城云科技(中国)有限公司 Multi-scale target detection method, model, electronic equipment and application thereof
WO2022036953A1 (en) * 2020-08-19 2022-02-24 上海商汤智能科技有限公司 Defect detection method and related apparatus, device, storage medium, and computer program product
WO2022160170A1 (en) * 2021-01-28 2022-08-04 东莞职业技术学院 Method and apparatus for detecting metal surface defects


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WANG Xianbao et al.: "Target detection method based on improved Faster RCNN" *
WANG Xiang et al.: "Fault diagnosis algorithm for deep neural networks based on a multi-attention mechanism", Journal of Zhejiang Sci-Tech University (Natural Sciences Edition) *

Similar Documents

Publication Publication Date Title
CN109902677B (en) Vehicle detection method based on deep learning
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN110189255B (en) Face detection method based on two-stage detection
CN111445478B (en) Automatic intracranial aneurysm region detection system and detection method for CTA image
Xu et al. Scale-aware feature pyramid architecture for marine object detection
CN108830326B (en) Automatic segmentation method and device for MRI (magnetic resonance imaging) image
CN111260055B (en) Model training method based on three-dimensional image recognition, storage medium and device
WO2023070447A1 (en) Model training method, image processing method, computing processing device, and non-transitory computer readable medium
US20230343078A1 (en) Automated defect classification and detection
CN111353544B (en) Improved Mixed Pooling-YOLOV 3-based target detection method
Ahmadi et al. Context-aware saliency detection for image retargeting using convolutional neural networks
WO2023116632A1 (en) Video instance segmentation method and apparatus based on spatio-temporal memory information
CN112465759A (en) Convolutional neural network-based aeroengine blade defect detection method
CN114627052A (en) Infrared image air leakage and liquid leakage detection method and system based on deep learning
CN117854072B (en) Automatic labeling method for industrial visual defects
CN114022718B (en) Digestive system pathological image recognition method, system and computer storage medium
CN116645592B (en) Crack detection method based on image processing and storage medium
Lopez Droguett et al. Semantic segmentation model for crack images from concrete bridges for mobile devices
CN113221731B (en) Multi-scale remote sensing image target detection method and system
CN114782311A (en) Improved multi-scale defect target detection method and system based on CenterNet
Heinrich et al. Demystifying the black box: A classification scheme for interpretation and visualization of deep intelligent systems
CN116758340A (en) Small target detection method based on super-resolution feature pyramid and attention mechanism
CN115439718A (en) Industrial detection method, system and storage medium combining supervised learning and feature matching technology
CN114494786A (en) Fine-grained image classification method based on multilayer coordination convolutional neural network
JP2024507637A (en) Method and apparatus for grading images of collectibles using image segmentation and image analysis

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20230120

RJ01 Rejection of invention patent application after publication