CN116596881A - Workpiece surface defect detection method based on CNN and Transformer - Google Patents
Workpiece surface defect detection method based on CNN and Transformer
- Publication number
- CN116596881A (application number CN202310558597.0A)
- Authority
- CN
- China
- Prior art keywords
- feature
- cnn
- convolution
- map
- transformer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 230000007547 defect Effects 0.000 title claims abstract description 39
- 238000001514 detection method Methods 0.000 title claims description 14
- 238000000605 extraction Methods 0.000 claims abstract description 22
- 230000004927 fusion Effects 0.000 claims abstract description 12
- 238000000034 method Methods 0.000 claims description 23
- 230000008569 process Effects 0.000 claims description 16
- 238000012549 training Methods 0.000 claims description 16
- 238000010586 diagram Methods 0.000 claims description 15
- 238000004364 calculation method Methods 0.000 claims description 13
- 238000013507 mapping Methods 0.000 claims description 9
- 239000013598 vector Substances 0.000 claims description 9
- 238000012795 verification Methods 0.000 claims description 6
- 229910000831 Steel Inorganic materials 0.000 claims description 4
- 239000010959 steel Substances 0.000 claims description 4
- 230000007704 transition Effects 0.000 claims description 4
- 230000000295 complement effect Effects 0.000 claims description 3
- 239000003550 marker Substances 0.000 claims description 3
- 230000009467 reduction Effects 0.000 claims description 3
- 230000009466 transformation Effects 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims 1
- 230000002708 enhancing effect Effects 0.000 abstract description 2
- 238000013527 convolutional neural network Methods 0.000 description 28
- 238000012360 testing method Methods 0.000 description 4
- 238000004519 manufacturing process Methods 0.000 description 3
- 238000005070 sampling Methods 0.000 description 2
- 230000000007 visual effect Effects 0.000 description 2
- 230000002411 adverse Effects 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 230000007423 decrease Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000000694 effects Effects 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 238000003908 quality control method Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000013526 transfer learning Methods 0.000 description 1
- 238000010200 validation analysis Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/09—Supervised learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
- G06V10/44—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
- G06V10/443—Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
- G06V10/449—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
- G06V10/451—Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
- G06V10/454—Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
- G06T2207/30164—Workpiece; Machine component
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Abstract
The application discloses a backbone network and a feature fusion network combining a CNN and a Transformer. MobileViT blocks are added to the backbone network, and an improved CBAM module is attached at the tail of each MobileViT block so that the two feature maps can be fused better; CSP bottleneck structures are applied to the continuously stacked CNN and Transformer blocks to improve network performance. The whole model enhances the fusion of CNN and Transformer feature maps and effectively improves the feature extraction capacity of the backbone network and the receptive field of the output features. An upsampling feature extraction path containing Transformer blocks is added to the enhanced feature extraction network (PANet), and a Patch Expanding operation is introduced into this architecture to handle the upsampling of the Transformer feature maps. Bridging blocks are added between the feature extraction paths to skip-connect the CNN and Transformer feature layers, enhancing the global information of the feature maps in the pyramid while preserving local information. The application can detect surface defect targets in workpieces with widely varying shapes, sizes, proportions and textures.
Description
Technical Field
The application relates to the field of computer vision, and in particular to a method for detecting surface defects on workpieces.
Background
Quality control is critical in the manufacturing industry: defects in a workpiece can adversely affect its rigidity, strength and load-bearing capacity, so that its stability cannot be guaranteed, and they may even pose serious safety hazards. Defect detection on large batches of workpieces is therefore extremely important in the production process.
With the increase in computational power over the past decade, artificial neural networks have become able to solve tasks that were previously difficult, and have achieved considerable success in many fields. Convolutional neural networks perform strongly in tasks such as image classification, segmentation and object detection, adapt to a wide range of use scenarios, and therefore generalize well. Meanwhile, visual inspection systems based on deep learning can achieve high precision and high efficiency in areas that are difficult for traditional vision methods. Deep networks greatly improve detection efficiency and significantly reduce detection cost, making them well suited to the task of workpiece surface defect detection.
Disclosure of Invention
The application provides a backbone network and a feature fusion network combining a CNN and a Transformer. MobileViT blocks are added to the backbone network, and an improved CBAM module is attached at the tail of each MobileViT block so that the two feature maps can be fused better; CSP bottleneck structures are applied to the continuously stacked CNN and Transformer blocks to improve network performance. The whole model enhances the fusion of CNN and Transformer feature maps and effectively improves the feature extraction capacity of the backbone network and the receptive field of the output features. An upsampling feature extraction path containing Transformer blocks is added to the enhanced feature extraction network (PANet), and a Patch Expanding operation is introduced into this architecture to handle the upsampling of the Transformer feature maps. Bridging blocks are added between the feature extraction paths to skip-connect the CNN and Transformer feature layers, enhancing the global information of the feature maps in the pyramid while preserving local information. The method comprises the following steps:
S1, acquiring a steel surface defect dataset and dividing it into training and verification sets;
S2, constructing a MobileViT-based backbone feature extraction network in which a Transformer and a CNN are connected in series, wherein a steel defect sample is taken as input, the feature extraction network comprises three stages, and the output of each stage is taken as an effective feature map;
And S3, constructing a PANet-based multi-scale feature fusion network in which a Transformer and a CNN are connected in parallel. The three effective feature layers of the backbone feature extraction network are used as input to perform feature fusion;
Step S4, training the detection model according to a predetermined number of workpiece surface defect samples. Each image contains a single defect location, the samples cover defects at different scales, and each sample image has a corresponding preset defect classification. The sample images are used as input and the preset classification of the defects at different scales is used as output, yielding the workpiece surface defect detection model.
The preset defects comprise six types: pitted surfaces, inclusions, patches, rolled-in scale, cracks and scratches, with 360 image samples per type.
The images of each class in the dataset are randomly divided in an 8:2 ratio into a training set and a test set, so the training set has 1728 samples and the test set has 432 samples. During model training, the learning rate is set to 1e-3 and the weight decay to 5e-4. A total of 300 epochs are trained, of which 50 are freeze epochs with batch size 16; the remaining epochs use batch size 4. The learning rate decays in cosine form. When training reaches 50 epochs, the batch size is set to 16 and the learning rate to 1e-4.
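The cosine decay of the learning rate can be sketched with a small scheduling helper. This is purely illustrative: the text only states that the rate decays in cosine form from 1e-3, so the helper name, the minimum rate and the absence of warm-up are our assumptions.

```python
import math

def cosine_lr(epoch, total_epochs, base_lr=1e-3, min_lr=0.0):
    # Cosine-decayed learning rate: base_lr at epoch 0, min_lr at the end.
    t = min(epoch, total_epochs) / total_epochs
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * t))
```

With the stated 300-epoch schedule, the rate starts at 1e-3 and decays smoothly toward zero.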
S2.1, firstly, the dimension of the feature map is improved through 1X 1 cross-channel convolution, then further feature extraction is carried out through depth separable convolution, and finally, the feature map is restored to the dimension when input is carried out through 1X 1 convolution. The depth separable convolution is mainly divided into two processes, namely channel-by-channel convolution, namely, convolution operation is respectively carried out by using each channel of the plurality of convolution check feature images; and point-by-point convolution, namely cross-channel convolution using a plurality of points of the 1 x 1 convolution kernel feature map.
S2.2: The feature map from S2.1 is first passed through a 3 x 3 convolution to extract local information and then through a 1 x 1 cross-channel point-by-point convolution. This adjusts the feature map dimension from H x W x C to H x W x d. The 2D feature map is then partitioned into patches and converted into one-dimensional vectors that a Transformer can process directly, giving X_Unfold ∈ R^(P x N x d), where P is the length of each flattened patch vector and N is the number of patches after partitioning. The flattened vectors are fed into stacked Transformer blocks to obtain X_G ∈ R^(P x N x d).
The feature map X_G ∈ R^(P x N x d) produced by the L Transformer blocks is refolded to obtain X_Fold ∈ R^(H x W x d). X_Fold is then fed into a 1 x 1 convolution that reduces the dimension of the whole feature map to obtain F ∈ R^(C x H x W), which facilitates the subsequent concatenation with the original feature map.
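The unfold and fold operations around the Transformer blocks amount to a reversible patch rearrangement. A NumPy sketch, under the assumption of an H x W x d map and square p x p patches (so P = p^2 and N = HW/p^2):

```python
import numpy as np

def unfold(x, p=2):
    """(H, W, d) -> (P, N, d): P = p*p pixels per patch, N = number of patches."""
    H, W, d = x.shape
    x = x.reshape(H // p, p, W // p, p, d).transpose(1, 3, 0, 2, 4)
    return x.reshape(p * p, (H // p) * (W // p), d)

def fold(x, H, W, p=2):
    """Inverse of unfold: (P, N, d) -> (H, W, d)."""
    P, N, d = x.shape
    x = x.reshape(p, p, H // p, W // p, d).transpose(2, 0, 3, 1, 4)
    return x.reshape(H, W, d)
```

Because `fold` exactly inverts `unfold`, the Transformer can operate on the (P, N, d) view while the spatial layout of the map is fully recoverable.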
S2.3: s2.1 and S2.2 output feature map F CNN ,F VIT ∈R C×H×W Channel attention map to M C ∈R C ×1×1 The spatial attention map is M S ∈R 1×H×W The CBMA process flow is as follows:
The final result of the channel attention module is as follows:
The final result of the spatial attention module is as follows:
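Since the attention equations themselves are not reproduced in this text, the following NumPy sketch follows the standard CBAM formulation (a shared MLP over average- and max-pooled channel descriptors, and combined channel-wise average/max maps for the spatial branch); the patent's "improved" variant may differ. Purely for illustration, the learned 7 x 7 convolution of the spatial branch is replaced by an elementwise average of the two maps.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(f, w1, w2):
    """M_C in R^(C x 1 x 1): shared MLP over avg- and max-pooled descriptors.
    f: (C, H, W); w1: (C//r, C), w2: (C, C//r) are assumed MLP weights."""
    avg = f.mean(axis=(1, 2))
    mx = f.max(axis=(1, 2))
    mlp = lambda v: w2 @ np.maximum(w1 @ v, 0.0)  # ReLU hidden layer
    return sigmoid(mlp(avg) + mlp(mx))[:, None, None]

def spatial_attention(f):
    """M_S in R^(1 x H x W): channel-wise avg and max maps combined.
    (A learned 7x7 convolution would normally mix the two maps.)"""
    avg = f.mean(axis=0)
    mx = f.max(axis=0)
    return sigmoid(0.5 * (avg + mx))[None, :, :]
```

Both maps are broadcast-multiplied onto the feature map, so their shapes match the M_C and M_S dimensions stated above.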
S2.4: The operations of S2.1, S2.2 and S2.3 are performed once per stage; three stages are run in sequence, and the feature map after the S2.3 operation in each stage is taken as an effective feature map.
S3.1: An additional multi-scale feature extraction path composed of Swin Transformer blocks is added to the original PANet.
S3.2: A bridge that skip-connects the CNN branch to the Transformer branch fuses local CNN features into the Transformer to complement its detail information. The CNN feature map is first mapped to Key and Value, and the Transformer feature map is mapped to Query for the subsequent attention computation.
The computation of the bridge from CNN local features to the Transformer is as follows:
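Since the bridge equation is not reproduced in this text, a single-head cross-attention sketch shows the described mapping, with the flattened CNN feature map providing Key and Value and the Transformer tokens providing Query. The projection matrices and shapes are hypothetical:

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def bridge_cnn_to_transformer(f_cnn, f_vit, wq, wk, wv):
    """f_cnn: (N, C) flattened CNN features -> Key/Value;
    f_vit: (M, C) Transformer tokens -> Query. Returns (M, d_v)."""
    q, k, v = f_vit @ wq, f_cnn @ wk, f_cnn @ wv
    attn = softmax(q @ k.T / np.sqrt(k.shape[-1]))  # scaled dot-product
    return attn @ v
```

The reverse bridge of S3.3 swaps the roles: the convolution output provides Query and the Transformer tokens provide Key and Value.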
S3.3: The bridge that skip-connects the Transformer branch to the CNN branch runs in the opposite direction to the CNN-to-Transformer bridge: it injects global token attention into local features. The feature map dimensions are transformed before and after a depthwise separable convolution. The convolution output is then mapped to Query, the tokens output by the Transformer are mapped to Key and Value, and attention is computed again.
The computation of the bridge from Transformer global features to CNN local features is as follows:
drawings
FIG. 1 is a diagram of the backbone network model;
FIG. 2 is a diagram of the multi-scale feature fusion network;
FIG. 3 is the MobileViT model structure incorporating CBAM;
FIG. 4 is a diagram of the bridge that skip-connects the CNN branch to the Transformer branch;
FIG. 5 is a diagram of the bridge that skip-connects the Transformer branch to the CNN branch.
Detailed Description
For a better understanding of the technical content of the present application, specific examples are set forth below with reference to the accompanying drawings.
First, collecting a defect image of the surface of a workpiece.
Secondly, defects are labeled and data enhancement is performed to construct a workpiece surface defect dataset. Specifically: on the workpiece production line, each workpiece is photographed at a fixed position by a sampling device to build the dataset. Surface defects in the acquired workpiece images are annotated with labelImg, and the annotated images are prepared as a VOC-format dataset. The images are then augmented with the imgaug data-augmentation library. The enhanced dataset is randomly divided at a ratio of 8:2 into (training set + validation set) : test set, and the training set : validation set ratio is likewise 8:2.
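The two-level 8:2 split described above can be sketched as follows; the function name and fixed seed are our own illustration:

```python
import random

def split_dataset(samples, seed=0):
    """Random 8:2 (train+val):test split, then 8:2 train:val split."""
    rng = random.Random(seed)
    samples = list(samples)
    rng.shuffle(samples)
    cut = int(len(samples) * 0.8)
    trainval, test = samples[:cut], samples[cut:]
    cut2 = int(len(trainval) * 0.8)
    return trainval[:cut2], trainval[cut2:], test
```

For the 2160-image steel dataset (6 classes x 360 images), these ratios yield 1728 (train + val) and 432 test samples, with the 1728 split further into 1382 train and 346 validation samples.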
Thirdly, the MobileViT-based backbone network is built; its structure is shown in FIG. 1.
Fourthly, the PANet-based feature fusion network is built; its structure is shown in FIG. 2.
Fifthly, a predetermined number of workpiece surface defect samples are used; each image contains a single defect location, the samples cover defects at different scales, and each sample image has a corresponding preset defect classification.
As shown in FIG. 3, the improved MobileViT first increases the dimension of the feature map by a 1 x 1 cross-channel convolution, then performs further feature extraction with a depthwise separable convolution, and finally restores the feature map to its input dimension with a 1 x 1 convolution. Depthwise separable convolution consists of two stages: channel-by-channel (depthwise) convolution, in which each channel of the feature map is convolved with its own kernel; and point-by-point convolution, a 1 x 1 cross-channel convolution applied at every position of the feature map.
The feature map is first passed through a 3 x 3 convolution to extract local information and then through a 1 x 1 cross-channel point-by-point convolution. This adjusts the feature map dimension from H x W x C to H x W x d. The 2D feature map is then partitioned into patches and converted into one-dimensional vectors that a Transformer can process directly, giving X_Unfold ∈ R^(P x N x d), where P is the length of each flattened patch vector and N is the number of patches after partitioning. The flattened vectors are fed into stacked Transformer blocks to obtain X_G ∈ R^(P x N x d).
The feature map X_G ∈ R^(P x N x d) produced by the L Transformer blocks is refolded to obtain X_Fold ∈ R^(H x W x d). X_Fold is then fed into a 1 x 1 convolution that reduces the dimension of the whole feature map to obtain F ∈ R^(C x H x W), which facilitates the subsequent concatenation with the original feature map.
The CNN and Transformer output feature maps are F_CNN, F_ViT ∈ R^(C x H x W); the channel attention map is M_C ∈ R^(C x 1 x 1) and the spatial attention map is M_S ∈ R^(1 x H x W). The CBAM processing flow is as follows:
The final result of the channel attention module is as follows:
The final result of the spatial attention module is as follows:
Three stages are run in sequence, and the output of each stage is taken as an effective feature map.
As shown in FIG. 2, an additional multi-scale feature extraction path composed of Swin Transformer blocks is added to the original PANet.
As shown in FIG. 4, the bridge that skip-connects the CNN branch to the Transformer branch fuses local CNN features into the Transformer to complement its detail information. The CNN feature map is first mapped to Key and Value, and the Transformer feature map is mapped to Query for the subsequent attention computation.
The computation of the bridge from CNN local features to the Transformer is as follows:
As shown in FIG. 5, the bridge that skip-connects the Transformer branch to the CNN branch runs in the opposite direction to the CNN-to-Transformer bridge: it injects global token attention into local features. The feature map dimensions are transformed before and after a depthwise separable convolution. The convolution output is then mapped to Query, the tokens output by the Transformer are mapped to Key and Value, and attention is computed again.
The computation of the bridge from Transformer global features to CNN local features is as follows:
Fine-tuning the built model on the workpiece surface defect dataset specifically comprises: computing the classification loss with Focal Loss and the regression loss with Smooth L1 during training. The final loss function is the combination of Focal Loss and Smooth L1, L = L_fl + L_sl1. The Focal Loss classification loss is calculated as:
The Smooth L1 regression loss is calculated as follows:
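Since the loss formulas themselves are not reproduced in this text, the following sketch uses the standard definitions of binary Focal Loss and Smooth L1 together with the stated combination L = L_fl + L_sl1; the alpha, gamma and beta defaults are assumed, not taken from the patent:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Standard binary focal loss; p = predicted probability, y in {0, 1}."""
    pt = np.where(y == 1, p, 1.0 - p)
    a = np.where(y == 1, alpha, 1.0 - alpha)
    return float(np.mean(-a * (1.0 - pt) ** gamma * np.log(np.clip(pt, 1e-12, 1.0))))

def smooth_l1(x, t, beta=1.0):
    """Smooth L1 regression loss: quadratic near zero, linear elsewhere."""
    d = np.abs(x - t)
    return float(np.mean(np.where(d < beta, 0.5 * d ** 2 / beta, d - 0.5 * beta)))

def total_loss(p, y, x, t):
    """L = L_fl + L_sl1, as stated in the training description."""
    return focal_loss(p, y) + smooth_l1(x, t)
```

The focal term down-weights easy, well-classified examples, while Smooth L1 keeps box regression robust to outliers.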
training a network by adopting a transfer learning method, pre-training in the VOC data set to obtain a weight file, and then fine-tuning in the workpiece surface defect data set. The number of iteration steps of the loop is set to 100, firstly, the batch size is set to 32, the learning rate is initialized to 5e-4, when the number of iteration steps reaches 50, the batch size is reset to 16, and the learning rate is 1e-4. And an early stop method (early stop) is adopted during training to avoid overfitting caused by continuous training, verification loss is calculated in each iteration, when the verification loss value reaches local optimum, the iteration is continued for 6 times, and if the model is not converged any more, the training is stopped.
Claims (3)
1. A workpiece surface defect detection method based on CNN and Transformer, the method comprising the steps of:
and S1, acquiring a steel surface defect data set and dividing a training verification set.
S2, constructing a MobileViT-based backbone feature extraction network in which a Transformer and a CNN are connected in series, wherein the feature extraction network takes a steel defect sample as input and takes the output of each stage as an effective feature map;
S3, constructing a PANet-based multi-scale feature fusion network in which a Transformer and a CNN are connected in parallel, and performing feature fusion with the three effective feature layers of the backbone feature extraction network as inputs;
S4, training the detection model according to a preset number of workpiece surface defect samples, wherein each image contains a single defect location, the samples cover defects at different scales, and each sample image has a corresponding preset defect classification; the sample images are used as input and the preset classification of the defects at different scales is used as output, yielding the workpiece surface defect detection model.
2. The serial Transformer and CNN backbone feature extraction network of claim 1, wherein an improved CBAM module is incorporated at the tail of each MobileViT block so that the two feature maps can be fused better, specifically comprising:
S2.1: First, the dimension of the feature map is increased by a 1 x 1 cross-channel convolution; further features are then extracted by a depthwise separable convolution; finally, a 1 x 1 convolution restores the feature map to its input dimension. Depthwise separable convolution consists of two stages: channel-by-channel (depthwise) convolution, in which each channel of the feature map is convolved with its own kernel; and point-by-point convolution, a 1 x 1 cross-channel convolution applied at every position of the feature map.
S2.2: The feature map from S2.1 is first passed through a 3 x 3 convolution to extract local information and then through a 1 x 1 cross-channel point-by-point convolution. This adjusts the feature map dimension from H x W x C to H x W x d. The 2D feature map is then partitioned into patches and converted into one-dimensional vectors that a Transformer can process directly, giving X_Unfold ∈ R^(P x N x d), where P is the length of each flattened patch vector and N is the number of patches after partitioning. The flattened vectors are fed into stacked Transformer blocks to obtain X_G ∈ R^(P x N x d).
The feature map X_G ∈ R^(P x N x d) produced by the L Transformer blocks is refolded to obtain X_Fold ∈ R^(H x W x d). X_Fold is then fed into a 1 x 1 convolution that reduces the dimension of the whole feature map to obtain F ∈ R^(C x H x W), which facilitates the subsequent concatenation with the original feature map.
S2.3: s2.1 and S2.2 output feature map F CNN ,F VIT ∈R C×H×W Channel attention map to M C ∈R C×1×1 The spatial attention map is M S ∈R 1×H×W The processing flow of the CBMA is as follows:
The final result of the channel attention module is as follows:
The final result of the spatial attention module is as follows:
S2.4: The operations of S2.1, S2.2 and S2.3 are performed once per stage; three stages are run in sequence, and the feature map after the S2.3 operation in each stage is taken as an effective feature map.
3. The parallel Transformer and CNN multi-scale feature fusion network of claim 1, wherein an additional multi-scale feature extraction path composed of Swin Transformer blocks is added to the PANet, comprising:
S3.1: An additional multi-scale feature extraction path composed of Swin Transformer blocks is added to the original PANet.
S3.2: A bridge that skip-connects the CNN branch to the Transformer branch fuses local CNN features into the Transformer to complement its detail information. The CNN feature map is first mapped to Key and Value, and the Transformer feature map is mapped to Query for the subsequent attention computation.
The computation of the bridge from CNN local features to the Transformer is as follows:
S3.3: The bridge that skip-connects the Transformer branch to the CNN branch runs in the opposite direction to the CNN-to-Transformer bridge: it injects global token attention into local features. The feature map dimensions are transformed before and after a depthwise separable convolution. The convolution output is then mapped to Query, the tokens output by the Transformer are mapped to Key and Value, and attention is computed again.
The computation of the bridge from Transformer global features to CNN local features is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310558597.0A CN116596881A (en) | 2023-05-17 | 2023-05-17 | Workpiece surface defect detection method based on CNN and Transformer
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310558597.0A CN116596881A (en) | 2023-05-17 | 2023-05-17 | Workpiece surface defect detection method based on CNN and Transformer
Publications (1)
Publication Number | Publication Date |
---|---|
CN116596881A true CN116596881A (en) | 2023-08-15 |
Family
ID=87604107
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310558597.0A Pending CN116596881A (en) | 2023-05-17 | 2023-05-17 | Workpiece surface defect detection method based on CNN and Transformer
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116596881A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117094999A (en) * | 2023-10-19 | 2023-11-21 | 南京航空航天大学 | Cross-scale defect detection method |
CN117094999B (en) * | 2023-10-19 | 2023-12-22 | 南京航空航天大学 | Cross-scale defect detection method |
CN117218606A (en) * | 2023-11-09 | 2023-12-12 | 四川泓宝润业工程技术有限公司 | Escape door detection method and device, storage medium and electronic equipment |
CN117218606B (en) * | 2023-11-09 | 2024-02-02 | 四川泓宝润业工程技术有限公司 | Escape door detection method and device, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
DD01 | Delivery of document by public notice | ||
Addressee: Wang Yuechen Document name: Notice of Publication of Invention Patent Application Addressee: Wang Yuechen Document name: Notification of Qualified Preliminary Examination of Invention Patent Application |