CN117095155A - Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network
- Publication number
- CN117095155A CN117095155A CN202310906241.1A CN202310906241A CN117095155A CN 117095155 A CN117095155 A CN 117095155A CN 202310906241 A CN202310906241 A CN 202310906241A CN 117095155 A CN117095155 A CN 117095155A
- Authority
- CN
- China
- Prior art keywords
- feature
- nixie tube
- image
- detection
- scale
- Prior art date
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0004—Industrial image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/774—Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30108—Industrial image inspection
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention belongs to the field of computer vision, and particularly relates to a multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network, which comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into the improved YOLO adaptive attention-feature enhancement network, where a feature pyramid module and a path aggregation network perform multi-scale feature aggregation on the feature maps to obtain an aggregated high-level feature map; and inputting the high-level feature map into a detection module to obtain the detection result. The invention employs an adaptive attention module and a feature enhancement module to extract and enhance multi-scale features of the feature maps, thereby improving detection accuracy.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multiscale nixie tube detection method based on an improved YOLO self-adaptive attention-feature enhancement network.
Background
In nixie tube production, inspection is the final step of the manufacturing process, covering the electrical performance, appearance, and other properties of the nixie tube. Compared with nixie tube design and manufacture, inspection is a labor-intensive activity with low technical content, largely because it is still performed mostly by hand. However, as the automation of nixie tube production improves, manual inspection is no longer suited to modern production, being limited by the working hours, accuracy, judgment stability, and judgment speed of the human eye. Replacing manual inspection with machine vision has therefore become an important trend in nixie tube detection.
Object detection is a machine vision task that processes digital images and localizes instances of certain classes of visual objects (e.g., birds, people, or vehicles) in them. It supplies one of the basic pieces of information required by machine vision applications: "what object is where." Object detection is one of the fundamental problems of machine vision and underlies tasks such as image segmentation and object tracking. Its output is typically an algorithm-generated bounding box indicating the location, class, and confidence of each detected object. As the performance of hand-crafted features approached saturation, traditional detection algorithms hit a bottleneck around 2010, and progress slowed during 2010-2012. In 2012, the excellent performance of AlexNet in image classification led to extensive study of convolutional neural networks (Convolutional Neural Network, CNN). Currently, neural-network-based object detection algorithms can be divided into two types according to how the detection boxes are determined: "two-stage detection," which refines candidate boxes step by step, and "one-stage detection," which generates detection boxes directly; YOLO belongs to the CNN-based one-stage methods.
As a recent framework in the YOLO series, YOLOv5 is one of the detection algorithms that currently achieves a good balance of real-time performance and detection accuracy. Many researchers have applied it in various fields and proposed numerous improvements. The patent "A method for detecting defects of aluminum sheets based on YOLOv5" (application number: 202211678791) applies YOLOv5 to image-based defect detection of aluminum sheets, detecting defects and rejecting defective parts. However, that method does not account for changes in target scale during detection, which can degrade detection accuracy.
Disclosure of Invention
In order to solve the problem in the prior art of low detection accuracy caused by large scale variation of the nixie tube chip, the invention provides a multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network, which comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into a trained improved YOLO adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network, and a detection module;
training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
Preferably, preprocessing the nixie tube image includes: cropping away the top 30% and bottom 10% of the image in the vertical direction; and bisecting the cropped image horizontally into two images.
Preferably, the feature pyramid module processes the input image as follows: the input image is fed into a backbone network to obtain a high-level feature map F_h; an adaptive attention module optimizes the high-level feature map to obtain a fused feature map; the fused feature map and the high-level feature map are fed into a feature enhancement module, and the enhanced feature maps are fused to obtain an optimized high-level feature map; and the optimized high-level feature map is downsampled multiple times to obtain feature maps at different scales.
Further, the adaptive attention module processes the high-level features as follows: the high-level features are fed into an adaptive pooling layer to obtain context features at different scales; a 1×1 convolution is applied to the context features at each scale to obtain feature maps of different scales with the same channel dimension; the high-level feature map is upsampled using bilinear interpolation; the upsampled feature map and the multi-scale context features are fed into a Concat layer for channel merging to obtain a channel-merged feature map; the channel-merged feature map passes sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weight map; and a Hadamard product is taken between the generated weight map and the channel-merged feature map to obtain the fused feature map.
Further, the feature enhancement module comprises a multi-branch convolution layer and a multi-branch pooling layer; the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution; the multi-branch pooling layer fuses the nixie tube image information from the three branch receptive fields.
Preferably, the path aggregation network aggregates the multi-scale feature maps as follows: each low-resolution feature map is upsampled; the upsampled feature maps are added element-wise to the highest-resolution feature map; and the summed feature maps are fed into a path aggregation module, where a series of convolution operations produces the aggregated high-level feature map.
Preferably, the detection module processes the aggregated high-level feature map as follows: a series of convolution and pooling operations is applied to the aggregated high-level feature map to obtain a first feature map; the first feature map is converted into a fixed-size feature vector; the feature vector is fed into a fully connected layer to obtain classification scores and bounding box coordinates; the position information of the detection target is obtained from the classification scores and bounding box coordinates; and a non-maximum suppression algorithm eliminates overlapping bounding boxes from the detected target positions, yielding the final target detection result.
Preferably, the model's loss function includes a classification loss and a boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
Further, the expression of the binary cross entropy loss function is:
L_class = -(y·log(p) + (1-y)·log(1-p))
where p denotes the predicted probability and y denotes the true label.
Further, the boundary regression loss is expressed as:
L_CIoU = 1 - CIoU
where CIoU represents the degree of coincidence between the bounding box and the real target, IoU represents the intersection-over-union ratio, D_center represents the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target, D_circumscribe represents the Euclidean distance of the diagonal of the smallest rectangle enclosing the predicted box and the real target, v is a real number between 0 and 1, w is the width, and h is the height.
The invention has the beneficial effects that:
1. The invention provides a nixie tube appearance detection method that replaces traditional manual inspection with automated detection, greatly improving production efficiency, reducing the investment of human resources, and saving time and cost.
2. The nixie tube appearance detection system can detect appearance defects of the nixie tube, including cracks, scratches, and stains, with high accuracy and at high speed. By finding and removing unqualified products in time, product quality and consistency are effectively improved and the defect rate is reduced.
3. The appearance detection system can monitor the appearance quality of nixie tubes in real time during production, and can detect and report abnormal conditions promptly. This helps manufacturers adjust production parameters in time and ensures the stability and consistency of product quality.
4. The invention employs an adaptive attention module and a feature enhancement module to extract and enhance multi-scale features of the feature maps, thereby improving detection accuracy.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a network architecture diagram of the YOLOv5-based adaptive attention-feature enhancement recognition method;
FIG. 3 is a block diagram of an adaptive attention network;
FIG. 4 is a diagram of a feature enhanced network architecture;
fig. 5 is a model training loss diagram.
Detailed Description
The following description of the embodiments of the present invention is made clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention.
A multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network, as shown in FIG. 1, comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into a trained improved YOLO adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network, and a detection module.
Training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
Preprocessing the nixie tube images comprises the following steps: the captured pictures are augmented in four directions (data augmentation) and annotated to form the training, validation, and test sets. The prepared nixie tube image dataset is then fed into the constructed YOLOv5 adaptive attention-feature enhancement network to obtain a trained nixie tube defect recognition model. Training runs for 300 epochs; the loss values during training are shown in FIG. 5, where box_loss, obj_loss, and cls_loss are the boundary regression error, the foreground-background classification error, and the positive-sample classification error, respectively.
Acquiring the nixie tube image comprises the following steps: the nixie tube line is fitted with an optical fiber sensor; when a nixie tube moves in front of the camera, the sensor detects its arrival, the PLC receives the signal and triggers the camera to capture the nixie tube image, and the image information is transmitted to the cloud platform. The resolution of the acquired nixie tube image is 3072×2048. To save computing resources, YOLO detection algorithms generally compress the image to 640×640 before feeding it into the neural network, so a defect about 10 pixels in diameter in the original image shrinks to a size that is difficult to detect. Therefore, to improve model performance, the nixie tube image is cropped as follows: every picture has its top 30% and bottom 10% cut away in the vertical direction, and the remaining portion is bisected horizontally into two images.
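For illustration, the cropping rule above can be sketched as follows; the left/right interpretation of the horizontal bisection and the use of OpenCV are assumptions, not taken from the patent.

```python
import cv2

def preprocess_nixie_tube(path):
    """Sketch of the cropping rule described above: drop the top 30% and
    bottom 10% of the frame, then bisect the remainder into two images.
    The split is interpreted here as left/right halves (an assumption)."""
    img = cv2.imread(path)                           # e.g. a 3072 x 2048 source frame
    h = img.shape[0]
    cropped = img[int(0.30 * h): int(0.90 * h), :]   # keep rows from 30% to 90%
    mid = cropped.shape[1] // 2
    return cropped[:, :mid], cropped[:, mid:]        # two half-width images
```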
An adaptive attention-feature enhancement recognition module based on YOLOv5 is constructed, as shown in FIG. 2. The feature pyramid network (Feature Pyramid Network, FPN) and path aggregation network (Path Aggregation Network, PAN) in the YOLOv5 model are both methods for multi-scale object detection. FPN is a top-down feature pyramid structure that achieves multi-scale detection by fusing high-level feature maps with low-level feature maps. PAN is a path aggregation network that aggregates feature maps at different levels. In YOLOv5, the FPN and PAN structures fuse features across scales. In the present invention, the adaptive attention module and the feature enhancement module are used to reduce the information loss incurred while generating the feature maps and to enhance the representation capability of the feature pyramid.
The feature pyramid module processes the input image as follows: the input image is fed into a backbone network to obtain a high-level feature map F_h; an adaptive attention module optimizes the high-level feature map to obtain a fused feature map; the fused feature map and the high-level feature map are fed into a feature enhancement module, and the enhanced feature maps are fused to obtain an optimized high-level feature map; and the optimized high-level feature map is downsampled multiple times to obtain feature maps at different scales.
Specifically, the backbone network extracts the high-level features: the input image I is passed through a backbone network (e.g., CSPDarknet53) to obtain the high-level feature map F_h, expressed as F_h = Backbone(I), where the size of F_h is H/32 × W/32, and H and W denote the height and width of the input image. Downsampling then yields feature maps at different resolutions: through repeated downsampling operations, feature maps F_{h/2}, F_{h/4}, F_{h/8}, and so on are obtained, expressed as F_{h/2} = Downsample(F_h), F_{h/4} = Downsample(F_{h/2}), F_{h/8} = Downsample(F_{h/4}), where Downsample denotes the downsampling operation. That is, the feature pyramid consists mainly of a backbone that extracts high-level features and downsampling that produces feature maps at different resolutions. The purpose of the feature pyramid is to obtain feature maps at multiple scales through repeated downsampling so as to capture targets of different sizes.
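A minimal PyTorch sketch of this pyramid step; the 256-channel width and the stride-2 convolutions standing in for Downsample are assumptions (the CSPDarknet53 trunk is not reproduced here).

```python
import torch
import torch.nn as nn

class SimplePyramid(nn.Module):
    """Sketch of F_h = Backbone(I) followed by repeated Downsample(...)
    operations, producing feature maps at several resolutions."""
    def __init__(self, backbone: nn.Module, ch: int = 256, levels: int = 3):
        super().__init__()
        self.backbone = backbone        # assumed to output (B, ch, H/32, W/32)
        self.downs = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, stride=2, padding=1) for _ in range(levels - 1)]
        )

    def forward(self, image: torch.Tensor):
        feats = [self.backbone(image)]      # F_h
        for down in self.downs:
            feats.append(down(feats[-1]))   # F_{h/2}, F_{h/4}, ...
        return feats
```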
The adaptive attention module. The adaptive attention network structure is shown in FIG. 3. The input of the adaptive attention module first passes through an adaptive pooling layer to obtain context features at different scales (h1×S, h2×S, h3×S). Each context feature then undergoes a 1×1 convolution to obtain the same channel dimension of 256. They are upsampled to scale S using bilinear interpolation for subsequent fusion. The spatial attention mechanism merges the channels of the three context features through a Concat layer, and the merged feature map then passes sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weights. A Hadamard product is taken between the generated weight map and the channel-merged feature map; the result is split and added to the input feature map, aggregating the context features. The final feature map carries rich multi-scale context information, which to some extent alleviates the information loss caused by the reduced number of channels.
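A minimal PyTorch sketch of this adaptive attention module. The pooling ratios, the single-channel spatial weight map, and the final 1×1 projection standing in for the "split and add" step are assumptions; the 256-channel reduction, bilinear upsampling, Concat, and 1×1 → ReLU → 3×3 → sigmoid weight branch follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    def __init__(self, in_ch: int, mid_ch: int = 256, ratios=(0.1, 0.2, 0.3)):
        super().__init__()
        self.ratios = ratios            # assumed pooling scales h1, h2, h3
        self.reduce = nn.ModuleList([nn.Conv2d(in_ch, mid_ch, 1) for _ in ratios])
        self.weight = nn.Sequential(    # spatial weight branch from the text
            nn.Conv2d(mid_ch * len(ratios), mid_ch, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_ch, 1, 3, padding=1),
            nn.Sigmoid(),
        )
        # 1x1 projection back to the input width: a stand-in for "split and add"
        self.project = nn.Conv2d(mid_ch * len(ratios), in_ch, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        ctx = []
        for ratio, conv in zip(self.ratios, self.reduce):
            size = (max(1, int(h * ratio)), max(1, int(w * ratio)))
            c = conv(F.adaptive_avg_pool2d(x, size))   # context at one scale
            ctx.append(F.interpolate(c, (h, w), mode="bilinear",
                                     align_corners=False))
        merged = torch.cat(ctx, dim=1)                 # Concat layer
        attended = self.weight(merged) * merged        # Hadamard product
        return x + self.project(attended)              # aggregate onto the input
```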
The feature enhancement module. As shown in FIG. 4, the feature enhancement module mainly uses dilated convolution to adaptively learn different receptive fields in each feature map according to the different scales of the detected nixie tubes, improving the accuracy of multi-scale target detection and recognition. It can be divided into two parts: a multi-branch convolution layer and a multi-branch pooling layer. The multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution, and the average pooling layer fuses the nixie tube image information from the three branch receptive fields to improve the accuracy of multi-scale prediction.
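A sketch of the two-part structure just described, assuming three dilated 3×3 branches with rates (1, 3, 5) — the rates are illustrative, not taken from the patent — followed by averaging across branches as the pooling fusion.

```python
import torch
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    def __init__(self, ch: int, dilations=(1, 3, 5)):
        super().__init__()
        self.branches = nn.ModuleList(      # multi-branch dilated convolutions
            [nn.Conv2d(ch, ch, 3, padding=d, dilation=d) for d in dilations]
        )
        self.fuse = nn.Conv2d(ch, ch, 1)    # final mixing after branch fusion

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        outs = torch.stack([b(x) for b in self.branches])  # (3, B, C, H, W)
        return self.fuse(outs.mean(dim=0))  # average across the 3 branches
```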
The path aggregation network aggregates the multi-scale feature maps as follows: each low-resolution feature map is upsampled; the upsampled feature maps are added element-wise to the highest-resolution feature map; and the summed feature maps are fed into a path aggregation module, where a series of convolution operations produces the aggregated high-level feature map.
Specifically, upsampling fuses the low-resolution features: for each lower-resolution feature map, the PANet restores its size to that of the highest-resolution feature map by upsampling. These upsampled feature maps are then added element by element to the highest-resolution feature map, fusing the low-resolution and high-resolution maps. In this way, the information in the low-resolution and high-resolution feature maps is merged, enriching the feature representation. Path aggregation: on the fused feature map, the PANet further integrates multi-scale feature information through path aggregation. Specifically, the PANet introduces a path aggregation module that applies a series of convolution operations to the feature map so that feature information at different scales is aggregated and interacts. In this way, feature maps at different scales influence and reinforce one another, strengthening the representational power of the features.
Outputting the aggregated high-level feature map: the aggregated high-level feature map produced by the path aggregation module contains feature information from different scales and has a richer, more diverse representational capability. This high-level feature map serves as the input for the subsequent detection task, i.e., target detection and localization.
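The upsample-and-add fusion can be sketched as follows; equal channel counts across levels are assumed, and the subsequent path-aggregation convolutions are omitted.

```python
import torch
import torch.nn.functional as F

def fuse_levels(features):
    """Upsample each lower-resolution map to the highest resolution and
    add element-wise. `features` is ordered highest resolution first,
    all maps assumed to share one channel count."""
    target = features[0].shape[-2:]
    fused = features[0]
    for f in features[1:]:
        fused = fused + F.interpolate(f, size=target, mode="bilinear",
                                      align_corners=False)
    return fused                  # input to the path aggregation module
```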
Multi-scale nixie tube detection on the cloud platform. First, the cloud platform stitches the two pictures cropped in step S12 back together, obtains the detected picture information, and outputs the feature information; for a nixie tube carrying defect information, the PLC, upon receiving the defect signal, controls the rejection hardware to remove the nixie tube, realizing quality control of the nixie tube.
In this embodiment, the detection module processes the aggregated high-level feature map as follows: a series of convolution and pooling operations is applied to the aggregated high-level feature map to obtain a first feature map; the first feature map is converted into a fixed-size feature vector; the feature vector is fed into a fully connected layer to obtain classification scores and bounding box coordinates; the position information of the detection target is obtained from the classification scores and bounding box coordinates; and a non-maximum suppression algorithm eliminates overlapping bounding boxes, yielding the final target detection result.
Specifically, convolution and pooling operations: the detection module first applies a series of convolution and pooling operations to the input high-level feature map to extract features and reduce the size of the feature map. These operations are implemented by convolution layers and pooling layers; convolution extracts features, while pooling reduces the feature map size and the amount of computation.
Object classification and localization: after the convolution and pooling operations, the detection module converts the feature map into a fixed-size feature vector containing the classification and localization information of the objects in the image. Typically, the detection module maps the feature vector to classification scores and bounding box coordinates using a fully connected layer.
Prediction and post-processing: the target objects detected in the image are obtained from the classification scores and bounding box coordinates output by the detection module. Typically, a threshold on the classification score determines whether an object is detected, and the bounding box coordinates then localize the object.
Non-maximum suppression (NMS): after the initial detection results are obtained, non-maximum suppression (NMS) is typically applied to eliminate overlapping bounding boxes and produce the final detection results. NMS removes duplicate detections according to target confidence and bounding box overlap, keeping only the detection with the highest confidence.
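The thresholding and NMS step can be sketched with torchvision; the threshold values here are illustrative assumptions.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes: torch.Tensor, scores: torch.Tensor,
                score_thr: float = 0.25, iou_thr: float = 0.45):
    """Keep boxes above the score threshold, then suppress overlapping
    boxes, retaining only the highest-confidence detection in each group."""
    keep = scores > score_thr             # classification-score threshold
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thr)    # non-maximum suppression
    return boxes[kept], scores[kept]
```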
In this embodiment, the loss function of the model includes a classification loss and a boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
Specifically, YOLO uses a loss function to measure the error between the finally output target position, target class, and confidence and the real target; the smaller the loss value, the smaller this error. The loss consists of a classification error and a boundary regression error, where the boundary regression error is constructed from CIoU and the classification error is computed with binary cross-entropy.
The binary cross entropy loss can be expressed as:
L_class = -(y·log(p) + (1-y)·log(1-p))
where p denotes the predicted probability and y denotes the true label.
YOLO constructs the bounding-box loss from CIoU. Intersection over Union (IoU) is a commonly used index in target detection, measuring the positional accuracy of a predicted result; it is defined as:
IoU = area(B ∩ B_gt) / area(B ∪ B_gt)
where B is the predicted box and B_gt is the real target box.
The CIoU used by YOLO extends IoU because, in target detection, the IoU index alone is too limited: it depends only on the overlap area. CIoU simultaneously considers three related factors, i.e., the overlap area, the geometric center distance, and the aspect ratios of the bounding box and the real target, and comprehensively measures their degree of coincidence. CIoU is defined as:
CIoU = IoU - D_center²/D_circumscribe² - αv,  with α = v / ((1 - IoU) + v)
where IoU is the intersection-over-union ratio measuring the overlap area; D_center is the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target; D_circumscribe is the Euclidean distance of the diagonal of the smallest rectangle enclosing the predicted box and the real target; the term D_center²/D_circumscribe² measures how far the predicted box is from the true geometric center of the target, becoming smaller as the predicted box approaches the target; and the term αv measures the difference between the aspect ratio of the predicted box and that of the real target, where v is defined as:
v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))²
where w is the width and h is the height (the subscript gt denoting the real target). Since v is a real number between 0 and 1, the closer the aspect ratio of the predicted box is to that of the real target, the closer v is to 0; the greater the difference between the two, the closer v is to 1.
The boundary regression loss is expressed as:
L_CIoU = 1 - CIoU
The overall loss function of the model can be expressed as:
L = L_class + L_CIoU
where L_class is the binary cross-entropy loss and L_CIoU is the boundary regression loss.
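A sketch of L_CIoU = 1 - CIoU for boxes in (x1, y1, x2, y2) form, following the standard CIoU definition cited above; the classification term would be torch.nn.BCELoss on p and y, and the total loss their sum.

```python
import math
import torch

def ciou_loss(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-7):
    """L_CIoU = 1 - CIoU for boxes given as (x1, y1, x2, y2)."""
    # intersection and union -> IoU
    iw = (torch.min(pred[..., 2], target[..., 2]) -
          torch.max(pred[..., 0], target[..., 0])).clamp(0)
    ih = (torch.min(pred[..., 3], target[..., 3]) -
          torch.max(pred[..., 1], target[..., 1])).clamp(0)
    inter = iw * ih
    area_p = (pred[..., 2] - pred[..., 0]) * (pred[..., 3] - pred[..., 1])
    area_t = (target[..., 2] - target[..., 0]) * (target[..., 3] - target[..., 1])
    iou = inter / (area_p + area_t - inter + eps)
    # squared center distance over squared enclosing-box diagonal
    cw = torch.max(pred[..., 2], target[..., 2]) - torch.min(pred[..., 0], target[..., 0])
    ch = torch.max(pred[..., 3], target[..., 3]) - torch.min(pred[..., 1], target[..., 1])
    d2 = (((pred[..., 0] + pred[..., 2]) - (target[..., 0] + target[..., 2])) ** 2 +
          ((pred[..., 1] + pred[..., 3]) - (target[..., 1] + target[..., 3])) ** 2) / 4
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its weight alpha
    wp, hp = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    wt, ht = target[..., 2] - target[..., 0], target[..., 3] - target[..., 1]
    v = (4 / math.pi ** 2) * (torch.atan(wt / (ht + eps)) -
                              torch.atan(wp / (hp + eps))) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - (iou - d2 / c2 - alpha * v)
```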
The foregoing embodiments merely illustrate the technical solutions, objects, and advantages of the present invention. It should be understood that the above are only specific embodiments of the invention, and any modifications, equivalent substitutions, improvements, and the like made within the spirit and principles of the invention are intended to fall within its scope.
Claims (10)
1. The multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network is characterized by comprising the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into a trained improved YOLO adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network, and a detection module;
training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
2. The method for detecting a multi-scale nixie tube based on an improved YOLO adaptive attention-feature enhancement network of claim 1, wherein preprocessing the nixie tube image comprises: cropping away the top 30% and bottom 10% of the image in the vertical direction; and bisecting the cropped image horizontally into two images.
3. The method for multi-scale nixie tube detection based on an improved YOLO adaptive attention-feature enhancement network of claim 1, wherein the feature pyramid module processes the input image as follows: the input image is fed into a backbone network to obtain a high-level feature map F_h; an adaptive attention module optimizes the high-level feature map to obtain a fused feature map; the fused feature map and the high-level feature map are fed into a feature enhancement module, and the enhanced feature maps are fused to obtain an optimized high-level feature map; and the optimized high-level feature map is downsampled multiple times to obtain feature maps at different scales.
4. A multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network as in claim 3, wherein the adaptive attention module processes the high-level features as follows: the high-level features are fed into an adaptive pooling layer to obtain context features at different scales; a 1×1 convolution is applied to the context features at each scale to obtain feature maps of different scales with the same channel dimension; the high-level feature map is upsampled using bilinear interpolation; the upsampled feature map and the multi-scale context features are fed into a Concat layer for channel merging to obtain a channel-merged feature map; the channel-merged feature map passes sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weight map; and a Hadamard product is taken between the generated weight map and the channel-merged feature map to obtain the fused feature map.
5. A multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network as in claim 3, wherein the feature enhancement module comprises a multi-branch convolution layer and a multi-branch pooling layer; the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution; the multi-branch pooling layer fuses the nixie tube image information from the three branch receptive fields.
6. The method for detecting the multi-scale nixie tube based on the improved YOLO adaptive attention-feature enhancement network according to claim 1, wherein the path aggregation network aggregates the multi-scale feature maps as follows: each low-resolution feature map is upsampled; the upsampled feature maps are added element-wise to the highest-resolution feature map; and the summed feature maps are fed into a path aggregation module, where a series of convolution operations produces the aggregated high-level feature map.
7. The multi-scale nixie tube detection method based on the improved YOLO adaptive attention-feature enhancement network according to claim 1, wherein the detection module processes the aggregated high-level feature map as follows: a series of convolution and pooling operations is applied to the aggregated high-level feature map to obtain a first feature map; the first feature map is converted into a fixed-size feature vector; the feature vector is fed into a fully connected layer to obtain classification scores and bounding box coordinates; the position information of the detection target is obtained from the classification scores and bounding box coordinates; and a non-maximum suppression algorithm eliminates overlapping bounding boxes, yielding the final target detection result.
8. The method for multi-scale nixie tube detection based on improved YOLO adaptive attention-feature enhancement network of claim 1, wherein the model's loss functions include classification loss and boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
9. The improved YOLO adaptive attention-feature enhancement network based multiscale nixie tube detection method of claim 8 wherein the expression of the binary cross entropy loss function is:
L_class = -(y·log(p) + (1-y)·log(1-p))
where p denotes the predicted probability and y denotes the true label.
10. The improved YOLO adaptive attention-feature enhancement network based multiscale nixie tube detection method of claim 8 wherein the expression of the boundary regression loss is:
L_CIoU = 1 - CIoU
where CIoU represents the degree of coincidence between the bounding box and the real target, IoU represents the intersection-over-union ratio, D_center represents the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target, D_circumscribe represents the Euclidean distance of the diagonal of the smallest rectangle enclosing the predicted box and the real target, v is a real number between 0 and 1, w is the width, and h is the height.
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310906241.1A CN117095155A (en) | 2023-07-21 | 2023-07-21 | Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117095155A true CN117095155A (en) | 2023-11-21 |
Family
ID=88776096
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310906241.1A Pending CN117095155A (en) | 2023-07-21 | 2023-07-21 | Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117095155A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117314898A (en) * | 2023-11-28 | 2023-12-29 | 中南大学 | Multistage train rail edge part detection method |
CN117314898B (en) * | 2023-11-28 | 2024-03-01 | 中南大学 | Multistage train rail edge part detection method |
CN118096763A (en) * | 2024-04-28 | 2024-05-28 | 万商电力设备有限公司 | Ring network load switch cabinet surface quality detection method |
CN118552798A (en) * | 2024-07-30 | 2024-08-27 | 绍兴建元电力集团有限公司 | Infrared photovoltaic hot spot detection method for multi-scale center surrounding inhibition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||