CN117095155A - Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network

Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network

Info

Publication number
CN117095155A
Authority
CN
China
Prior art keywords
feature
nixie tube
image
detection
scale
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310906241.1A
Other languages
Chinese (zh)
Inventor
孟凯
卢星宇
刘宴兵
涂琪琳
胡思楠
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University of Post and Telecommunications filed Critical Chongqing University of Post and Telecommunications
Priority to CN202310906241.1A priority Critical patent/CN117095155A/en
Publication of CN117095155A publication Critical patent/CN117095155A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • G06T7/0004Industrial image inspection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30108Industrial image inspection
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Quality & Reliability (AREA)
  • Image Analysis (AREA)

Abstract

The invention belongs to the field of computer vision, and particularly relates to a multi-scale nixie tube detection method based on an improved YOLO self-adaptive attention-feature enhancement network, which comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into the improved YOLO self-adaptive attention-feature enhancement network, where a feature pyramid module and a path aggregation network perform multi-scale feature aggregation on the feature maps to obtain an aggregated high-level feature map; and inputting the high-level feature map into a detection module to obtain a detection result. The invention adopts an adaptive attention module and a feature enhancement module to extract and enhance multi-scale features of the feature map, thereby improving detection accuracy.

Description

Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network
Technical Field
The invention belongs to the field of computer vision, and particularly relates to a multiscale nixie tube detection method based on an improved YOLO self-adaptive attention-feature enhancement network.
Background
In nixie tube production, detection is the final manufacturing step, in which the electrical performance, appearance, and other properties of the nixie tube are inspected. Compared with nixie tube design and manufacture, nixie tube detection is a labor-intensive activity with relatively low technical content, mainly because most nixie tube detection is still performed manually. However, as the degree of automation in production increases, conventional manual detection is no longer suitable for modern production, owing to the limits of the human eye in working time, accuracy, stability of judgment, and judgment speed. Replacing manual detection with machine vision has therefore become an important trend in nixie tube detection.
Object detection is a machine vision task that processes digital images and locates instances of visual objects of certain classes (e.g., birds, people, or vehicles). It supplies one of the basic pieces of information required by machine vision applications: what object is where. Object detection is one of the fundamental problems of machine vision and underlies tasks such as image segmentation and object tracking. The output of object detection is typically an algorithmically generated bounding box indicating the detected object's location, class, and confidence. As the performance of hand-crafted features saturated, traditional detection algorithms hit a bottleneck around 2010, and between 2010 and 2012 object detection progressed slowly. In 2012, convolutional neural networks (Convolutional Neural Network, CNN) attracted extensive study owing to the excellent performance AlexNet exhibited in image classification. Currently, neural-network-based object detection algorithms can be divided into two types according to how detection boxes are determined: "two-stage detection", which refines candidate boxes step by step, and "one-stage detection", which generates detection boxes directly; YOLO belongs to the CNN-based one-stage methods.
As the latest framework in the YOLO series at the time, YOLOv5 is one of the detection algorithms that achieves a good balance of real-time performance and detection accuracy. Many researchers have applied it to various fields and proposed various improvements. The patent "A method for detecting defects of aluminum sheets based on YOLOv5" (application number: 202211678791) applies YOLOv5 to image-based defect detection of aluminum sheets, detecting defective sheets and rejecting them. However, that method does not consider the scale variation of the target during detection, which can adversely affect detection accuracy.
Disclosure of Invention
In order to solve the problem of low detection accuracy caused by the large scale variation of nixie tube chips in the prior art, the invention provides a multi-scale nixie tube detection method based on an improved YOLO self-adaptive attention-feature enhancement network, which comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO self-adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network and a detection module;
training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
Preferably, preprocessing the nixie tube image includes: cropping away the top 30% and the bottom 10% of the nixie tube image in the vertical direction; and bisecting the cropped image in the horizontal direction into two images.
Preferably, the processing of the input image by the feature pyramid module includes: inputting the input image into a backbone network to obtain a high-level feature map F_h; optimizing the high-level feature map with an adaptive attention module to obtain a fused feature map; inputting the fused feature map and the high-level feature map into a feature enhancement module and fusing the enhanced feature maps to obtain an optimized high-level feature map; and downsampling the optimized high-level feature map multiple times to obtain feature maps of different scales.
Further, the processing of the high-level features by the adaptive attention module includes: inputting the high-level features into adaptive pooling layers to obtain context features of different scales; applying a 1×1 convolution to the context features of each scale to obtain feature maps of different scales with the same channel dimension; upsampling the high-level feature map using bilinear interpolation; inputting the upsampled feature map and the context features of different scales into a Concat layer for channel merging to obtain a channel-merged feature map; passing the channel-merged feature map sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weight map; and performing a Hadamard product between the generated weight map and the channel-merged feature map to obtain the fused feature map.
Further, the feature enhancement module comprises a multi-branch convolution layer and a multi-branch pooling layer; the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution; the multi-branch pooling layer fuses the nixie tube image information from the three branch receptive fields.
Preferably, the aggregation of the multi-scale feature maps by the path aggregation network includes: upsampling each low-resolution feature map; adding the upsampled feature maps element-wise to the highest-resolution feature map; and inputting the element-wise sum into a path aggregation module, which applies a series of convolution operations to obtain the aggregated high-level feature map.
Preferably, the processing of the aggregated high-level feature map by the detection module includes: performing a series of convolution and pooling operations on the aggregated high-level feature map to obtain a first feature map; converting the first feature map into a feature vector of fixed size; inputting the feature vector into a fully connected layer to obtain classification scores and bounding-box coordinates; obtaining the location information of the detected targets from the classification scores and bounding-box coordinates; and eliminating overlapping bounding boxes with a non-maximum suppression algorithm to obtain the final target detection result.
Preferably, the model's loss function includes a classification loss and a boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
Further, the binary cross entropy loss function is expressed as:

L_class = -(y·log(p) + (1-y)·log(1-p))

where p is the predicted probability and y is the true label.
Further, the boundary regression loss is expressed as:

L_CIoU = 1 - CIoU

where CIoU represents the degree of coincidence between the bounding box and the real target, IoU represents the intersection over union, D_center represents the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target, D_circumscribe represents the diagonal length of the circumscribed rectangle of the real target, v is a real number between 0 and 1, w is the width, and h is the height.
The invention has the beneficial effects that:
1. The invention provides a nixie tube appearance detection method that replaces traditional manual inspection with automated detection, greatly improving production efficiency, reducing the investment of human resources, and saving time and cost.
2. The nixie tube appearance detection system can detect appearance defects of the nixie tube, including cracks, scratches, and stains, with high accuracy and at high speed. By finding and removing unqualified products in time, product quality and consistency are effectively improved and the defect rate is reduced.
3. The appearance detection system can monitor the appearance quality of nixie tubes in real time during production and promptly discover and report abnormal conditions. This helps manufacturers adjust production parameters in time and ensures the stability and consistency of product quality.
4. The invention adopts an adaptive attention module and a feature enhancement module to extract and enhance multi-scale features of the feature map, thereby improving detection accuracy.
Drawings
FIG. 1 is a general flow chart of the present invention;
FIG. 2 is a network architecture diagram of the YOLOv5-based adaptive attention-feature enhancement recognition method;
FIG. 3 is a block diagram of an adaptive attention network;
FIG. 4 is a diagram of a feature enhanced network architecture;
fig. 5 is a model training loss diagram.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort shall fall within the protection scope of the invention.
A multi-scale nixie tube detection method based on an improved YOLO self-adaptive attention-feature enhancement network, as shown in FIG. 1, comprises the following steps: acquiring a nixie tube image to be detected and preprocessing it; inputting the preprocessed nixie tube image into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO self-adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network, and a detection module.
Training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
Preprocessing the nixie tube image comprises the following steps: performing data augmentation on the captured pictures in four directions and labeling them to form the training, validation, and test sets. The prepared nixie tube image dataset is then input into the constructed YOLOv5 adaptive attention-feature enhancement network to obtain a trained nixie tube defect recognition model. Training runs for 300 epochs; the loss function values during training are shown in FIG. 5, where box_loss, obj_loss, and cls_loss are the boundary regression error, the foreground-background classification error, and the positive-sample classification error, respectively.
Acquiring the nixie tube image comprises the following steps: an optical fiber sensor detects when a nixie tube has moved in front of the camera; after receiving the signal, the PLC controls the camera to capture the nixie tube image and transmits the image information to the cloud platform. The resolution of the acquired nixie tube image is 3072×2048. To save computing resources, YOLO detection algorithms generally compress the image to 640×640 before feeding it into the neural network, so a defect about 10 pixels in diameter in the original image would be shrunk to a size that is difficult to detect. Therefore, to improve model performance, the nixie tube image is cropped as follows: for every picture, the top 30% and bottom 10% are cut away in the vertical direction, and the remaining portion is bisected into two images in the horizontal direction.
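As an illustration, a minimal sketch of this cropping scheme is given below, assuming the image is an H×W(×C) numpy array; the function name and array layout are assumptions, not details disclosed by the patent.

```python
import numpy as np

def preprocess_nixie_image(image: np.ndarray):
    """Crop away the top 30% and bottom 10%, then bisect horizontally."""
    h, w = image.shape[:2]
    cropped = image[int(h * 0.30):int(h * 0.90), :]  # keep the middle 60%
    mid = w // 2
    return cropped[:, :mid], cropped[:, mid:]        # left and right halves
```

For the 3072×2048 capture mentioned above (assuming a 3072-wide, 2048-high frame), each half comes out at roughly 1536×1229 pixels before being compressed to the 640×640 network input, so small defects keep more of their original pixel footprint.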
An adaptive attention-feature enhancement recognition module is constructed based on YOLOv5, as shown in FIG. 2. The feature pyramid (Feature Pyramid Network, FPN) and path aggregation network (Path Aggregation Network, PAN) in the YOLOv5 model are both methods for multi-scale object detection. FPN is a top-down feature pyramid structure that achieves multi-scale detection by fusing high-level feature maps with low-level feature maps. PAN is a path aggregation network that aggregates feature maps of different levels. In YOLOv5, the FPN and PAN structures are used to fuse features across scales. In the invention, the adaptive attention module and the feature enhancement module are used to reduce the information loss incurred when generating the feature maps and to enhance the representational capability of the feature pyramid.
The processing of the input image by the feature pyramid module comprises: inputting the input image into a backbone network to obtain a high-level feature map F_h; optimizing the high-level feature map with the adaptive attention module to obtain a fused feature map; inputting the fused feature map and the high-level feature map into the feature enhancement module and fusing the enhanced feature maps to obtain an optimized high-level feature map; and downsampling the optimized high-level feature map multiple times to obtain feature maps of different scales.
Specifically, the backbone network extracts the high-level features: the input image I is passed through a backbone network (e.g., CSPDarknet53) to obtain the high-level feature map F_h, expressed as F_h = Backbone(I), where the size of F_h is H/32 × W/32 and H and W are the height and width of the input image. Downsampling then yields feature maps of different resolutions: repeated downsampling operations produce feature maps F_{h/2}, F_{h/4}, F_{h/8}, etc., expressed as F_{h/2} = Downsample(F_h), F_{h/4} = Downsample(F_{h/2}), F_{h/8} = Downsample(F_{h/4}), where Downsample denotes the downsampling operation. That is, the feature pyramid mainly consists of a backbone network that extracts high-level features and downsampling that produces feature maps of different resolutions. The purpose of the feature pyramid is to obtain feature maps of different scales through repeated downsampling so as to capture targets of different sizes.
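The downsampling chain above can be sketched in PyTorch as follows; the use of a strided 3×3 convolution as the Downsample operator is an assumption for illustration (the patent does not specify the operator).

```python
import torch.nn as nn

class DownsamplePyramid(nn.Module):
    """Produce [F_h, F_{h/2}, F_{h/4}, F_{h/8}] by repeated downsampling."""
    def __init__(self, channels: int, levels: int = 3):
        super().__init__()
        # assumed: one stride-2 convolution per pyramid level
        self.downs = nn.ModuleList(
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(levels))

    def forward(self, f_h):
        maps = [f_h]
        for down in self.downs:
            maps.append(down(maps[-1]))  # halve the spatial resolution
        return maps
```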
The adaptive attention module. The adaptive attention network architecture is shown in FIG. 3. The input of the adaptive attention module is first passed through adaptive pooling layers to obtain context features of different scales (h1×S, h2×S, h3×S). Each context feature is then reduced by a 1×1 convolution to the same channel dimension, 256. They are upsampled to the scale of S using bilinear interpolation for subsequent fusion. The spatial attention mechanism merges the channels of the three context features through a Concat layer, and the merged feature map then passes sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weights. A Hadamard product is taken between the generated weight map and the channel-merged feature map; the result is separated and added to the input feature map, aggregating the context features. The final feature map carries rich multi-scale context information, which alleviates to some extent the information loss caused by the reduced number of channels.
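A possible PyTorch sketch of this module follows. The 256-channel reduction, the 1×1/ReLU/3×3/sigmoid weight branch, and the Hadamard-then-add fusion follow the description above; the concrete pooling sizes, the final 1×1 projection of the input, and all names are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAttention(nn.Module):
    def __init__(self, in_channels: int, mid_channels: int = 256,
                 pool_sizes=(1, 2, 4)):           # pool sizes are assumed
        super().__init__()
        n = len(pool_sizes)
        self.pools = nn.ModuleList(nn.AdaptiveAvgPool2d(p) for p in pool_sizes)
        self.reduce = nn.ModuleList(
            nn.Conv2d(in_channels, mid_channels, 1) for _ in pool_sizes)
        # 1x1 conv -> ReLU -> 3x3 conv -> sigmoid produces the spatial weights
        self.attn = nn.Sequential(
            nn.Conv2d(mid_channels * n, mid_channels, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(mid_channels, mid_channels * n, 3, padding=1),
            nn.Sigmoid())
        self.proj = nn.Conv2d(in_channels, mid_channels, 1)  # assumed projection

    def forward(self, x):
        size = x.shape[-2:]                       # the scale S of the input
        # context features at several scales, reduced to the same 256 channels,
        # then bilinearly upsampled back to S for fusion
        ctx = [F.interpolate(conv(pool(x)), size=size, mode='bilinear',
                             align_corners=False)
               for pool, conv in zip(self.pools, self.reduce)]
        merged = torch.cat(ctx, dim=1)            # Concat layer
        weighted = self.attn(merged) * merged     # Hadamard product with weights
        chunks = weighted.chunk(len(ctx), dim=1)  # "separate"
        return self.proj(x) + sum(chunks)         # aggregate onto the input
```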
The feature enhancement module. As shown in FIG. 4, the feature enhancement module mainly uses dilated convolution to adaptively learn different receptive fields in each feature map according to the different scales of the detected nixie tubes, thereby improving the accuracy of multi-scale target detection and recognition. It can be divided into two parts: a multi-branch convolution layer and a multi-branch pooling layer. The multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution. The average pooling layer fuses the nixie tube image information from the three branch receptive fields to improve the accuracy of multi-scale prediction.
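A minimal sketch of this two-part structure is given below; the dilation rates (1, 3, 5) and the summation-then-pool fusion are assumptions chosen to illustrate the multi-branch idea, not values disclosed by the patent.

```python
import torch.nn as nn

class FeatureEnhancement(nn.Module):
    """Dilated-convolution branches with different receptive fields,
    fused through an average-pooling branch."""
    def __init__(self, channels: int, dilations=(1, 3, 5)):  # rates assumed
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=d, dilation=d),
                nn.ReLU(inplace=True))
            for d in dilations)
        self.fuse = nn.Sequential(
            nn.AvgPool2d(3, stride=1, padding=1),  # pooling branch
            nn.Conv2d(channels, channels, 1))

    def forward(self, x):
        # fuse the information coming from the three branch receptive fields
        return self.fuse(sum(branch(x) for branch in self.branches))
```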
The aggregation of the multi-scale feature maps by the path aggregation network comprises: upsampling each low-resolution feature map; adding the upsampled feature maps element-wise to the highest-resolution feature map; and inputting the element-wise sum into a path aggregation module, which applies a series of convolution operations to obtain the aggregated high-level feature map.
Specifically, upsampling fuses the low-resolution features: for each lower-resolution feature map, the PANet restores its size to that of the highest-resolution feature map by an upsampling operation. These upsampled feature maps are then added element-wise to the highest-resolution feature map, fusing the low-resolution and high-resolution feature maps. In this way, the information in the low-resolution and high-resolution feature maps merges, enriching the feature representation. Path aggregation: on the fused feature map, the PANet further integrates multi-scale feature information through path aggregation. Specifically, the PANet introduces a path aggregation module that applies a series of convolution operations to the feature map so that feature information of different scales is aggregated and can interact. Feature maps of different scales can thus influence and reinforce each other, strengthening the representational capability of the features.
Outputting the aggregated high-level feature map: the aggregated high-level feature map obtained through the path aggregation module contains feature information from different scales and has a richer, more diverse representational capability. This high-level feature map serves as the input to the subsequent detection task for target detection and localization.
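The upsample-and-add fusion step can be sketched as follows, assuming the feature maps in the list share the same channel count (after the 1×1 reductions described earlier); the function name and nearest-neighbor interpolation mode are assumptions.

```python
import torch.nn.functional as F

def upsample_and_add(features):
    """Fuse feature maps ordered from highest to lowest resolution by
    upsampling each low-resolution map and adding it element-wise."""
    fused = features[0]
    for low in features[1:]:
        fused = fused + F.interpolate(low, size=features[0].shape[-2:],
                                      mode='nearest')  # restore resolution
    return fused  # then fed to the path aggregation module's convolutions
```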
Multi-scale nixie tube detection on the cloud platform. First, the cloud platform splices the two images produced by the preprocessing crop, obtains the detection results, and outputs the feature information. For a nixie tube containing defect information, the PLC, upon receiving the defect signal, controls the rejection hardware to remove the nixie tube, thereby realizing quality control of the nixie tube.
In this embodiment, the processing of the aggregated high-level feature map by the detection module comprises: performing a series of convolution and pooling operations on the aggregated high-level feature map to obtain a first feature map; converting the first feature map into a feature vector of fixed size; inputting the feature vector into a fully connected layer to obtain classification scores and bounding-box coordinates; obtaining the location information of the detected targets from the classification scores and bounding-box coordinates; and eliminating overlapping bounding boxes with a non-maximum suppression algorithm to obtain the final target detection result.
Specifically, convolution and pooling operations: the detection module first performs a series of convolution and pooling operations on the input high-level feature map to extract features and reduce the size of the feature map. These operations are implemented by convolution and pooling layers: convolution helps identify features, while pooling reduces the feature map size and the amount of computation.
Object classification and localization: after the convolution and pooling operations, the detection module converts the feature map into a feature vector of fixed size. This feature vector contains the classification and localization information of the objects in the image. Typically, the detection module maps the feature vector to classification scores and bounding-box coordinates using a fully connected layer.
Prediction and post-processing: the target objects detected in the image are obtained from the classification scores and bounding-box coordinates output by the detection module. Typically, a threshold on the classification score determines whether an object has been detected, and the object is then localized from its bounding-box coordinates.
Non-maximum suppression (NMS): after the initial target detection results are obtained, non-maximum suppression (NMS) is typically applied to eliminate overlapping bounding boxes and produce the final detection results. NMS removes repeated detections according to the confidence of each target and the overlap of the bounding boxes, keeping only the detection with the highest confidence.
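The thresholding-plus-NMS post-processing described above can be sketched with torchvision's standard NMS operator; the two threshold values are illustrative defaults, not values from the patent.

```python
import torch
from torchvision.ops import nms

def postprocess(boxes, scores, score_thresh=0.25, iou_thresh=0.45):
    """Score thresholding followed by non-maximum suppression.

    boxes: (N, 4) tensor in (x1, y1, x2, y2) format; scores: (N,) tensor.
    """
    keep = scores > score_thresh                 # prediction step: score gate
    boxes, scores = boxes[keep], scores[keep]
    kept = nms(boxes, scores, iou_thresh)        # keep highest-confidence boxes
    return boxes[kept], scores[kept]
```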
In this embodiment, the loss function of the model includes a classification loss and a boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
Specifically, YOLO uses a loss function to measure the error between the finally output target position, target class, and confidence and the real target; the smaller the loss value, the smaller the error. The loss consists of a classification error and a boundary regression error, where the boundary regression error is constructed from CIoU and the classification error is computed by binary cross entropy.
The binary cross entropy loss can be expressed as:
L_class = -(y·log(p) + (1-y)·log(1-p))

where p is the predicted probability and y is the true label.
YOLO constructs the bounding-box loss from CIoU. Intersection over Union (IoU) is a commonly used index in target detection that measures the positional accuracy of a prediction; it is defined as:

IoU = |B ∩ B_gt| / |B ∪ B_gt|

where B is the predicted box and B_gt is the real target box.

The CIoU used by YOLO is an extension of IoU: in target detection, the IoU index alone is too coarse, since it depends only on the overlap area. CIoU considers three related factors at once — the overlap area, the geometric center distance, and the aspect ratios of the bounding box and the real target — and comprehensively measures their degree of coincidence. CIoU is defined as:

CIoU = IoU - D_center²/D_circumscribe² - αv

where IoU is the intersection over union, measuring the overlap area; D_center is the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target; D_circumscribe is the diagonal length of the circumscribed rectangle (the smallest rectangle enclosing the predicted box and the real target); the term D_center²/D_circumscribe² measures the distance between the predicted box and the true geometric center of the target, and becomes smaller as the predicted box approaches the target; and the term αv, with α = v/((1 - IoU) + v), measures the difference between the aspect ratio of the predicted box and that of the real target, where v is defined as:

v = (4/π²)·(arctan(w_gt/h_gt) - arctan(w/h))²

where w is the width and h is the height (the subscript gt denotes the real target). Since v is a real number between 0 and 1, v approaches 0 when the aspect ratio of the predicted box is close to that of the real target, and approaches 1 as the difference between the two grows.
The boundary regression loss is expressed as:
L_CIoU = 1 - CIoU
the overall loss function of the model can be expressed as:
L = L_class + L_CIoU

where L_class is the binary cross entropy loss and L_CIoU is the boundary regression loss.
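A sketch of this combined loss is given below for already-matched prediction/target pairs, using torchvision's implementation of the CIoU loss (assuming torchvision ≥ 0.13); the patent's exact target assignment and any loss weighting are not disclosed and are omitted here.

```python
import torch.nn.functional as F
from torchvision.ops import complete_box_iou_loss  # torchvision >= 0.13

def total_loss(pred_prob, true_label, pred_boxes, true_boxes):
    """L = L_class + L_CIoU, as defined above.

    pred_prob/true_label: (N,) tensors; boxes: (N, 4) in (x1, y1, x2, y2).
    """
    l_class = F.binary_cross_entropy(pred_prob, true_label)  # -(y log p + (1-y) log(1-p))
    l_ciou = complete_box_iou_loss(pred_boxes, true_boxes,
                                   reduction='mean')         # mean CIoU loss
    return l_class + l_ciou
```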
While the foregoing describes embodiments, aspects, and advantages of the present invention, it should be understood that the foregoing embodiments are merely exemplary and are not intended to limit the invention; any changes, substitutions, or alterations made without departing from the spirit and principles of the invention shall fall within its scope.

Claims (10)

1. The method for detecting the multi-scale nixie tube based on the improved YOLO self-adaptive attention-feature enhancement network is characterized by comprising the following steps of: acquiring a nixie tube image to be detected, and preprocessing the nixie tube image; inputting the preprocessed nixie tube image into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a detection result; the improved YOLO self-adaptive attention-feature enhancement network consists of a feature pyramid module, a path aggregation network and a detection module;
training the improved YOLO adaptive attention-feature enhancement network includes:
s1: acquiring a nixie tube image dataset, and preprocessing the nixie tube image in the dataset;
s2: dividing the preprocessed nixie tube image data set into a training set and a testing set;
s3: inputting the images in the training set into a feature pyramid module of an improved YOLO self-adaptive attention-feature enhancement network to obtain a multi-scale feature map;
s4: inputting the multi-scale feature map into a path aggregation network to obtain an aggregated high-level feature map;
s5: inputting the high-level feature map into a detection module to obtain a detection result;
s6: calculating a loss function of the model according to the detection result;
s7: parameters of the model are adjusted, and training of the network is completed when the loss function converges;
s8: and inputting the image in the test set into a trained improved YOLO self-adaptive attention-feature enhancement network to obtain a test result.
2. The method for detecting a multi-scale nixie tube based on an improved YOLO adaptive attention-feature enhancement network of claim 1, wherein preprocessing the nixie tube image comprises: cropping away the top 30% and the bottom 10% of the nixie tube image in the vertical direction; and bisecting the cropped image in the horizontal direction into two images.
3. The method for multi-scale nixie tube detection based on improved YOLO adaptive attention-feature enhancement network of claim 1, wherein the processing of the input image using the feature pyramid module comprises: inputting the input image into a backbone network to obtain a high-level feature map F_h; optimizing the high-level feature map with an adaptive attention module to obtain a fused feature map; inputting the fused feature map and the high-level feature map into a feature enhancement module and fusing the enhanced feature maps to obtain an optimized high-level feature map; and downsampling the optimized high-level feature map multiple times to obtain feature maps of different scales.
4. A multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network as in claim 3, wherein the processing of the high-level features by the adaptive attention module comprises: inputting the high-level features into adaptive pooling layers to obtain context features of different scales; applying a 1×1 convolution to the context features of each scale to obtain feature maps of different scales with the same channel dimension; upsampling the high-level feature map using bilinear interpolation; inputting the upsampled feature map and the context features of different scales into a Concat layer for channel merging to obtain a channel-merged feature map; passing the channel-merged feature map sequentially through a 1×1 convolution layer, a ReLU activation layer, a 3×3 convolution layer, and a sigmoid activation layer to generate the corresponding spatial weight map; and performing a Hadamard product between the generated weight map and the channel-merged feature map to obtain the fused feature map.
5. A multi-scale nixie tube detection method based on an improved YOLO adaptive attention-feature enhancement network as in claim 3, wherein the feature enhancement module comprises a multi-branch convolution layer and a multi-branch pooling layer; the multi-branch convolution layer provides receptive fields of different sizes for the input feature map through dilated convolution; and the multi-branch pooling layer fuses the nixie tube image information from the three branch receptive fields.
6. The method for detecting the multi-scale nixie tube based on the improved YOLO adaptive attention-feature enhancement network according to claim 1, wherein the aggregation of the multi-scale feature maps by the path aggregation network comprises: upsampling each low-resolution feature map; adding the upsampled feature maps element-wise to the highest-resolution feature map; and inputting the element-wise sum into a path aggregation module, which applies a series of convolution operations to obtain the aggregated high-level feature map.
7. The method for detecting the multi-scale nixie tube based on the improved YOLO adaptive attention-feature enhancement network according to claim 1, wherein the processing of the aggregated high-level feature map by the detection module comprises: performing a series of convolution and pooling operations on the aggregated high-level feature map to obtain a first feature map; converting the first feature map into a feature vector of fixed size; inputting the feature vector into a fully connected layer to obtain classification scores and bounding-box coordinates; obtaining the location information of the detected targets from the classification scores and bounding-box coordinates; and eliminating overlapping bounding boxes with a non-maximum suppression algorithm to obtain the final target detection result.
8. The method for multi-scale nixie tube detection based on improved YOLO adaptive attention-feature enhancement network of claim 1, wherein the model's loss functions include classification loss and boundary regression loss; the classification loss adopts a binary cross entropy loss function; the loss function of the model is the sum of the classification loss and the boundary regression loss.
9. The improved YOLO adaptive attention-feature enhancement network based multiscale nixie tube detection method of claim 8 wherein the expression of the binary cross entropy loss function is:
L_class = -(y·log(p) + (1-y)·log(1-p))

where p is the predicted probability and y is the true label.
10. The improved YOLO adaptive attention-feature enhancement network based multiscale nixie tube detection method of claim 8 wherein the expression of the boundary regression loss is:
L_CIoU = 1 - CIoU

where CIoU represents the degree of coincidence between the bounding box and the real target, IoU represents the intersection over union, D_center represents the Euclidean distance between the geometric center of the predicted box and the true geometric center of the target, D_circumscribe represents the diagonal length of the circumscribed rectangle of the real target, v is a real number between 0 and 1, w is the width, and h is the height.
CN202310906241.1A 2023-07-21 2023-07-21 Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network Pending CN117095155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310906241.1A CN117095155A (en) 2023-07-21 2023-07-21 Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310906241.1A CN117095155A (en) 2023-07-21 2023-07-21 Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network

Publications (1)

Publication Number Publication Date
CN117095155A true CN117095155A (en) 2023-11-21

Family

ID=88776096

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310906241.1A Pending CN117095155A (en) 2023-07-21 2023-07-21 Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network

Country Status (1)

Country Link
CN (1) CN117095155A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117314898A (en) * 2023-11-28 2023-12-29 中南大学 Multistage train rail edge part detection method
CN117314898B (en) * 2023-11-28 2024-03-01 中南大学 Multistage train rail edge part detection method
CN118096763A (en) * 2024-04-28 2024-05-28 万商电力设备有限公司 Ring network load switch cabinet surface quality detection method
CN118552798A (en) * 2024-07-30 2024-08-27 绍兴建元电力集团有限公司 Infrared photovoltaic hot spot detection method for multi-scale center surrounding inhibition

Similar Documents

Publication Publication Date Title
CN111223088B (en) Casting surface defect identification method based on deep convolutional neural network
CN117095155A (en) Multi-scale nixie tube detection method based on improved YOLO self-adaptive attention-feature enhancement network
CN111160249A (en) Multi-class target detection method of optical remote sensing image based on cross-scale feature fusion
CN113920107A (en) Insulator damage detection method based on improved yolov5 algorithm
CN114972213A (en) Two-stage mainboard image defect detection and positioning method based on machine vision
CN115861772A (en) Multi-scale single-stage target detection method based on RetinaNet
CN112861635A (en) Fire and smoke real-time detection method based on deep learning
CN114612937B (en) Pedestrian detection method based on single-mode enhancement by combining infrared light and visible light
CN113065431B (en) Human body violation prediction method based on hidden Markov model and recurrent neural network
CN115830004A (en) Surface defect detection method, device, computer equipment and storage medium
CN115272652A (en) Dense object image detection method based on multiple regression and adaptive focus loss
CN111754507A (en) Light-weight industrial defect image classification method based on strong attention machine mechanism
CN116092179A (en) Improved Yolox fall detection system
CN114782311A (en) Improved multi-scale defect target detection method and system based on CenterNet
CN115775236A (en) Surface tiny defect visual detection method and system based on multi-scale feature fusion
CN115937659A (en) Mask-RCNN-based multi-target detection method in indoor complex environment
CN114758255A (en) Unmanned aerial vehicle detection method based on YOLOV5 algorithm
CN116071315A (en) Product visual defect detection method and system based on machine vision
CN114926400A (en) Fan blade defect detection method based on improved YOLOv5
CN111667465A (en) Metal hand basin defect detection method based on far infrared image
CN113936034A (en) Apparent motion combined weak and small moving object detection method combined with interframe light stream
CN113808099A (en) Aluminum product surface defect detection device and method
CN117409244A (en) SCKConv multi-scale feature fusion enhanced low-illumination small target detection method
CN113191352A (en) Water meter pointer reading identification method based on target detection and binary image detection
CN114078106A (en) Defect detection method based on improved Faster R-CNN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination