CN116895030B - Insulator detection method based on target detection algorithm and attention mechanism

Insulator detection method based on target detection algorithm and attention mechanism

Info

Publication number: CN116895030B (application number CN202311163428.3A)
Authority: CN (China)
Prior art keywords: module, attention, feature map, target, frame
Priority and filing date: 2023-09-11
Legal status: Active (granted)
Other languages: Chinese (zh)
Other versions: CN116895030A (published 2023-10-17; grant CN116895030B published 2023-11-17)
Inventors: 经弈逍, 高淋锋, 吴昀璞, 黄永茂
Current and original assignee: Xihua University
Application filed by Xihua University


Classifications

    • G06V20/17: Terrestrial scenes taken from planes or by drones
    • G06N3/045: Combinations of networks
    • G06N3/0464: Convolutional networks [CNN, ConvNet]
    • G06N3/048: Activation functions
    • G06N3/082: Learning methods modifying the architecture, e.g. adding, deleting or silencing nodes or connections
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V10/82: Arrangements for image or video recognition or understanding using neural networks
    • Y04S10/50: Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications


Abstract

The invention relates to the technical field of image processing and provides an insulator detection method based on a target detection algorithm and an attention mechanism, which comprises the following steps: acquiring real-shot images of a power supply station to form a training image set; inputting the training image set into a YOLOv8 model for training to obtain an insulator prediction model, the YOLOv8 model comprising a backbone network, a neck network and a head network, wherein the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels, features are extracted by the C2f module, and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets; finally, images acquired in real time are input into the trained insulator detection model to obtain insulator targets. The invention reduces the size and complexity of the model while ensuring detection precision, and improves the efficiency and accuracy of unmanned aerial vehicle inspection of insulators.

Description

Insulator detection method based on target detection algorithm and attention mechanism
Technical Field
The invention relates to the technical field of image processing, in particular to an insulator detection method based on a target detection algorithm and an attention mechanism.
Background
In an electric power system, insulators are critical components: they insulate the conductors from the pole towers and prevent current leakage along the power line. The state of the insulators therefore directly affects the safe and stable operation of the power system. Detecting insulators, and in particular detecting whether they are defective, is an important link in power system fault prevention and fault diagnosis. Only when the position and state of the insulators are accurately detected can subsequent fault detection and fault prevention work be performed effectively, thereby ensuring the safe operation of the power system.
However, insulator detection is a challenging task because of the variability of insulator shape, size and color and the complexity of the background environment. This is especially true in unmanned aerial vehicle inspection: since the computational and memory capacity of a drone is limited, strict requirements are placed on the size and complexity of the detection model, which further increases the difficulty of insulator detection. Although some deep-learning-based methods have been proposed to solve this problem, they often suffer from issues such as oversized models, high computational complexity and low detection accuracy. These problems limit their effectiveness in practical applications, particularly in drone-based insulator inspection.
In addition, the current drone inspection workflow is to remove the hard disk carried by the drone and then import the data into a computer for analysis back in the studio. Although this approach can guarantee detection precision, it is inefficient, time-consuming, requires manual participation and cannot be automated. Therefore, how to reduce the size and complexity of the model and improve the efficiency and accuracy of drone-based insulator inspection while ensuring detection precision is an important direction of current research.
Disclosure of Invention
The invention aims to ensure the detection precision, reduce the size and complexity of a model, improve the efficiency and accuracy of unmanned aerial vehicle inspection of an insulator, and provide an insulator detection method based on a target detection algorithm and an attention mechanism.
In order to achieve the above object, the embodiment of the present invention provides the following technical solutions:
An insulator detection method based on a target detection algorithm and an attention mechanism, the method comprising the following steps:
step 1, acquiring real-shot images of a plurality of power supply stations, and forming a training image set after image preprocessing;
step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model; the YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets;
and step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
In this scheme, the YOLOv8 model introduces a global attention mechanism to capture more global information in the feature map.
The specific steps of step 1 include: collecting real-shot images of a power supply station, scaling the images proportionally, mirror-flipping them, adding noise squares of random size and applying color transformation as preprocessing, and manually labeling the insulators in the images to form the training image set.
In this scheme, the preprocessing operations on the images help to improve the generalization capability of the model, so that it shows a good detection effect in various practical application scenarios.
In step 2, the backbone network includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, a global attention module GAM, and a spatial pyramid module SPPF, connected in sequence; the scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively.
In the above scheme, the backbone network is mainly responsible for extracting image features. The convolution layers gradually reduce the size of the image while increasing the number of channels so as to extract richer features; the C2f modules, each comprising a convolution layer and a plurality of residual layers, further extract and integrate the features while preserving the information of the original features. Finally, through the combined action of the global attention module GAM and the spatial pyramid module SPPF, multi-scale features are fused and the complex relationships between the global spatial information of the feature map and the different channels are captured.
In step 2, the neck network includes a convolutional layer Conv_6, a convolutional layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4;
after the feature map F1 output by the spatial pyramid module SPPF passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1; the feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2; the feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3; the feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4; the feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
In this scheme, the neck network is mainly responsible for finely extracting features and fusing feature maps of different scales: the feature maps are enlarged by the upsampling layers Upsample for fine-grained target detection; feature maps of different scales are fused by the concatenation layers Concat to obtain richer feature information; and the fused feature maps are further extracted and integrated by the C2f modules.
In step 2, the head network includes a target detection module Delect_1, a target detection module Delect_2, and a target detection module Delect_3;
the target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3; finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
In the above scheme, the head network generates the target detection results from the features extracted by the C2f modules through the target detection modules Delect, where each target detection module Delect comprises a convolution layer and a Sigmoid activation function and converts the features into the category and location information of the target.
The global attention module GAM is implemented by a global attention mechanism, and the global attention module GAM comprises a spatial attention module;
the spatial attention module is realized by a self-attention mechanism; it processes the input feature map through 1×1 convolution operations and generates a query, a key and a value:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

where $Q$ is the query, $K$ is the key and $V$ is the value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively.

Multiplying the query by the transpose of the key row-wise gives the attention score matrix:

$$S = Q K^{T}$$

where $S$ is the attention score and $K^{T}$ is the transpose of $K$.

A Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix:

$$A = \mathrm{Softmax}(S)$$

where $A$ is the attention weight. Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1; it is formulated as

$$\mathrm{Softmax}(z)_j = \frac{e^{z_j}}{\sum_{k=1}^{K} e^{z_k}}$$

where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector after conversion by the Softmax function.

Matrix multiplication of the values with the attention weight matrix gives the weighted feature map:

$$O = A V$$

where $O$ is the weighted sum.

After multiplying by an adaptively learned parameter $\gamma$ and adding the input feature map, the spatial attention module finally outputs a spatial attention profile:

$$F_{sa} = \gamma O + X$$

where $\gamma$ is a learnable parameter and $F_{sa}$ is the spatial attention profile.
In the above scheme, the spatial attention module is realized by a self-attention mechanism, which is introduced to capture long-distance dependencies in the feature map: in a traditional convolutional neural network, the information of a pixel can only be propagated through a neighborhood of the convolution kernel size, whereas the self-attention mechanism allows the information of a pixel to be propagated to any position of the feature map, thereby capturing more global information. The parameter γ plays a regulating role; it is a learnable parameter that is adjusted automatically during training to control how strongly the self-attention mechanism influences the final spatial attention profile.
The global attention module GAM further includes a channel attention module;
the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, which compresses the values of the feature map into the interval [0, 1]; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model.
In the above scheme, the two fully connected layers can provide more channel-association information.
The forward propagation process of the global attention module GAM is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise (corresponding-element) multiplication; and Residual is the residual connection, i.e. the input feature map passed directly to the output.
In the above scheme, the spatial attention profile and the channel attention profile are multiplied by the feature map input to the global attention module GAM element by element, so that the feature intensity of each position in the original feature map can be adjusted; a residual connection is then added to preserve some important information of the original feature map, preventing loss in the self-attention mechanism. The residual connection ensures that the performance of the network is at least not degraded by introducing a short circuit mechanism so that the input can be passed directly to the output.
The loss function DIoU_max of the YOLOv8 model is:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

$$\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^{2},\qquad \rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^{2}$$

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
In this scheme, not only the distance between the center points of the prediction frame and the real frame is considered, but also the diagonal length of the minimum enclosing box containing both frames, so DIoU_max can better handle prediction and real frames that overlap heavily, thereby improving the performance of the YOLOv8 model. Multiplying IoU by the coefficient 1.3 in the DIoU_max loss function emphasizes the reward for high-overlap predictions; this strategy effectively alleviates the class-imbalance problem, improving the accuracy of the YOLOv8 model on positive samples and hence the overall detection performance.
Compared with the prior art, the invention has the beneficial effects that:
the invention utilizes an improved object detection algorithm YOLOv8 model, firstly, preprocessing the real shot image, enhancing the number of the image and improving the generalization capability, and then training the YOLOv8 model by using the image to enable the model to identify possible insulator positions. Meanwhile, an attention mechanism and a bounding box regression loss function are introduced to the YOLOv8 model, so that the model can pay more attention to targets in images and balance positive and negative samples, and the detection index is improved.
Experimental results show that the method outperforms traditional methods on the insulator detection task without significantly increasing training difficulty or model size; the invention therefore provides an effective scheme for insulator detection in power systems.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, it being understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of a backbone network in the YOLOv8 model of the present invention;
FIG. 3 is a schematic diagram of a network structure of a neck network and a head network in a YOLOv8 model of the present invention;
FIG. 4 is a schematic diagram of a global attention module in a backbone network according to the present invention;
FIG. 5 is a schematic diagram of a network structure of a spatial attention module in a global attention module according to the present invention;
fig. 6 is a schematic diagram of a network structure of the C2f module of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. The components of the embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be made by a person skilled in the art without making any inventive effort, are intended to be within the scope of the present invention.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Also, in the description of the present invention, the terms "first," "second," and the like are used merely to distinguish one from another, and are not to be construed as indicating or implying a relative importance or implying any actual such relationship or order between such entities or operations. In addition, the terms "connected," "coupled," and the like may be used to denote a direct connection between elements, or an indirect connection via other elements.
Example 1
The invention is realized by the following technical scheme, as shown in fig. 1, an insulator detection method based on a target detection algorithm and an attention mechanism comprises the following steps:
and step 1, acquiring real shot images of a plurality of power supply stations, and forming a training chart set after image preprocessing.
As an implementation, 1124 real-shot images of the 110 V and 220 V equipment of a certain power supply station are collected, and the images are scaled to a size of 928 x 512, which maximally preserves the information of the original images. The adjusted images then undergo a series of preprocessing operations, including mirror flipping, addition of noise squares of random size, color transformation, and so on. The noise squares of random size simulate various kinds of interference in the actual environment, making the model more robust, while the color transformation makes the model insensitive to the color distribution of an image, improving its generalization capability. These operations are intended to let the model handle insulator detection under various environmental conditions while remaining insensitive to color. The preprocessing improves the generalization capability of the model so that it shows a good detection effect in various practical application scenarios. After image preprocessing, the number of images increases to 2248, and the training image set is formed after the insulators in the images are manually labeled.
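A minimal sketch of this augmentation pipeline is given below. It is an illustration only, not the patent's actual code: the noise-block size range and the color-enhancement factors are assumed values, and the corresponding transformation of the label boxes (needed for mirror flipping) is omitted.

```python
import random
import numpy as np
from PIL import Image, ImageEnhance

def augment(image: Image.Image) -> list[Image.Image]:
    """Produce augmented variants of one RGB training image."""
    base = image.resize((928, 512))                       # scale to the working size
    out = [base]

    # mirror flip
    out.append(base.transpose(Image.FLIP_LEFT_RIGHT))

    # noise square of random size at a random position
    arr = np.array(base).copy()
    h, w = arr.shape[:2]
    s = random.randint(16, 96)                            # assumed size range
    y, x = random.randint(0, h - s), random.randint(0, w - s)
    arr[y:y + s, x:x + s] = np.random.randint(0, 256, (s, s, 3), dtype=np.uint8)
    out.append(Image.fromarray(arr))

    # color transformation: randomly shift color saturation
    out.append(ImageEnhance.Color(base).enhance(random.uniform(0.5, 1.5)))
    return out
```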
Step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model. The YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module. The training images pass through the multi-scale convolution module, which shrinks the feature maps and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module. After the neck network extracts and integrates the features again, the head network outputs the detected targets.
After evaluating the mainstream target detection algorithms, the method selects the YOLOv8 model and, considering the limits of training resources and edge AI, finally decides to train with an improved YOLOv8 model. The improved YOLOv8 model mainly features a smaller depth and width, with a maximum channel number of 1024, so its size and complexity are lower, making it suitable for drones with limited computational and storage capacity.
Referring to fig. 2 and 3, the YOLOv8 model improved by the present invention includes a Backbone network Backbone, a Neck network Neck, and a Head network Head.
Referring to fig. 2, the Backbone network Backbone is mainly responsible for extracting image features and includes several convolution layers Conv of different scales, several C2f modules, a global attention module GAM, and a spatial pyramid module SPPF. In detail, the Backbone network Backbone includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, the global attention module GAM, and the spatial pyramid module SPPF, connected in sequence. The scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively; the convolution layers Conv shrink the feature map and increase the number of channels.
In fig. 2, Conv_1 is shorthand for the convolution layer Conv_1 and C2f_1 for the C2f_1 module; the suffixes "_1", "_2", etc. are only used for convenience of distinction. For example, the convolution layers Conv_1 and Conv_2 have identical structures, as do the C2f_1 and C2f_2 modules, and so on. GAM is shorthand for the global attention module GAM and SPPF for the spatial pyramid module SPPF. F1 is the feature map output by the spatial pyramid module SPPF, F2 the feature map output by the C2f_3 module, and F3 the feature map output by the C2f_2 module.
The training images pass through the convolution layers Conv_1 and Conv_2, which gradually reduce the image size while increasing the number of channels so as to extract richer features. The features are further extracted and integrated by the C2f_1 module, which comprises a connected convolution layer and residual layer, so richer features can be extracted while the information of the original features is preserved. The subsequent convolution layers Conv and C2f modules follow the same principle. Finally, the feature map output by the C2f_4 module passes through the global attention module GAM and the spatial pyramid module SPPF, whose combined action realizes the fusion of multi-scale features and successfully captures the complex relationships between the global spatial information of the feature map and the different channels.
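For reference, a simplified sketch of an SPPF block of the kind named here follows; the public YOLOv8 SPPF additionally halves the channels with a 1x1 convolution before pooling, a detail omitted in this reading.

```python
import torch
import torch.nn as nn

class SPPF(nn.Module):
    """Simplified spatial pyramid pooling (fast): repeated 5x5 max-pooling
    at stride 1, concatenation, and a 1x1 fusion convolution."""
    def __init__(self, channels: int):
        super().__init__()
        self.pool = nn.MaxPool2d(kernel_size=5, stride=1, padding=2)
        self.fuse = nn.Conv2d(channels * 4, channels, kernel_size=1)

    def forward(self, x):
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.fuse(torch.cat([x, y1, y2, y3], dim=1))
```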
The C2f module (CSP Bottleneck with two convolutions) comprises convolution layers and a plurality of residual layers. Referring to fig. 6, which shows the network structure of the C2f module, the input features are first processed by a ConvBNSiLU convolution layer, and the processed features are then divided into two parts by a Split operation; one part is passed through directly, while the other is further refined by n BottleNeck structures. Finally, the features of the two parts are fused and output through a Concat operation (the dotted line in fig. 6 is an input of the Concat).
In fig. 6, BottleNeck_1 is shorthand for the 1st BottleNeck structure and BottleNeck_n for the n-th BottleNeck structure; the suffixes "_1", "_2", etc. are only used for convenience of distinction, and the internal structures of BottleNeck_1 and BottleNeck_2 are identical. The BottleNeck structure performs 1×1 convolution operations.
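A minimal PyTorch sketch of the C2f structure as described (ConvBNSiLU, Split, n BottleNecks, Concat) is shown below. Kernel sizes and the half-channel split are assumptions borrowed from the public YOLOv8 design rather than details stated in the patent.

```python
import torch
import torch.nn as nn

class ConvBNSiLU(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, stride=1, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class BottleNeck(nn.Module):
    """Residual block: output = input + conv(conv(input))."""
    def __init__(self, c: int):
        super().__init__()
        self.cv1 = ConvBNSiLU(c, c, k=3)
        self.cv2 = ConvBNSiLU(c, c, k=3)

    def forward(self, x):
        return x + self.cv2(self.cv1(x))   # residual keeps the original feature info

class C2f(nn.Module):
    def __init__(self, c_in: int, c_out: int, n: int = 1):
        super().__init__()
        self.cv1 = ConvBNSiLU(c_in, c_out)                  # initial ConvBNSiLU
        self.m = nn.ModuleList(BottleNeck(c_out // 2) for _ in range(n))
        self.cv2 = ConvBNSiLU(c_out // 2 * (n + 2), c_out)  # fusion after Concat

    def forward(self, x):
        a, b = self.cv1(x).chunk(2, dim=1)      # Split into two halves
        ys = [a, b]                             # one half is passed through directly
        for blk in self.m:
            ys.append(blk(ys[-1]))              # the other goes through n BottleNecks
        return self.cv2(torch.cat(ys, dim=1))   # Concat and fuse
```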
The Neck network Neck is mainly responsible for finely extracting features and fusing feature maps of different scales. It comprises several convolution layers Conv, several upsampling layers Upsample, several C2f modules and several concatenation layers Concat: the feature maps are enlarged by the upsampling layers Upsample for fine-grained target detection; feature maps of different scales are fused by the concatenation layers Concat to obtain richer feature information; and the fused feature maps are further extracted and integrated by the C2f modules. In detail, referring to fig. 3, the Neck network Neck includes a convolution layer Conv_6, a convolution layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4.
After the feature map F1 output by the spatial pyramid module SPPF in the Backbone network Backbone passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1. The feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2. The feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3. The feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4. The feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
In fig. 3, Concat_1 is shorthand for the concatenation layer Concat_1, Upsample_1 for the upsampling layer Upsample_1 and C2f_5 for the C2f_5 module; the suffixes "_1", "_2", etc. are only used for convenience of distinction. For example, the concatenation layers Concat_1 and Concat_2 have identical structures, as do the upsampling layers Upsample_1 and Upsample_2, and so on.
The Head network Head is mainly responsible for performing target detection according to features extracted by the Neck network Neck, please continue to refer to fig. 3, and includes 3 target detection modules, namely a target detection module delect_1, a target detection module delect_2, and a target detection module delect_3.
In fig. 3, delete_1 is a shorthand for the target detection module delete_1, delete_2 is a shorthand for the target detection module delete_2, and delete_3 is a shorthand for the target detection module delete_3.
The target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3. Finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
The Head network Head generates a detection result of the target according to the characteristics extracted by the C2f module through a target detection module Delect, wherein the target detection module Delect comprises a convolution layer and a Sigmoid activation function, and can convert the characteristics into category and position information of the target.
Referring to fig. 4, the global attention module GAM in the Backbone network Backbone includes a spatial attention module and a channel attention module. Referring to fig. 5, the spatial attention module is implemented by a self-attention mechanism: first, the input feature map is processed through 1×1 convolution operations to generate a Query, a Key and a Value; the query is then multiplied by the transpose of the key to obtain an attention score matrix; a Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix; finally, the values are matrix-multiplied with the attention weight matrix to obtain a weighted feature map, which is multiplied by an adaptively learned parameter γ and added to the input feature map, so that the spatial attention module finally outputs a spatial attention profile.
In fig. 5, 1×1conv represents a 1×1 convolution operation, and Softmax represents a Softmax function.
The parameter gamma plays a role in regulation, is a learnable parameter, and can be automatically regulated in the training process to control the influence degree of the self-attention mechanism on the final spatial attention distribution map. The self-attention mechanism has the main advantages that long-distance dependency in the feature map can be captured, in a traditional convolutional neural network, information of one pixel point can be transferred only through a neighborhood with the size of a convolution kernel, and the self-attention mechanism can transfer the information of one pixel point to any position of the feature map, so that more global information is captured.
The operation flow of the spatial attention module is:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

$$S = Q K^{T}$$

$$A = \mathrm{Softmax}(S)$$

$$O = A V$$

$$F_{sa} = \gamma O + X$$

where $Q$ is the Query, $K$ the Key and $V$ the Value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively; $S$ is the attention score; $K^{T}$ is the transpose of $K$; $A$ is the attention weight; Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1, formulated as $\mathrm{Softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector $z$ after conversion by the Softmax function; $O$ is the weighted sum; $\gamma$ is a learnable parameter; and $F_{sa}$ is the spatial attention profile.
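The following sketch implements exactly this chain (Q, K, V from 1×1 convolutions, S = QKᵀ, row-wise Softmax, O = AV, then γO + X); the tensor layout and the zero initialization of γ are assumptions, not details stated in the patent.

```python
import torch
import torch.nn as nn

class SpatialSelfAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.wq = nn.Conv2d(channels, channels, kernel_size=1)  # W_q as a 1x1 conv
        self.wk = nn.Conv2d(channels, channels, kernel_size=1)  # W_k
        self.wv = nn.Conv2d(channels, channels, kernel_size=1)  # W_v
        self.gamma = nn.Parameter(torch.zeros(1))               # learnable gamma

    def forward(self, x):                            # x: (B, C, H, W)
        b, c, h, w = x.shape
        q = self.wq(x).flatten(2).transpose(1, 2)    # Q: (B, HW, C)
        k = self.wk(x).flatten(2)                    # K^T view: (B, C, HW)
        v = self.wv(x).flatten(2).transpose(1, 2)    # V: (B, HW, C)
        s = q @ k                                    # S = Q K^T: (B, HW, HW)
        a = s.softmax(dim=-1)                        # row-wise Softmax -> A
        o = (a @ v).transpose(1, 2).reshape(b, c, h, w)  # O = A V, back to (B, C, H, W)
        return self.gamma * o + x                    # F_sa = gamma * O + X
```

Since the attention matrix is HW × HW, this module is affordable only on small feature maps, which matches its placement at the very end of the backbone.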
With continued reference to fig. 4, the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function. The two fully connected layers can provide more channel-association information.
In fig. 4, reLU represents a ReLU activation function, sigmoid represents a Sigmoid activation function.
The operation flow of the channel attention module is:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, used to compress the values of the feature map into the interval [0, 1] and represent the attention weights; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model.
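A matching sketch of the channel attention branch follows; the reduction ratio r is an assumption, since the text only fixes the two fully connected layers, the ReLU between them and the final Sigmoid.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels: int, r: int = 4):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)   # first fully connected layer (W_1)
        self.fc2 = nn.Linear(channels // r, channels)   # second fully connected layer (W_2)

    def forward(self, x):                               # x: (B, C, H, W)
        b, c, h, w = x.shape
        # transpose and reshape: one C-dimensional vector per spatial position
        y = x.permute(0, 2, 3, 1).reshape(b, h * w, c)
        y = torch.relu(self.fc1(y))                     # nonlinear transformation
        y = torch.sigmoid(self.fc2(y))                  # compress values into (0, 1)
        return y.reshape(b, h, w, c).permute(0, 3, 1, 2)  # F_ca with the shape of x
```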
During the forward propagation of the global attention module GAM, the spatial attention profile and the channel attention profile are multiplied element by element with the feature map input to the global attention module GAM, so that the feature intensity of each position in the original feature map can be adjusted; a residual connection is then added to preserve some important information of the original feature map, preventing loss in the self-attention mechanism. The residual connection ensures that the performance of the network is at least not degraded by introducing a short circuit mechanism so that the input can be passed directly to the output.
The operation flow of forward propagation is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise (corresponding-element) multiplication; and Residual is the residual connection, i.e. the input feature map passed directly to the output.
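Combining the two sketches above gives one plausible reading of the GAM forward pass, with the residual taken as the unmodified input:

```python
import torch.nn as nn

class GAM(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.sa = SpatialSelfAttention(channels)   # from the sketch above
        self.ca = ChannelAttention(channels)       # from the sketch above

    def forward(self, x):
        f_sa = self.sa(x)        # spatial attention profile F_sa
        f_ca = self.ca(x)        # channel attention profile F_ca
        y = f_sa * f_ca * x      # element-wise reweighting of the input
        return y + x             # residual connection preserves the input
```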
In the training process of the YOLOv8 model, a DIoU_max loss function is introduced. It is an improved IoU loss function that considers not only the distance between the center points of the prediction frame and the real frame but also the diagonal length of the minimum enclosing box containing both frames. Specifically, the scheme calculates the distances between the center points of the prediction frame and the real frame on the x-axis and on the y-axis, takes the maximum of the two, and divides it by the diagonal length of the minimum enclosing box, obtaining a value that measures the center-point offset; finally, this value is subtracted from IoU to yield DIoU_max. Compared with the conventional IoU, DIoU_max better handles prediction and real frames that overlap heavily but still have a large position offset, thereby improving the performance of the YOLOv8 model.
The DIoU_max loss function is:

$$\mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

$$\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^{2},\qquad \rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^{2}$$

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
In a training image set, the number of pixels of negative samples (i.e. background or non-target objects) is much greater than that of positive samples (i.e. target objects). This imbalance can bias the YOLOv8 model toward predicting negative samples, neglecting positive samples and reducing detection accuracy. The improved DIoU_max loss function therefore multiplies IoU by the coefficient 1.3 to emphasize the reward for high-overlap predictions; this strategy effectively alleviates the class-imbalance problem, improving the accuracy of the YOLOv8 model on positive samples and hence the overall detection performance.
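A sketch of the reconstructed DIoU_max loss follows. The placement of the 1.3 coefficient on the IoU term is an assumption based on the description above; boxes are (x1, y1, x2, y2) tensors of shape (N, 4).

```python
import torch

def diou_max_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    # intersection / union -> IoU
    lt = torch.max(pred[:, :2], target[:, :2])
    rb = torch.min(pred[:, 2:], target[:, 2:])
    wh = (rb - lt).clamp(min=0)
    inter = wh[:, 0] * wh[:, 1]
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + 1e-7)

    # squared center distances on each axis; the larger one is penalized
    dx2 = ((pred[:, 0] + pred[:, 2]) / 2 - (target[:, 0] + target[:, 2]) / 2) ** 2
    dy2 = ((pred[:, 1] + pred[:, 3]) / 2 - (target[:, 1] + target[:, 3]) / 2) ** 2

    # squared diagonal of the minimum enclosing box
    enc_lt = torch.min(pred[:, :2], target[:, :2])
    enc_rb = torch.max(pred[:, 2:], target[:, 2:])
    c2 = ((enc_rb - enc_lt) ** 2).sum(dim=1) + 1e-7

    return (1 - 1.3 * iou + torch.max(dx2, dy2) / c2).mean()
```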
Step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
Example 2
In this example, experimental verification is performed on the basis of embodiment 1 above. To compare the performance of the YOLOv8 model of the present invention with other conventional models, accuracy is compared along five dimensions: precision, recall, F1, mAP50 and mAP50-95. In addition, since the YOLOv8 model aims at reducing model size and training difficulty, the models are also compared on four indicators: training video memory, weight size, training speed and prediction speed.
The calculation formula of F1 is:

$$F1 = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$

where precision represents the precision rate and recall the recall rate. The precision formula is:

$$\mathrm{precision} = \frac{TP}{TP + FP}$$

where TP (True Positive) is the number of samples that are actually positive and predicted positive, and FP (False Positive) is the number of samples that are actually negative but predicted positive.

The calculation formula of the recall rate is:

$$\mathrm{recall} = \frac{TP}{TP + FN}$$

where FN (False Negative) is the number of samples that are actually positive but predicted negative.
mAP50: ioU is greater than 0.5.
mAP50-95: ioU threshold values are from 0.5 to 0.95, with values taken every 0.05, the average value of the mAP below these threshold values.
Table 1 compares the performance of each model; the optimizer is SGD with mixed precision, the loss function is IoU (the YOLOv8 of the present invention uses the improved DIoU_max loss function), the batch size is 16, the number of training epochs is 300, and the image size is 300 x 300.
Table 1 comparison of the performance of the various models
In table 1, FasterRCNN is a fast target detection algorithm based on convolutional neural networks, SSD is a single-stage target detector, and YOLOv8 is a single-stage target detection model; the scheme of the invention is an improvement based on YOLOv8.
FasterRCNN and SSD were not tested on mAP50-95. Clearly, compared with FasterRCNN and SSD, YOLOv8 shows an obvious improvement on all indicators; and when the improved YOLOv8 of the invention introduces the self-attention mechanism, the recall rate drops slightly by 0.05% while precision, mAP50 and mAP50-95 all improve.
The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (5)

1. An insulator detection method based on a target detection algorithm and an attention mechanism, characterized in that the method comprises the following steps:
step 1, acquiring real-shot images of a plurality of power supply stations, and forming a training image set after image preprocessing;
step 2, inputting the training image set into a YOLOv8 model for training to obtain a trained insulator prediction model; the YOLOv8 model comprises a backbone network, a neck network and a head network; the backbone network comprises a multi-scale convolution module, a C2f module and a global attention module; the training images pass through the multi-scale convolution module, which shrinks the feature map and enlarges the number of channels; features are extracted by the C2f module and global information is captured by the global attention module; after the neck network extracts and integrates the features again, the head network outputs the detected targets;
in step 2, the backbone network includes a convolution layer Conv_1, a convolution layer Conv_2, a C2f_1 module, a convolution layer Conv_3, a C2f_2 module, a convolution layer Conv_4, a C2f_3 module, a convolution layer Conv_5, a C2f_4 module, a global attention module GAM, and a spatial pyramid module SPPF, connected in sequence; the scales of the 5 convolution layers Conv are P1/2, P2/4, P3/8, P4/16 and P5/32, respectively;
the global attention module GAM is implemented by a global attention mechanism, and the global attention module GAM comprises a spatial attention module;
the spatial attention module is realized by a self-attention mechanism; it processes the input feature map through 1×1 convolution operations and generates a query, a key and a value:

$$Q = W_q X,\qquad K = W_k X,\qquad V = W_v X$$

where $Q$ is the query, $K$ is the key and $V$ is the value; $X$ is the input feature map; $W_q$, $W_k$, $W_v$ are the weight matrices of the query, key and value, respectively;

multiplying the query by the transpose of the key row-wise gives the attention score matrix:

$$S = Q K^{T}$$

where $S$ is the attention score and $K^{T}$ is the transpose of $K$;

a Softmax function is applied to the scores of each row so that the scores of each row sum to 1, yielding the attention weight matrix:

$$A = \mathrm{Softmax}(S)$$

where $A$ is the attention weight; Softmax converts a real vector into a probability distribution, i.e. it maps every element into the interval [0, 1] and makes all elements sum to 1, formulated as $\mathrm{Softmax}(z)_j = e^{z_j} / \sum_{k=1}^{K} e^{z_k}$, where $z = (z_1, \ldots, z_K)$ is a $K$-dimensional real vector and $\mathrm{Softmax}(z)_j$ is the $j$-th element of the vector $z$ after conversion by the Softmax function;

matrix multiplication of the values with the attention weight matrix gives the weighted feature map:

$$O = A V$$

where $O$ is the weighted sum;

after multiplying by an adaptively learned parameter $\gamma$ and adding the input feature map, the spatial attention module finally outputs a spatial attention profile:

$$F_{sa} = \gamma O + X$$

where $\gamma$ is a learnable parameter and $F_{sa}$ is the spatial attention profile;

the global attention module GAM further includes a channel attention module;

the channel attention module transposes and reshapes the input feature map, then applies a nonlinear transformation through a first fully connected layer and a ReLU activation function; the channel attention profile is then obtained through a second fully connected layer and a Sigmoid activation function:

$$F_{ca} = \sigma\!\left( W_2\, \mathrm{ReLU}\!\left( W_1 \hat{X} \right) \right)$$

where $F_{ca}$ is the channel attention profile; $\hat{X}$ denotes the transposed and reshaped input feature map; $\sigma$ is the Sigmoid activation function, compressing the values of the feature map into the interval [0, 1]; $W_1$ is the weight of the first fully connected layer and $W_2$ the weight of the second fully connected layer; ReLU is the activation function $f(x) = \max(0, x)$, used to enhance the nonlinearity of the model;

the forward propagation process of the global attention module GAM is:

$$Y = F_{sa} \otimes F_{ca} \otimes X + \mathrm{Residual}$$

where $Y$ is the feature map output by forward propagation; $\otimes$ denotes element-wise multiplication; and Residual is the residual connection;
and step 3, inputting the images acquired in real time into the trained insulator detection model to obtain the insulator target.
2. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, characterized in that: in step 2, the neck network includes a convolutional layer Conv_6, a convolutional layer Conv_7, an upsampling layer Upsample_1, an upsampling layer Upsample_2, a C2f_5 module, a C2f_6 module, a C2f_7 module, a C2f_8 module, a concatenation layer Concat_1, a concatenation layer Concat_2, a concatenation layer Concat_3, and a concatenation layer Concat_4;
after the feature map F1 output by the spatial pyramid module SPPF passes through the upsampling layer Upsample_1, it is fused with the feature map F2 output by the C2f_3 module at the concatenation layer Concat_1; the feature map output by the concatenation layer Concat_1 undergoes feature extraction by the C2f_5 module and passes through the upsampling layer Upsample_2, and is then fused with the feature map F3 output by the C2f_2 module at the concatenation layer Concat_2; the feature map output by the concatenation layer Concat_2 undergoes feature extraction by the C2f_6 module and, after passing through the convolution layer Conv_6, is fused with the feature map output by the C2f_5 module at the concatenation layer Concat_3; the feature map output by the concatenation layer Concat_3 undergoes feature extraction by the C2f_7 module and, after passing through the convolution layer Conv_7, is fused with the feature map F1 output by the spatial pyramid module SPPF at the concatenation layer Concat_4; the feature map output by the concatenation layer Concat_4 undergoes feature extraction by the C2f_8 module.
3. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 2, characterized in that: in step 2, the head network includes a target detection module Delect_1, a target detection module Delect_2, and a target detection module Delect_3;
the target detection module Delect_1 performs target detection on the features extracted by the C2f_6 module in the Neck network Neck and outputs a target 1; the target detection module Delect_2 performs target detection on the features extracted by the C2f_7 module and outputs a target 2; the target detection module Delect_3 performs target detection on the features extracted by the C2f_8 module and outputs a target 3; finally, target 1, target 2 and target 3 are fused and the final insulator target is output.
4. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, characterized in that: the loss function DIoU_max of the YOLOv8 model is:

$$L_{\mathrm{DIoU\_max}} = 1 - 1.3\,\mathrm{IoU} + \frac{\max\!\left( \rho_x^2, \rho_y^2 \right)}{c^2},\qquad \mathrm{IoU} = \frac{\text{Area of Overlap}}{\text{Area of Union}}$$

where IoU is the area of the intersection of the prediction frame and the real frame (Area of Overlap) divided by the area of their union (Area of Union); $\rho_x^2 = \left( \frac{b1_{x1} + b1_{x2}}{2} - \frac{b2_{x1} + b2_{x2}}{2} \right)^2$ is the square of the distance on the x-axis between the center of the prediction frame and the center of the real frame, and $\rho_y^2 = \left( \frac{b1_{y1} + b1_{y2}}{2} - \frac{b2_{y1} + b2_{y2}}{2} \right)^2$ the square of the distance on the y-axis; $b1_{x1}$ is the x-coordinate of the left vertex of the prediction frame, $b1_{x2}$ the x-coordinate of its right vertex, $b1_{y1}$ the y-coordinate of its upper-edge vertex and $b1_{y2}$ the y-coordinate of its lower-edge vertex; $b2_{x1}$, $b2_{x2}$, $b2_{y1}$ and $b2_{y2}$ are the corresponding coordinates of the real frame; $c$ is the diagonal length of the minimum enclosing box containing the prediction frame and the real frame; and $L_{\mathrm{DIoU\_max}}$ is the loss function of the YOLOv8 model.
5. The insulator detection method based on the target detection algorithm and the attention mechanism according to claim 1, wherein: the specific steps of the step 1 include: collecting real-shot images of a power supply station; preprocessing the images by scaling them in equal proportion, mirror-flipping them, adding noise blocks of random size, and applying a color transformation; and manually labeling the insulators in the images to form a training image set.
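A sketch of this preprocessing chain, assuming images arrive as uint8 RGB NumPy arrays; the scale factor, noise-block sizes, and per-channel color-jitter range are placeholder values the patent does not specify:

```python
import numpy as np

rng = np.random.default_rng(0)

def resize_keep_ratio(img: np.ndarray, scale: float) -> np.ndarray:
    # Nearest-neighbour equal-proportion scaling (same factor on both axes)
    h, w = img.shape[:2]
    ys = (np.arange(int(h * scale)) / scale).astype(int)
    xs = (np.arange(int(w * scale)) / scale).astype(int)
    return img[ys][:, xs]

def preprocess(img: np.ndarray, scale: float = 0.5) -> np.ndarray:
    img = resize_keep_ratio(img, scale)
    img = img[:, ::-1].copy()                     # mirror flip
    h, w, c = img.shape
    for _ in range(int(rng.integers(1, 4))):      # noise blocks of random size
        bh, bw = int(rng.integers(8, h // 4)), int(rng.integers(8, w // 4))
        y, x = int(rng.integers(0, h - bh)), int(rng.integers(0, w - bw))
        img[y:y + bh, x:x + bw] = rng.integers(0, 256, (bh, bw, c), dtype=np.uint8)
    gains = rng.uniform(0.8, 1.2, size=c)         # simple color transformation
    return np.clip(img.astype(np.float32) * gains, 0, 255).astype(np.uint8)
```

Bounding-box labels must be flipped and rescaled with the same parameters, which is why such augmentation is normally applied before (or jointly with) the manual annotation export.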
CN202311163428.3A 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism Active CN116895030B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311163428.3A CN116895030B (en) 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism

Publications (2)

Publication Number Publication Date
CN116895030A (en) 2023-10-17
CN116895030B (en) 2023-11-17

Family

ID=88313843

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311163428.3A Active CN116895030B (en) 2023-09-11 2023-09-11 Insulator detection method based on target detection algorithm and attention mechanism

Country Status (1)

Country Link
CN (1) CN116895030B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173542B (en) * 2023-10-26 2024-05-28 山东易图信息技术有限公司 Method and system for detecting and optimizing water floaters based on YOLOV model
CN117765373B (en) * 2024-02-22 2024-05-14 山东大学 Lightweight road crack detection method and system with self-adaptive crack size

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CA3016953A1 (en) * 2017-09-07 2019-03-07 Comcast Cable Communications, Llc Relevant motion detection in video
CN112287978B (en) * 2020-10-07 2022-04-15 武汉大学 Hyperspectral remote sensing image classification method based on self-attention context network
US11568543B2 (en) * 2021-03-10 2023-01-31 Western Digital Technologies, Inc. Attention masks in neural network video processing
CN114202696B (en) * 2021-12-15 2023-01-24 安徽大学 SAR target detection method and device based on context vision and storage medium

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112183414A (en) * 2020-09-29 2021-01-05 Nanjing University of Information Science and Technology Weak supervision remote sensing target detection method based on mixed hole convolution
CN113486897A (en) * 2021-07-29 2021-10-08 Liaoning Technical University Semantic segmentation method for convolution attention mechanism up-sampling decoding
CN113762081A (en) * 2021-08-09 2021-12-07 Jiangsu University Granary pest detection method based on YOLOv5s
CN114842503A (en) * 2022-04-18 2022-08-02 Nanjing University of Science and Technology Helmet detection method based on YOLOv5 network
CN115578626A (en) * 2022-07-07 2023-01-06 Fuzhou University Multi-scale image tampering detection method based on mixed attention mechanism
CN115272828A (en) * 2022-08-11 2022-11-01 Institute of Agricultural Economics and Information, Henan Academy of Agricultural Sciences Intensive target detection model training method based on attention mechanism
CN115631411A (en) * 2022-09-28 2023-01-20 Xi'an Polytechnic University Method for detecting damage of insulator in different environments based on STEN network
CN115661607A (en) * 2022-09-29 2023-01-31 Henan Yunxun Intelligent Technology Research Institute Co., Ltd. Small target identification method based on improved YOLOv5
CN115588165A (en) * 2022-10-26 2023-01-10 Construction Branch of State Grid Chongqing Electric Power Company Tunnel worker safety helmet detection and face recognition method
CN115880660A (en) * 2022-12-27 2023-03-31 Nanjing University of Posts and Telecommunications Track line detection method and system based on structural characterization and global attention mechanism
CN116152342A (en) * 2023-03-10 2023-05-23 Shandong University Guideboard registration positioning method based on gradient
CN116630798A (en) * 2023-05-16 2023-08-22 Shanghai Jiao Tong University SAR image aircraft target detection method based on improved YOLOv5

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
Distance-IoU Loss: Faster and Better Learning for Bounding Box Regression; Shuangqing Zhang et al.; The Thirty-Fourth AAAI Conference on Artificial Intelligence (AAAI-20), Vol. 34 (No. 7); pp. 12993-13000 *
Fast detection and recognition technology for ship targets in SAR images; Zhang Lianzhong; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 1); p. C036-196 *
UAV-YOLOv8: A Small-Object-Detection Model Based on Improved YOLOv8 for UAV Aerial Photography Scenarios; Gang Wang et al.; Sensors, Vol. 23 (No. 16); pp. 1-27 *
Word self-update contrastive adversarial networks for text-to-image synthesis; Jian Xiao et al.; Neural Networks, Vol. 167; pp. 433-444 *
Research and design of a citrus pest and disease recognition system based on YOLOv8; Gao Weifeng; Journal of Smart Agriculture, Vol. 3 (No. 15); pp. 27-30 *
Improved YOLOv5s smoke and fire detection algorithm for coal mines; Liu Chunxia et al.; Computer Engineering and Applications, Vol. 59 (No. 17); pp. 286-294 *

Similar Documents

Publication Publication Date Title
CN111259930B (en) General target detection method of self-adaptive attention guidance mechanism
CN108961235B (en) Defective insulator identification method based on YOLOv3 network and particle filter algorithm
CN116895030B (en) Insulator detection method based on target detection algorithm and attention mechanism
CN109543606B (en) Human face recognition method with attention mechanism
CN111179217A (en) Attention mechanism-based remote sensing image multi-scale target detection method
CN112070729B (en) Anchor-free remote sensing image target detection method and system based on scene enhancement
CN111461083A (en) Rapid vehicle detection method based on deep learning
CN113052109A (en) 3D target detection system and 3D target detection method thereof
CN113159120A (en) Contraband detection method based on multi-scale cross-image weak supervision learning
CN111612017A (en) Target detection method based on information enhancement
CN111310609B (en) Video target detection method based on time sequence information and local feature similarity
CN112884758B (en) Defect insulator sample generation method and system based on style migration method
CN111353544A (en) Improved Mixed Pooling-YOLOv3-based target detection method
CN111797841A (en) Visual saliency detection method based on depth residual error network
CN113743505A (en) Improved SSD target detection method based on self-attention and feature fusion
CN111738114A (en) Vehicle target detection method based on anchor-free accurate sampling remote sensing image
CN115171074A (en) Vehicle target identification method based on multi-scale yolo algorithm
CN111985488B (en) Target detection segmentation method and system based on offline Gaussian model
CN112837281A (en) Pin defect identification method, device and equipment based on cascade convolutional neural network
CN113962332B (en) Salient target identification method based on self-optimizing fusion feedback
CN115331081A (en) Image target detection method and device
Pang et al. PTRSegNet: A Patch-to-Region Bottom-Up Pyramid Framework for the Semantic Segmentation of Large-Format Remote Sensing Images
CN111950586B (en) Target detection method for introducing bidirectional attention
Li et al. Focus on local: transmission line defect detection via feature refinement
Li et al. Object recognition for augmented reality applications

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant