CN115187921A - Power transmission channel smoke detection method based on improved YOLOv3 - Google Patents


Info

Publication number
CN115187921A
Authority
CN
China
Prior art keywords
feature
multiplied
channel
smoke
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210519144.2A
Other languages
Chinese (zh)
Inventor
吴玉香
郑浩东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210519144.2A priority Critical patent/CN115187921A/en
Publication of CN115187921A publication Critical patent/CN115187921A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a power transmission channel smoke detection method based on improved YOLOv3. First, an improved k-means++ clustering algorithm replaces the traditional k-means algorithm, reducing the deviation of clustering results, and the prior boxes are redefined. Second, a focus module is designed to improve the feature extraction network and alleviate the loss of target features caused by repeated downsampling. Then, an attention mechanism DAM combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network, after which the features are sent to the FPN for multi-scale fusion. Finally, a large-scale YOLO detection head is added to the detection network for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases. Compared with the existing YOLOv3 algorithm, the method effectively improves the accuracy of power transmission channel smoke detection while maintaining detection speed, which is of great significance for safeguarding the power grid system.

Description

Power transmission channel smoke detection method based on improved YOLOv3
Technical Field
The invention relates to the technical field of computer vision and target detection, and in particular to a power transmission channel smoke detection method based on improved YOLOv3.
Background
When a fire occurs near a power transmission channel of a power grid, the insulation clearance of the transmission line is easily reduced, which can trip the line and in turn cause a large-scale, long-duration power outage. Monitoring the transmission channel for fire, and improving the real-time performance and accuracy of that monitoring, therefore effectively safeguards the reliable operation of the power grid.
A fire in a power transmission channel can be detected through smoke and flame, but once a fire breaks out it brings serious property loss and safety risks. In the early stage of a fire, smoke is generated and spreads continuously; if the smoke can be detected in time and the fire effectively suppressed at this stage, property loss and social impact can be minimized, so early smoke detection is crucial. However, in the transmission channel smoke detection task, first, some targets have low resolution, little information, and much noise; second, the shape of smoke is not fixed and it diffuses randomly, and many smoke-like objects exist at a smoke scene, hindering the localization and detection of real smoke; finally, the detection effect on smoke is often poor due to insufficient feature extraction by the backbone network of the deep model, insufficient semantic information in the shallow convolutional layers, and other factors.
In summary, power transmission channel smoke detection based on deep learning remains a challenging problem. Improving the robustness and accuracy of smoke detection for power transmission channels in the target detection task is therefore a problem to be solved in the prior art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a power transmission channel smoke detection method based on improved YOLOv3 that can effectively detect smoke in a power transmission channel, accelerate training convergence, and improve detection precision without affecting detection speed, solving the detection difficulties caused by unfixed target shapes, insufficient deep neural network feature extraction, and other factors in current power transmission channel smoke detection.
To achieve this purpose, the invention provides the following technical scheme: a power transmission channel smoke detection method based on improved YOLOv3, comprising the following steps:
s1, amplifying a collected sample set by using a data enhancement means, labeling smoke to be detected of each image in the amplified sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a testing set;
S2, clustering the training set with an improved k-means++ clustering algorithm to obtain prior boxes; setting prior boxes of different scales raises the probability that some prior box matches the target well, making the model easier to learn. The principle by which the improved k-means++ algorithm selects cluster centers is: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that the point is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
S3, training on the training set with the improved YOLOv3, learning the model parameters by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes to obtain an optimal model. The improved YOLOv3 comprises: first, a focus module is designed and added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; second, a Dilated Attention Module (DAM) combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network to further strengthen its feature extraction capability, and the features are sent to the FPN for multi-scale fusion; then the detection network of YOLOv3 is improved by adding a YOLO detection head with a 104 × 104 scale for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases;
and S4, inputting the test set into the optimal model and marking the detected power transmission channel smoke with rectangular boxes.
Further, the step S1 includes:
s11, collecting an image sample set containing smoke to be detected: collecting images containing smoke to be detected in a scene of a plurality of power transmission channels to form an image sample set;
S12, expanding the sample set by data enhancement, specifically: copying and pasting the smoke to be detected within a sample to enhance the positional diversity of the smoke, which increases the number of anchor boxes matched to the smoke and raises its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation, and saturation adjustment are applied;
s13, labeling the smoke to be detected in each image by using image labeling software labelImg, wherein a labeled area is a positive sample, an unlabeled area is a negative sample, and corresponding category and position information is stored in an xml file; and the whole sample set is divided into a training set and a testing set according to the proportion.
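The augmentation of step S12 can be sketched in a few lines. The following is an illustrative numpy sketch, not part of the patent; the function name `augment`, the flip probability, and the gain and noise ranges are our own assumptions:

```python
import numpy as np

def augment(image, rng):
    """Apply a few of the step-S12 augmentations to an HxWx3 uint8 image:
    horizontal flip, brightness transformation, and Gaussian noise."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                    # horizontal flip
    gain = rng.uniform(0.8, 1.2)                 # brightness transformation
    noise = rng.normal(0.0, 5.0, out.shape)      # Gaussian noise
    out = out.astype(np.float32) * gain + noise
    return np.clip(out, 0, 255).astype(np.uint8)
```

Angle rotation and saturation adjustment would normally be added with an image library; they are omitted here to keep the sketch dependency-free.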
Further, in step S2, clustering the anchor box sizes in the training set with the improved k-means++ clustering algorithm to obtain the prior boxes comprises:
s21, randomly selecting a point from the training set as a first clustering center;
S22, calculating the shortest distance between each point in the training set and the existing cluster centers, namely the distance to the nearest cluster center; the squared Euclidean distance is used as the distance between samples, that is,

$d(x_i, x_j) = \|x_i - x_j\|^2$

where $x_i$ and $x_j$ denote the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the selected point as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
S26, moving each cluster center to the center of its cluster, the new center being computed directly as the mean of the coordinates of the cluster's samples;
and S27, repeating S25 and S26 until the cluster center does not move any more.
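Steps S21 to S27 can be sketched as follows. This is an illustrative numpy implementation under our own naming (`kmeans_pp_anchors`), not the patent's code; it clusters (width, height) pairs with the squared Euclidean distance given above:

```python
import numpy as np

def kmeans_pp_anchors(boxes, k, rng):
    """Cluster (w, h) box sizes with k-means++ seeding (steps S21-S27).

    `boxes` is an (N, 2) array of annotated box widths and heights; the
    squared Euclidean distance ||x_i - x_j||^2 is used throughout."""
    boxes = np.asarray(boxes, dtype=float)
    # S21: pick the first cluster centre uniformly at random
    centers = [boxes[rng.integers(len(boxes))]]
    # S22-S24: pick each new centre with probability proportional to the
    # shortest squared distance to the centres chosen so far
    while len(centers) < k:
        d2 = np.min([((boxes - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(boxes[rng.choice(len(boxes), p=d2 / d2.sum())])
    centers = np.array(centers)
    # S25-S27: assign samples to the nearest centre and move each centre to
    # the mean of its cluster, until no centre moves
    while True:
        labels = np.argmin(((boxes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([boxes[labels == i].mean(axis=0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            return new
        centers = new
```

Note that the original YOLO anchor clustering uses a 1 − IoU distance; the squared Euclidean distance is used here because that is what the patent specifies.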
Further, in step S3, the training set is trained by using the improved YOLOv3, and the process of obtaining the optimal model is as follows:
S31, improving the backbone network of YOLOv3 by adding a designed focus module to the original backbone Darknet-53, on the following principle: the Darknet-53 backbone of YOLOv3 performs five downsamplings to obtain three feature maps for training, but the downsampling operations cause fine-grained pixel information of the image to be lost, making targets harder to detect. To avoid losing information, before the image enters the backbone network the focus module performs a slicing operation on it: one value is taken from every other pixel of the image, yielding four complementary sub-images, which are stacked along the channel direction. The width and height information of the image is thus concentrated into the channel space, expanding the input channels fourfold; compared with the original RGB three-channel mode, the spliced image has twelve channels. A convolution operation is then applied to the new image, finally producing a twice-downsampled feature map without any loss of information;
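The slicing operation of the focus module reduces, for example, a 4 × 4 × 3 image to 2 × 2 × 12 while keeping every pixel value. A minimal numpy sketch, with the function name `focus_slice` being ours rather than the patent's:

```python
import numpy as np

def focus_slice(img):
    """Focus-module slicing: take every other pixel to form four complementary
    sub-images and stack them on the channel axis. An (H, W, C) input with
    even H and W becomes (H/2, W/2, 4C) with no information lost."""
    return np.concatenate([img[0::2, 0::2], img[1::2, 0::2],
                           img[0::2, 1::2], img[1::2, 1::2]], axis=-1)
```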
S32, improving the feature extraction capability of YOLOv3 by designing an attention mechanism DAM combined with dilated convolution as a feature enhancement module integrated into the backbone network. The DAM mainly consists of three parts: a dilated module, a channel attention submodule, and a spatial attention submodule, implemented as follows:
S321, the dilated module is formed by two consecutive dilated blocks. Each dilated block is similar in structure to a bottleneck layer and consists of three convolution layers with kernel sizes 1 × 1, 3 × 3, and 1 × 1; the difference is that the dilated module sets a dilation rate n for the 3 × 3 convolution layer. The dilation rate is the spacing between values in the convolution kernel, i.e. n − 1 zeros are inserted between consecutive filter values, which enlarges the kernel size without increasing the number of parameters or the computational cost. Specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with kernel size 3 + 2 × (n − 1). Between the two convolution layers there are, in turn, a batch normalization layer and an activation function;
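The zero-insertion view of dilation can be checked directly: a k × k kernel with dilation rate n spreads to k + (k − 1)(n − 1) positions, which for k = 3 is exactly the 3 + 2 × (n − 1) stated above. An illustrative sketch (the helper name `dilate_kernel` is ours):

```python
import numpy as np

def dilate_kernel(kernel, n):
    """Expand a square kernel with dilation rate n by inserting n-1 zeros
    between consecutive filter values; the effective size becomes
    k + (k - 1) * (n - 1), i.e. 3 + 2 * (n - 1) for a 3x3 kernel."""
    k = kernel.shape[0]
    size = k + (k - 1) * (n - 1)
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::n, ::n] = kernel   # original values, now n apart
    return out
```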
S322, the channel attention submodule is implemented as follows: global max pooling and global average pooling are applied to the input feature map F of size H × W × C, yielding two 1 × 1 × C feature maps; each is then fed into a two-layer MLP, where the first layer has C/r neurons (r is the reduction ratio) and the second layer has C neurons, with the MLP parameters shared between the two inputs; the two MLP outputs are then added element-wise and passed through a sigmoid activation function to generate the weight coefficient Mc; finally, Mc is multiplied element-wise with the input feature map F to generate the input features required by the spatial attention submodule;
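A minimal numpy sketch of step S322, assuming untrained weight matrices are supplied; the function name `channel_attention` and the ReLU hidden activation are our own assumptions:

```python
import numpy as np

def channel_attention(F, W1, W2):
    """Channel attention of step S322 on an (H, W, C) feature map F.
    W1 has shape (C, C//r) and W2 shape (C//r, C): the shared two-layer MLP."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    mlp = lambda v: np.maximum(v @ W1, 0.0) @ W2   # shared MLP, ReLU hidden layer
    max_pool = F.max(axis=(0, 1))                  # global max pooling -> (C,)
    avg_pool = F.mean(axis=(0, 1))                 # global average pooling -> (C,)
    Mc = sigmoid(mlp(max_pool) + mlp(avg_pool))    # element-wise add, then sigmoid
    return F * Mc                                  # reweight the channels of F
```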
S323, the spatial attention submodule is implemented as follows: the feature map output by the channel attention submodule is taken as the input feature map of the spatial attention submodule; first, channel-based global max pooling and global average pooling yield two H × W × 1 feature maps; these two maps are then concatenated along the channel dimension; a 7 × 7 convolution then reduces the result to a single channel, i.e. H × W × 1; a sigmoid activation function generates the weight coefficient Ms; finally, Ms is multiplied element-wise with the input feature map of the spatial attention submodule to obtain the final feature;
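Step S323 can be sketched the same way; the naive `conv2d_same` loop below stands in for a real convolution layer and, like the kernel argument, is our own illustrative construction:

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D correlation; x is (H, W, Cin), kernel (kH, kW, Cin)."""
    kh, kw, _ = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * kernel).sum()
    return out

def spatial_attention(F, kernel7):
    """Spatial attention of step S323: channel-wise max/avg pooling, concat,
    a 7x7 convolution down to one channel, sigmoid, then reweighting."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    stacked = np.stack([F.max(axis=-1), F.mean(axis=-1)], axis=-1)  # (H, W, 2)
    Ms = sigmoid(conv2d_same(stacked, kernel7))                     # (H, W)
    return F * Ms[..., None]
```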
s324, integrating the cavity module, the channel attention submodule and the space attention submodule into a DAM, and integrating the DAM into a backbone network as a feature enhancement module;
s33, improving a detection network of YOLOv3, wherein the flow is as follows:
S331, two layers of residual networks are added to the detection network of the improved YOLOv3; residual networks effectively ease the convergence difficulty caused by overly deep networks and avoid gradient decay and network degradation in deep learning models;
S332, Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26, and 52 × 52, during repeated downsampling, and fuses these three scales so that the model can detect targets of different sizes. To strengthen the model's feature extraction for targets, a 104 × 104 feature map is added to the original detection output layers, giving four scales: 13 × 13, 26 × 26, 52 × 52, and 104 × 104. The 52 × 52 feature map first passes through a 1 × 1 convolution, is then upsampled so that the output scale becomes 104 × 104, and is finally concatenated and fused with the newly added feature scale along the channel dimension, effectively exploiting shallow high-resolution information and deep high-semantic information;
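The 52 × 52 to 104 × 104 fusion path can be sketched with toy channel counts (the counts, the nearest-neighbour upsampling choice, and the variable names are our own illustrative assumptions, not the patent's):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(4)
f52 = rng.random((52, 52, 8))      # deep 52x52-scale feature map
f104 = rng.random((104, 104, 4))   # newly added shallow 104x104-scale map
w1x1 = rng.random((8, 4))          # a 1x1 convolution is a matrix multiply
                                   # over the channel axis
# 1x1 conv, upsample to 104x104, then channel-wise concatenation
fused = np.concatenate([upsample2x(f52 @ w1x1), f104], axis=-1)
```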
S333, a spatial pyramid pooling (SPP) block is added between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent the feature extraction effect from degrading as the network depth increases. The SPP consists of four parallel branches: max pooling with kernels of 5 × 5, 9 × 9, and 13 × 13, and a skip connection. That is, 5 × 5, 9 × 9, and 13 × 13 max pooling operations with a pooling stride of 1 are applied to the feature map, the pooled feature maps are then concatenated with the SPP input feature map, the outputs of the four branches being spliced along the channel dimension to obtain a new feature map, which finally passes through one convolution layer to restore the original number of channels;
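The SPP branch structure of step S333 can be sketched as follows; this is an illustrative numpy version (function names are ours) in which the trailing convolution that restores the channel count is omitted:

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling with 'same' padding; (H, W, C) in, (H, W, C) out."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x):
    """SPP of step S333: skip connection plus 5x5, 9x9, and 13x13 stride-1
    max pools, all concatenated on the channel axis (4x the input channels)."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in (5, 9, 13)], axis=-1)
```

Because the pooling stride is 1 and the padding preserves H × W, only the channel count changes, which is why a final convolution can restore the original number of channels.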
S34, training with the improved YOLOv3 on the augmented and labeled training set, specifically: the augmented and labeled training set is fed into the improved YOLOv3 for training; during training, the model parameters are learned by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes; when the loss function converges, i.e. keeps fluctuating around a certain value, training is stopped, and the network structure and weight file at that moment constitute the optimal model.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. An improved k-means++ clustering algorithm replaces the traditional k-means algorithm to generate the prior boxes, reducing the clustering deviation caused by randomness so that the prior box scales are closer to the real data set.
2. A focus module and a DAM attention mechanism combined with dilated convolution are designed and introduced into the feature extraction network, which alleviates the loss of target features caused by repeated downsampling; meanwhile, the attention mechanism focuses better on regions of interest, improving target localization accuracy and reducing the smoke detection error rate.
3. A large-scale YOLO detection head is added to the detection network for feature fusion, incorporating more shallow high-resolution detail information and deep high-semantic information, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases.
4. Compared with the existing YOLOv3 algorithm, the method has the advantages that the convergence rate of the model in the training process is increased, the accuracy of smoke detection of the power transmission channel is effectively improved on the premise of ensuring the detection speed, and the method has very important significance for maintaining the safety of a power grid system.
Drawings
Fig. 1 is a schematic diagram of the focus module principle and structure, where Input Image is the Input Image, slice is the Slice operation, concat is the stitching operation in the channel direction, and Conv3 × 3 is a convolution kernel of size 3 × 3.
FIG. 2 is a diagram of the attention mechanism DAM module combined with dilated convolution and its components, where Input feature is the input feature of the image, Dilated block is the dilated block, Channel attention is the channel attention, Spatial attention is the spatial attention, Refined feature is the output feature of the DAM module, BN is batch normalization, Leaky ReLU is the activation function, Channel attention module is the channel attention submodule, Spatial attention module is the spatial attention submodule, MaxPool is max pooling, AvgPool is average pooling, Shared MLP is the shared multilayer perceptron, FC is the fully connected layer, sigmoid is the nonlinear activation function, and Channel-refined feature is the output feature of the channel attention submodule.
Fig. 3 is a diagram of the improved YOLOv3 overall network structure, where Res block is a residual block, Add is feature map addition, Concat is channel-wise concatenation, and Upsample is the upsampling operation.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
The embodiment provides a power transmission channel smoke detection method based on improved YOLOv3, the flow of the method is divided into three parts, namely data preparation, network design and model detection, and the specific steps are described as follows:
s1, amplifying a collected sample set by using a data enhancement means, labeling smoke to be detected of each image in the amplified sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a testing set;
collecting a sample set of images containing smoke to be detected: collecting images containing smoke to be detected in a scene of a plurality of power transmission channels to form an image sample set;
The sample set is expanded by data enhancement, specifically: copying and pasting the smoke to be detected within a sample to enhance its positional diversity, thereby increasing the number of anchor boxes matched to the smoke and raising its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation, and saturation adjustment are applied, improving the robustness and generalization capability of the model;
labeling the smoke to be detected in each image by using image labeling software labelImg, wherein a labeled area is a positive sample, an unlabeled area is a negative sample, and corresponding category and position information are stored in an xml file; and dividing the whole sample set into a training set and a testing set according to a certain proportion.
S2, clustering the training set with the improved k-means++ clustering algorithm to obtain prior boxes; setting prior boxes of different scales raises the probability that some prior box matches the target well, making the model easier to learn. Unlike the k-means algorithm, which selects k cluster centers at random, the improved k-means++ algorithm selects cluster centers on the principle: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that the point is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
The anchor box sizes in the training set are clustered with the improved k-means++ clustering algorithm; the basic steps for obtaining the prior boxes are as follows:
s21, randomly selecting a point from the training set as a first clustering center;
S22, for each point in the training set, calculating the shortest distance between the point and the existing cluster centers (namely the distance to the nearest cluster center); the squared Euclidean distance is used as the distance between samples, i.e.

$d(x_i, x_j) = \|x_i - x_j\|^2$

where $x_i$ and $x_j$ denote the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the point selected as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
S26, moving each cluster center to the center of its cluster, the new center being computed directly as the mean of the coordinates of the cluster's samples;
and S27, repeating S25 and S26 until the cluster center does not move any more.
S3, training on the training set with the improved YOLOv3, learning the model parameters by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes to obtain an optimal model. The improved YOLOv3 comprises: first, a focus module is designed and added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; second, a Dilated Attention Module (DAM) combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network to further strengthen its feature extraction capability, and the features are sent to the FPN for multi-scale fusion; then the detection network of YOLOv3 is improved by adding a YOLO detection head with a 104 × 104 scale for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases;
The training set is trained with the improved YOLOv3; the basic flow for obtaining the optimal model is as follows:
the backbone network of YOLOv3 is improved, a focus module is designed to be added into an original backbone network Darknet-53, and as shown in figure 1, the basic principle is as follows: the Darknet-53 of Yolov3 was downsampled five times to obtain three feature images for training. However, the down-sampling operation may cause the morpheme feature value of the image to be lost, so that the detection of the target becomes difficult. Therefore, in order not to lose information, the focus module introduced by the present invention performs a slicing operation on the image before the image enters the backbone network. The specific operation is similar to the approach downsampling, every other pixel in one picture takes one value, so that four complementary pictures can be obtained, the complementary pictures are stacked in the channel direction, namely, the width and height information of the image is concentrated into the channel space, the input channel is expanded by four times, the spliced pictures are changed into twelve channels compared with the original RGB three-channel mode, the obtained new pictures are subjected to convolution operation, and finally the double downsampling characteristic diagram is obtained on the premise of no information loss.
To further improve the feature extraction capability of YOLOv3, an attention module DAM combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network, as shown in FIG. 2. The DAM mainly comprises three parts, namely a dilated-convolution module, a channel attention submodule and a spatial attention submodule. The basic implementation flow is as follows:
two consecutive dilated blocks form the dilated-convolution module. Each dilated block is similar in structure to a bottleneck layer, consisting of three convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 respectively. The difference is that a dilation rate n is set for the 3 × 3 convolution layer, representing the spacing between values in the convolution kernel, i.e. zeros are inserted between consecutive filter values at interval n, which enlarges the effective kernel size without increasing the number of parameters or the computational cost. Specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with a kernel size of 3 + 2 × (n − 1). Between the convolution layers there are, in turn, a Batch Normalization (BN) layer and an activation function (Leaky ReLU);
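A minimal sketch of one such dilated block, assuming PyTorch and illustrative channel counts (in PyTorch, `padding=dilation` for a 3 × 3 kernel keeps the spatial size unchanged):

```python
import torch
import torch.nn as nn

def dilated_block(ch, mid, dilation):
    """Bottleneck-style dilated block: 1x1 -> dilated 3x3 -> 1x1, with
    BN + Leaky ReLU between the convolution layers. The 3x3 layer's
    receptive field grows to 3 + 2*(dilation - 1) at no extra
    parameter cost."""
    return nn.Sequential(
        nn.Conv2d(ch, mid, 1, bias=False),
        nn.BatchNorm2d(mid), nn.LeakyReLU(0.1, inplace=True),
        nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation,
                  bias=False),
        nn.BatchNorm2d(mid), nn.LeakyReLU(0.1, inplace=True),
        nn.Conv2d(mid, ch, 1, bias=False),
    )
```

The block preserves both the channel count and the spatial resolution, so two of them can be chained to build the dilated-convolution module.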
the channel attention submodule is implemented as follows: global max pooling and global average pooling over the width and height are applied to the input feature map F (H × W × C), yielding two 1 × 1 × C feature maps; these are then fed into a two-layer MLP, where the first layer has C/r neurons (r is the reduction ratio) and the second layer has C neurons, the parameters of the two-layer MLP being shared. The two features output by the MLP are then added element-wise, and a sigmoid activation function generates the weight coefficients Mc; finally, Mc is multiplied element-wise with the input feature map F to produce the input features required by the spatial attention submodule;
the spatial attention submodule is implemented as follows: the feature map output by the channel attention submodule serves as its input feature map. First, channel-wise global max pooling and global average pooling produce two H × W × 1 feature maps; the two feature maps are then channel-spliced (concat); a 7 × 7 convolution then reduces the result back to one channel, i.e. H × W × 1; a sigmoid activation function then generates the weight coefficients Ms; finally, Ms is multiplied element-wise with the input feature map of the spatial attention submodule to obtain the final features;
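A matching PyTorch sketch (the class name and the choice of a bias-free convolution are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise max and mean give two H x W x 1 maps; concat them,
    reduce back to one channel with a 7 x 7 convolution, and apply a
    sigmoid to obtain the spatial weights Ms."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        mx, _ = x.max(dim=1, keepdim=True)   # H x W x 1, max over channels
        avg = x.mean(dim=1, keepdim=True)    # H x W x 1, mean over channels
        ms = torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))
        return x * ms  # element-wise reweighting of spatial positions
```

Applying `ChannelAttention` followed by `SpatialAttention` after the dilated-convolution module gives the DAM ordering described in the next paragraph.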
the dilated-convolution module, the channel attention submodule and the spatial attention submodule are integrated into the DAM. The dilated convolution enlarges the receptive field without losing resolution; by adjusting the dilation rate for objects of different scales, the size of the receptive field can be adapted, which addresses problems caused by scale variation. The channel attention and spatial attention submodules are applied in turn, telling the network what to attend to along the channel axis and where to attend along the spatial axis, respectively. The integrated attention mechanism helps the model learn to suppress irrelevant regions while highlighting salient features useful for target detection.
The basic flow for improving the detection network of YOLOv3 is as follows:
a residual network can effectively alleviate the convergence difficulty caused by an overly deep network, avoiding problems such as gradient attenuation and network degradation in deep learning models. Multiple experiments verified that adding a two-layer residual network achieves a good effect. Therefore, the improved YOLOv3 adds a two-layer residual network before the detection network;
Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26 and 52 × 52, during its successive downsampling, and fusing these three scales gives the model the ability to detect targets of different sizes. To enhance the model's feature extraction capability for the target, the invention adds a feature map at the 104 × 104 scale to the original detection output layers, i.e. the feature scales become 13 × 13, 26 × 26, 52 × 52 and 104 × 104. The 52 × 52 feature map first passes through a 1 × 1 convolution, its output is then upsampled to 104 × 104, and finally channel splicing (concatenate) with the newly added feature scale makes effective use of both shallow high-resolution information and deep high-semantic information;
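The 104 × 104 branch can be sketched as follows; the channel counts (128 for the 52 × 52 feature, 64 for the shallow backbone feature) are illustrative assumptions, not values given in the text:

```python
import torch
import torch.nn as nn

# hypothetical inputs: a 52x52 detection-branch feature and a shallow
# 104x104 backbone feature (channel counts assumed for illustration)
p_52 = torch.rand(1, 128, 52, 52)
c_104 = torch.rand(1, 64, 104, 104)

reduce = nn.Conv2d(128, 64, 1)                    # 1x1 convolution first
up = nn.Upsample(scale_factor=2, mode='nearest')  # 52 -> 104
# channel splicing of the upsampled deep feature with the shallow one
fused = torch.cat([up(reduce(p_52)), c_104], dim=1)
```

The fused map keeps the 104 × 104 resolution while combining deep semantic channels with shallow high-resolution channels.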
a Spatial Pyramid Pooling (SPP) block is added between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent its effect from degrading as the network depth increases. The SPP consists of four parallel branches: max pooling with kernels of 5 × 5, 9 × 9 and 13 × 13, plus a skip connection. That is, the feature map is passed through 5 × 5, 9 × 9 and 13 × 13 max pooling operations with a stride of 1, and the pooled feature maps together with the SPP input feature map undergo a concatenate operation: the outputs of the 4 branches are spliced along the channel dimension to obtain a new feature map. Finally, the new feature map passes through one convolution layer to restore the original number of channels.
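A minimal PyTorch sketch of this SPP block (the 1 × 1 restoring convolution is an assumption consistent with "one convolution layer to restore the original channels"):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Four parallel branches: a skip connection plus stride-1 max
    pooling with 5x5, 9x9 and 13x13 kernels, spliced on the channel
    dimension; a final 1x1 convolution restores the original channel
    count so the block is plug-and-play."""
    def __init__(self, c, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)
        self.conv = nn.Conv2d(c * (len(kernels) + 1), c, 1, bias=False)

    def forward(self, x):
        # padding k//2 with stride 1 keeps every branch at H x W
        return self.conv(torch.cat([x] + [p(x) for p in self.pools],
                                   dim=1))
```

Because input and output shapes match exactly, the block can be inserted anywhere in the network, as the next paragraph notes.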
The SPP is designed to be plug-and-play, so keeping the dimensions unchanged is important: it guarantees that the SPP can be inserted anywhere in the network without error. In addition, multi-scale local region features can be extracted from the feature maps and fused into the subsequent global features, yielding a richer feature representation and improving detection accuracy.
The hardware configuration, parameter settings and specific steps for training the augmented, labeled training set with the improved YOLOv3, as shown in fig. 3, are:
operating system: Ubuntu 16.04; runtime environment: Python 3.8 + PyTorch 1.3.1; GPU: NVIDIA GeForce GTX 1080Ti; GPU acceleration libraries: CUDA 10.0 + cuDNN 7.4.1;
input image size: 416 × 416; initial learning rate: 0.001, with the learning rate adjusted by exponential decay; momentum: 0.9; weight decay regularization term: 0.0005; training epochs: 2000;
the augmented and labeled training set is sent into the improved YOLOv3 for training; during training the model parameters are learned by gradient descent from the deviation between the prior frames and the labeled real frames. When the loss function converges, i.e. keeps fluctuating around a certain value, training stops, and the network structure and weight file at that moment constitute the optimal model.
And S4, inputting the test set into the optimal model, and marking the smoke of the power transmission channel to be detected in a rectangular frame form.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (4)

1. A power transmission channel smoke detection method based on improved YOLOv3 is characterized by comprising the following steps:
s1, augmenting the collected sample set by data enhancement, labeling the smoke to be detected in each image of the augmented sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a test set;
s2, clustering the training set by using an improved k-means++ clustering algorithm to obtain prior frames, and setting prior frames of different scales so that, with high probability, a prior frame with a good matching degree to the target appears, making the model easy to learn; the principle by which the improved k-means++ clustering algorithm selects cluster centers is: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that it is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
s3, training the training set by using the improved YOLOv3, and learning the model parameters by gradient descent from the deviation between the prior frames and the labeled real frames to obtain an optimal model; wherein the improved YOLOv3 comprises: firstly, designing a focus module added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; secondly, designing an attention module combined with dilated convolution (DAM) as a feature enhancement module integrated into the backbone network to further enhance its feature extraction capability, and sending the features into an FPN for multi-scale fusion; then improving the detection network of YOLOv3 by adding a YOLO detection head at the 104 × 104 scale for feature fusion and adding an SPP in front of each YOLO detection head, so that the feature extraction effect does not degrade as the network depth increases;
and S4, inputting the test set into the optimal model, and marking the smoke of the power transmission channel to be detected in a rectangular frame form.
2. The method for detecting smoke in a power transmission channel based on improved YOLOv3 as claimed in claim 1, wherein said step S1 comprises:
s11, collecting an image sample set containing the smoke to be detected: collecting images containing the smoke to be detected in multiple power transmission channel scenes to form the image sample set;
s12, expanding the sample set by a data enhancement method, specifically: copying and pasting the smoke to be detected within a sample to enhance the positional diversity of the smoke, thereby increasing the number of anchor frames matched to the smoke and increasing its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation and saturation adjustment are also included;
s13, labeling the smoke to be detected in each image using the image labeling software labelImg, wherein labeled regions are positive samples and unlabeled regions are negative samples, and the corresponding category and position information is stored in xml files; the whole sample set is then divided into a training set and a test set in a set proportion.
3. The improved YOLOv3-based power transmission channel smoke detection method as claimed in claim 1, wherein in step S2, the anchor frame sizes in the training set are clustered using an improved k-means++ clustering algorithm, and the steps of obtaining the prior frames are:
s21, randomly selecting a point from the training set as a first clustering center;
s22, calculating the shortest distance between each point in the training set and the existing cluster centers, i.e. the distance to the nearest cluster center, using the squared Euclidean distance as the distance d(x_i, x_j) between samples, i.e.

d(x_i, x_j) = ‖x_i − x_j‖²

wherein x_i, x_j represent the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the selected point as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
s26, moving the clustering center to the center of the cluster group belonging to the clustering center, wherein the method for calculating the center is to directly determine the center as the average value of each coordinate of the cluster group;
and S27, repeating S25 and S26 until the cluster center does not move any more.
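The seeding steps S21 to S24 above correspond to k-means++ initialization; a minimal Python sketch follows, clustering hypothetical (width, height) pairs with the squared Euclidean distance from the claim. `random.choices` draws each new center with probability proportional to the shortest distance to the existing centers; steps S25 to S27 (the standard k-means iterations) would then follow:

```python
import random

def kmeanspp_init(points, k, dist):
    """k-means++ seeding: the first center is random; each subsequent
    center is drawn with probability proportional to its shortest
    (squared) distance to the existing centers, so far-away points are
    more likely to be picked (steps S21-S24)."""
    centers = [random.choice(points)]
    while len(centers) < k:
        # shortest distance from each point to the existing centers
        weights = [min(dist(p, c) for c in centers) for p in points]
        centers.append(random.choices(points, weights=weights)[0])
    return centers

def sq_euclidean(a, b):
    """Squared Euclidean distance d(x_i, x_j) = ||x_i - x_j||^2."""
    return sum((u - v) ** 2 for u, v in zip(a, b))
```

In the actual method the samples are the labeled anchor box sizes, and the returned seeds initialize the iterative clustering of steps S25 to S27.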
4. The method for detecting smoke in a power transmission channel based on improved YOLOv3 as claimed in claim 1, wherein in step S3, the training set is trained by using improved YOLOv3, and the procedure for obtaining the optimal model is as follows:
s31, improving the backbone network of YOLOv3 by adding a designed focus module to the original backbone network Darknet-53, on the following principle: the backbone network Darknet-53 of YOLOv3 is downsampled five times to obtain three feature maps for training; however, the downsampling operations cause fine-grained feature information of the image to be lost, making the target difficult to detect; in order not to lose information, the focus module performs a slicing operation on the image before it enters the backbone network, the specific operation being to take one value from every other pixel of the image, thereby obtaining four complementary images; the complementary images are stacked in the channel direction, i.e. the width and height information of the image is concentrated into the channel space, expanding the input channels by a factor of four, so the spliced image has twelve channels instead of the original three RGB channels; a convolution operation is then applied to the new image, and a two-fold downsampled feature map is finally obtained without any loss of information;
s32, improving the feature extraction capability of YOLOv3 by designing an attention module DAM combined with dilated convolution as a feature enhancement module integrated into the backbone network, the DAM mainly comprising three parts: a dilated-convolution module, a channel attention submodule and a spatial attention submodule, implemented as follows:
s321, forming the dilated-convolution module from two consecutive dilated blocks, each dilated block being similar in structure to a bottleneck layer and consisting of three convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 respectively; the difference is that a dilation rate n is set for the 3 × 3 convolution layer, representing the spacing between values in the convolution kernel, i.e. zeros are inserted between consecutive filter values at interval n, which enlarges the effective kernel size without increasing the number of parameters or the computational cost; specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with a kernel size of 3 + 2 × (n − 1); between the convolution layers there are, in turn, a batch normalization layer and an activation function;
s322, implementing the channel attention submodule as follows: performing global max pooling and global average pooling on the input feature map F of size H × W × C to obtain two 1 × 1 × C feature maps; feeding them into a two-layer MLP, the first layer having C/r neurons, r being the reduction ratio, and the second layer having C neurons, with the parameters of the two-layer MLP shared; adding the two features output by the MLP element-wise and generating the weight coefficients Mc through a sigmoid activation function; finally, multiplying Mc element-wise with the input feature map F to generate the input features required by the spatial attention submodule;
s323, implementing the spatial attention submodule as follows: taking the feature map output by the channel attention submodule as its input feature map; first performing channel-wise global max pooling and global average pooling to obtain two H × W × 1 feature maps; then channel-splicing the two feature maps; then reducing the result to one channel, i.e. H × W × 1, through a 7 × 7 convolution; then generating the weight coefficients Ms through a sigmoid activation function; finally, multiplying Ms element-wise with the input feature map of the spatial attention submodule to obtain the final features;
s324, integrating the dilated-convolution module, the channel attention submodule and the spatial attention submodule into the DAM, and integrating the DAM into the backbone network as a feature enhancement module;
s33, improving a detection network of YOLOv3, wherein the flow is as follows:
s331, adding a two-layer residual network before the detection network of the improved YOLOv3; the residual network can effectively alleviate the convergence difficulty caused by an overly deep network and avoids gradient attenuation and network degradation in deep learning models;
s332, Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26 and 52 × 52, during its successive downsampling, and fusing the three scales gives the model the ability to detect targets of different sizes; to enhance the model's feature extraction capability for the target, a feature map at the 104 × 104 scale is added to the original detection output layers, i.e. the feature scales become 13 × 13, 26 × 26, 52 × 52 and 104 × 104; the 52 × 52 feature map first passes through a 1 × 1 convolution, its output is then upsampled to 104 × 104, and finally channel splicing and fusion with the newly added feature scale make effective use of shallow high-resolution information and deep high-semantic information;
s333, adding a spatial pyramid pooling SPP between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent its effect from degrading as the network depth increases; the SPP consists of four parallel branches, namely max pooling with kernels of 5 × 5, 9 × 9 and 13 × 13 and a skip connection; that is, the feature map is passed through 5 × 5, 9 × 9 and 13 × 13 max pooling operations with a stride of 1, the pooled feature maps undergo a concatenate operation with the SPP input feature map, the outputs of the 4 branches are spliced along the channel dimension to obtain a new feature map, and the new feature map finally passes through one convolution layer to restore the original number of channels;
s34, training the augmented and labeled training set with the improved YOLOv3, specifically: sending the augmented and labeled training set into the improved YOLOv3 for training, learning the model parameters by gradient descent from the deviation between the prior frames and the labeled real frames during training; when the loss function converges, i.e. keeps fluctuating around a certain value, training stops, and the network structure and weight file at that moment constitute the optimal model.
CN202210519144.2A 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3 Pending CN115187921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519144.2A CN115187921A (en) 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3

Publications (1)

Publication Number Publication Date
CN115187921A true CN115187921A (en) 2022-10-14

Family

ID=83512820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519144.2A Pending CN115187921A (en) 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3

Country Status (1)

Country Link
CN (1) CN115187921A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152720A (en) * 2023-04-17 2023-05-23 山东科技大学 Smoke detection method
CN116612336B (en) * 2023-07-19 2023-10-03 浙江华诺康科技有限公司 Method, apparatus, computer device and storage medium for classifying smoke in endoscopic image
CN116994287A (en) * 2023-07-04 2023-11-03 北京市农林科学院 Animal counting method and device and animal counting equipment
CN117040983A (en) * 2023-09-28 2023-11-10 联通(江苏)产业互联网有限公司 Data sharing method and system based on big data analysis
CN117689731A (en) * 2024-02-02 2024-03-12 陕西德创数字工业智能科技有限公司 Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination