CN115187921A - Power transmission channel smoke detection method based on improved YOLOv3 - Google Patents


Info

Publication number
CN115187921A
Authority
CN
China
Prior art keywords
feature
multiplied
channel
smoke
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210519144.2A
Other languages
Chinese (zh)
Inventor
吴玉香
郑浩东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210519144.2A priority Critical patent/CN115187921A/en
Publication of CN115187921A publication Critical patent/CN115187921A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V10/443Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V10/449Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V10/451Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V10/454Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/762Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
    • G06V10/763Non-hierarchical techniques, e.g. based on statistics of modelling distributions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/70Labelling scene content, e.g. deriving syntactic or semantic representations
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a power transmission channel smoke detection method based on improved YOLOv3. First, an improved k-means++ clustering algorithm replaces the traditional k-means algorithm, reducing the deviation of clustering results, and the prior boxes are redefined. Second, a focus module is designed to improve the feature extraction network and alleviate the loss of target features caused by repeated downsampling. Then, an attention mechanism DAM combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network, after which the features are sent to the FPN for multi-scale fusion. Finally, a large-scale YOLO detection head is added to the detection network for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases. Compared with the existing YOLOv3 algorithm, the method effectively improves the accuracy of power transmission channel smoke detection while maintaining detection speed, which is of great significance for safeguarding the power grid system.

Description

Power transmission channel smoke detection method based on improved YOLOv3
Technical Field
The invention relates to the technical field of computer vision and target detection, and in particular to a power transmission channel smoke detection method based on improved YOLOv3.
Background
When a fire occurs near a power transmission channel of a power grid, the insulation clearance of the transmission line is easily reduced, which can trip the line and in turn cause a large-scale, long-duration power outage. Monitoring the transmission channel for fire, and improving the real-time performance and accuracy of that monitoring, therefore effectively safeguards the reliable operation of the power grid.
A fire in a power transmission channel can be detected through smoke and flame, but once a fire breaks out it brings serious property loss and safety risks. In the early stage of a fire, smoke is generated and spreads continuously; if the smoke can be detected in time and the fire effectively suppressed at this stage, property loss and social impact can be minimized, so early smoke detection is crucial. However, in the transmission channel smoke detection task, first, some targets have low resolution, little information, and much noise; second, the shape of smoke is not fixed and it diffuses randomly, and many smoke-like objects exist at a smoke scene, hindering the localization and detection of real smoke; finally, the detection effect on smoke is often poor due to insufficient feature extraction by the backbone network of the deep model, insufficient semantic information in the shallow convolutional layers, and other factors.
In summary, power transmission channel smoke detection based on deep learning remains a challenging problem. Improving the robustness and accuracy of smoke detection for power transmission channels in the target detection task is therefore a problem to be solved in the prior art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provide a power transmission channel smoke detection method based on improved YOLOv3 that can effectively detect smoke in a power transmission channel, accelerate training convergence, and improve detection precision without affecting detection speed, solving the detection difficulties caused by unfixed target shapes, insufficient deep neural network feature extraction, and other factors in current power transmission channel smoke detection.
To achieve this purpose, the invention provides the following technical scheme: a power transmission channel smoke detection method based on improved YOLOv3, comprising the following steps:
s1, amplifying a collected sample set by using a data enhancement means, labeling smoke to be detected of each image in the amplified sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a testing set;
S2, clustering the training set with an improved k-means++ clustering algorithm to obtain prior boxes; setting prior boxes of different scales raises the probability that some prior box matches the target well, making the model easier to learn. The principle by which the improved k-means++ algorithm selects cluster centers is: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that the point is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
S3, training on the training set with the improved YOLOv3, learning the model parameters by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes to obtain an optimal model. The improved YOLOv3 comprises: first, a focus module is designed and added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; second, a Dilated Attention Module (DAM) combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network to further strengthen its feature extraction capability, and the features are sent to the FPN for multi-scale fusion; then the detection network of YOLOv3 is improved by adding a YOLO detection head with a 104 × 104 scale for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases;
and S4, inputting the test set into the optimal model and marking the detected power transmission channel smoke with rectangular boxes.
Further, the step S1 includes:
s11, collecting an image sample set containing smoke to be detected: collecting images containing smoke to be detected in a scene of a plurality of power transmission channels to form an image sample set;
S12, expanding the sample set by data enhancement, specifically: copying and pasting the smoke to be detected within a sample to enhance the positional diversity of the smoke, which increases the number of anchor boxes matched to the smoke and raises its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation, and saturation adjustment are applied;
s13, labeling the smoke to be detected in each image by using image labeling software labelImg, wherein a labeled area is a positive sample, an unlabeled area is a negative sample, and corresponding category and position information is stored in an xml file; and the whole sample set is divided into a training set and a testing set according to the proportion.
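The augmentation of step S12 can be sketched in a few lines. The following is an illustrative numpy sketch, not part of the patent; the function name `augment`, the flip probability, and the gain and noise ranges are our own assumptions:

```python
import numpy as np

def augment(image, rng):
    """Apply a few of the step-S12 augmentations to an HxWx3 uint8 image:
    horizontal flip, brightness transformation, and Gaussian noise."""
    out = image.copy()
    if rng.random() < 0.5:
        out = out[:, ::-1, :]                    # horizontal flip
    gain = rng.uniform(0.8, 1.2)                 # brightness transformation
    noise = rng.normal(0.0, 5.0, out.shape)      # Gaussian noise
    out = out.astype(np.float32) * gain + noise
    return np.clip(out, 0, 255).astype(np.uint8)
```

Angle rotation and saturation adjustment would normally be added with an image library; they are omitted here to keep the sketch dependency-free.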
Further, in step S2, clustering the anchor box sizes in the training set with the improved k-means++ clustering algorithm to obtain the prior boxes comprises:
s21, randomly selecting a point from the training set as a first clustering center;
S22, calculating the shortest distance between each point in the training set and the existing cluster centers, namely the distance to the nearest cluster center; the squared Euclidean distance is used as the distance between samples, that is,

$d(x_i, x_j) = \|x_i - x_j\|^2$

where $x_i$ and $x_j$ denote the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the selected point as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
S26, moving each cluster center to the center of its cluster, the new center being computed directly as the mean of the coordinates of the cluster's samples;
and S27, repeating S25 and S26 until the cluster center does not move any more.
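Steps S21 to S27 can be sketched as follows. This is an illustrative numpy implementation under our own naming (`kmeans_pp_anchors`), not the patent's code; it clusters (width, height) pairs with the squared Euclidean distance given above:

```python
import numpy as np

def kmeans_pp_anchors(boxes, k, rng):
    """Cluster (w, h) box sizes with k-means++ seeding (steps S21-S27).

    `boxes` is an (N, 2) array of annotated box widths and heights; the
    squared Euclidean distance ||x_i - x_j||^2 is used throughout."""
    boxes = np.asarray(boxes, dtype=float)
    # S21: pick the first cluster centre uniformly at random
    centers = [boxes[rng.integers(len(boxes))]]
    # S22-S24: pick each new centre with probability proportional to the
    # shortest squared distance to the centres chosen so far
    while len(centers) < k:
        d2 = np.min([((boxes - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(boxes[rng.choice(len(boxes), p=d2 / d2.sum())])
    centers = np.array(centers)
    # S25-S27: assign samples to the nearest centre and move each centre to
    # the mean of its cluster, until no centre moves
    while True:
        labels = np.argmin(((boxes[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([boxes[labels == i].mean(axis=0) if np.any(labels == i)
                        else centers[i] for i in range(k)])
        if np.allclose(new, centers):
            return new
        centers = new
```

Note that the original YOLO anchor clustering uses a 1 − IoU distance; the squared Euclidean distance is used here because that is what the patent specifies.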
Further, in step S3, the training set is trained by using the improved YOLOv3, and the process of obtaining the optimal model is as follows:
S31, improving the backbone network of YOLOv3 by adding a designed focus module to the original backbone Darknet-53, on the following principle: the Darknet-53 backbone of YOLOv3 performs five downsamplings to obtain three feature maps for training, but the downsampling operations cause fine-grained pixel information of the image to be lost, making targets harder to detect. To avoid losing information, before the image enters the backbone network the focus module performs a slicing operation on it: one value is taken from every other pixel of the image, yielding four complementary sub-images, which are stacked along the channel direction. The width and height information of the image is thus concentrated into the channel space, expanding the input channels fourfold; compared with the original RGB three-channel mode, the spliced image has twelve channels. A convolution operation is then applied to the new image, finally producing a twice-downsampled feature map without any loss of information;
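The slicing operation of the focus module reduces, for example, a 4 × 4 × 3 image to 2 × 2 × 12 while keeping every pixel value. A minimal numpy sketch, with the function name `focus_slice` being ours rather than the patent's:

```python
import numpy as np

def focus_slice(img):
    """Focus-module slicing: take every other pixel to form four complementary
    sub-images and stack them on the channel axis. An (H, W, C) input with
    even H and W becomes (H/2, W/2, 4C) with no information lost."""
    return np.concatenate([img[0::2, 0::2], img[1::2, 0::2],
                           img[0::2, 1::2], img[1::2, 1::2]], axis=-1)
```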
S32, improving the feature extraction capability of YOLOv3 by designing an attention mechanism DAM combined with dilated convolution as a feature enhancement module integrated into the backbone network. The DAM mainly consists of three parts: a dilated module, a channel attention submodule, and a spatial attention submodule, implemented as follows:
S321, the dilated module is formed by two consecutive dilated blocks. Each dilated block is similar in structure to a bottleneck layer and consists of three convolution layers with kernel sizes 1 × 1, 3 × 3, and 1 × 1; the difference is that the dilated module sets a dilation rate n for the 3 × 3 convolution layer. The dilation rate is the spacing between values in the convolution kernel, i.e. n − 1 zeros are inserted between consecutive filter values, which enlarges the kernel size without increasing the number of parameters or the computational cost. Specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with kernel size 3 + 2 × (n − 1). Between the two convolution layers there are, in turn, a batch normalization layer and an activation function;
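The zero-insertion view of dilation can be checked directly: a k × k kernel with dilation rate n spreads to k + (k − 1)(n − 1) positions, which for k = 3 is exactly the 3 + 2 × (n − 1) stated above. An illustrative sketch (the helper name `dilate_kernel` is ours):

```python
import numpy as np

def dilate_kernel(kernel, n):
    """Expand a square kernel with dilation rate n by inserting n-1 zeros
    between consecutive filter values; the effective size becomes
    k + (k - 1) * (n - 1), i.e. 3 + 2 * (n - 1) for a 3x3 kernel."""
    k = kernel.shape[0]
    size = k + (k - 1) * (n - 1)
    out = np.zeros((size, size), dtype=kernel.dtype)
    out[::n, ::n] = kernel   # original values, now n apart
    return out
```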
S322, the channel attention submodule is implemented as follows: global max pooling and global average pooling are applied to the input feature map F of size H × W × C, yielding two 1 × 1 × C feature maps; each is then fed into a two-layer MLP, where the first layer has C/r neurons (r is the reduction ratio) and the second layer has C neurons, with the MLP parameters shared between the two inputs; the two MLP outputs are then added element-wise and passed through a sigmoid activation function to generate the weight coefficient Mc; finally, Mc is multiplied element-wise with the input feature map F to generate the input features required by the spatial attention submodule;
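A minimal numpy sketch of step S322, assuming untrained weight matrices are supplied; the function name `channel_attention` and the ReLU hidden activation are our own assumptions:

```python
import numpy as np

def channel_attention(F, W1, W2):
    """Channel attention of step S322 on an (H, W, C) feature map F.
    W1 has shape (C, C//r) and W2 shape (C//r, C): the shared two-layer MLP."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    mlp = lambda v: np.maximum(v @ W1, 0.0) @ W2   # shared MLP, ReLU hidden layer
    max_pool = F.max(axis=(0, 1))                  # global max pooling -> (C,)
    avg_pool = F.mean(axis=(0, 1))                 # global average pooling -> (C,)
    Mc = sigmoid(mlp(max_pool) + mlp(avg_pool))    # element-wise add, then sigmoid
    return F * Mc                                  # reweight the channels of F
```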
S323, the spatial attention submodule is implemented as follows: the feature map output by the channel attention submodule is taken as the input feature map of the spatial attention submodule; first, channel-based global max pooling and global average pooling yield two H × W × 1 feature maps; these two maps are then concatenated along the channel dimension; a 7 × 7 convolution then reduces the result to a single channel, i.e. H × W × 1; a sigmoid activation function generates the weight coefficient Ms; finally, Ms is multiplied element-wise with the input feature map of the spatial attention submodule to obtain the final feature;
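Step S323 can be sketched the same way; the naive `conv2d_same` loop below stands in for a real convolution layer and, like the kernel argument, is our own illustrative construction:

```python
import numpy as np

def conv2d_same(x, kernel):
    """Naive 'same'-padded 2-D correlation; x is (H, W, Cin), kernel (kH, kW, Cin)."""
    kh, kw, _ = kernel.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    H, W = x.shape[:2]
    out = np.empty((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = (xp[i:i + kh, j:j + kw] * kernel).sum()
    return out

def spatial_attention(F, kernel7):
    """Spatial attention of step S323: channel-wise max/avg pooling, concat,
    a 7x7 convolution down to one channel, sigmoid, then reweighting."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    stacked = np.stack([F.max(axis=-1), F.mean(axis=-1)], axis=-1)  # (H, W, 2)
    Ms = sigmoid(conv2d_same(stacked, kernel7))                     # (H, W)
    return F * Ms[..., None]
```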
s324, integrating the cavity module, the channel attention submodule and the space attention submodule into a DAM, and integrating the DAM into a backbone network as a feature enhancement module;
s33, improving a detection network of YOLOv3, wherein the flow is as follows:
S331, two layers of residual networks are added to the detection network of the improved YOLOv3; residual networks effectively ease the convergence difficulty caused by overly deep networks and avoid gradient decay and network degradation in deep learning models;
S332, Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26, and 52 × 52, during repeated downsampling, and fuses these three scales so that the model can detect targets of different sizes. To strengthen the model's feature extraction for targets, a 104 × 104 feature map is added to the original detection output layers, giving four scales: 13 × 13, 26 × 26, 52 × 52, and 104 × 104. The 52 × 52 feature map first passes through a 1 × 1 convolution, is then upsampled so that the output scale becomes 104 × 104, and is finally concatenated and fused with the newly added feature scale along the channel dimension, effectively exploiting shallow high-resolution information and deep high-semantic information;
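The 52 × 52 to 104 × 104 fusion path can be sketched with toy channel counts (the counts, the nearest-neighbour upsampling choice, and the variable names are our own illustrative assumptions, not the patent's):

```python
import numpy as np

def upsample2x(x):
    """Nearest-neighbour 2x upsampling of an (H, W, C) feature map."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(4)
f52 = rng.random((52, 52, 8))      # deep 52x52-scale feature map
f104 = rng.random((104, 104, 4))   # newly added shallow 104x104-scale map
w1x1 = rng.random((8, 4))          # a 1x1 convolution is a matrix multiply
                                   # over the channel axis
# 1x1 conv, upsample to 104x104, then channel-wise concatenation
fused = np.concatenate([upsample2x(f52 @ w1x1), f104], axis=-1)
```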
S333, a spatial pyramid pooling (SPP) block is added between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent the feature extraction effect from degrading as the network depth increases. The SPP consists of four parallel branches: max pooling with kernels of 5 × 5, 9 × 9, and 13 × 13, and a skip connection. That is, 5 × 5, 9 × 9, and 13 × 13 max pooling operations with a pooling stride of 1 are applied to the feature map, the pooled feature maps are then concatenated with the SPP input feature map, the outputs of the four branches being spliced along the channel dimension to obtain a new feature map, which finally passes through one convolution layer to restore the original number of channels;
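The SPP branch structure of step S333 can be sketched as follows; this is an illustrative numpy version (function names are ours) in which the trailing convolution that restores the channel count is omitted:

```python
import numpy as np

def maxpool_same(x, k):
    """Stride-1 max pooling with 'same' padding; (H, W, C) in, (H, W, C) out."""
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)), constant_values=-np.inf)
    out = np.empty_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = xp[i:i + k, j:j + k].max(axis=(0, 1))
    return out

def spp(x):
    """SPP of step S333: skip connection plus 5x5, 9x9, and 13x13 stride-1
    max pools, all concatenated on the channel axis (4x the input channels)."""
    return np.concatenate([x] + [maxpool_same(x, k) for k in (5, 9, 13)], axis=-1)
```

Because the pooling stride is 1 and the padding preserves H × W, only the channel count changes, which is why a final convolution can restore the original number of channels.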
S34, training with the improved YOLOv3 on the augmented and labeled training set, specifically: the augmented and labeled training set is fed into the improved YOLOv3 for training; during training, the model parameters are learned by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes; when the loss function converges, i.e. keeps fluctuating around a certain value, training is stopped, and the network structure and weight file at that moment constitute the optimal model.
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. An improved k-means++ clustering algorithm replaces the traditional k-means algorithm to generate the prior boxes, reducing the clustering deviation caused by randomness so that the prior box scales are closer to the real data set.
2. A focus module and a DAM attention mechanism combined with dilated convolution are designed and introduced into the feature extraction network, which alleviates the loss of target features caused by repeated downsampling; meanwhile, the attention mechanism focuses better on regions of interest, improving target localization accuracy and reducing the smoke detection error rate.
3. A large-scale YOLO detection head is added to the detection network for feature fusion, incorporating more shallow high-resolution detail information and deep high-semantic information, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases.
4. Compared with the existing YOLOv3 algorithm, the method has the advantages that the convergence rate of the model in the training process is increased, the accuracy of smoke detection of the power transmission channel is effectively improved on the premise of ensuring the detection speed, and the method has very important significance for maintaining the safety of a power grid system.
Drawings
Fig. 1 is a schematic diagram of the focus module principle and structure, where Input Image is the Input Image, slice is the Slice operation, concat is the stitching operation in the channel direction, and Conv3 × 3 is a convolution kernel of size 3 × 3.
FIG. 2 is a diagram of the attention mechanism DAM module combined with dilated convolution and its components, where Input feature is the input feature of the image, Dilated block is the dilated block, Channel attention is the channel attention, Spatial attention is the spatial attention, Refined feature is the output feature of the DAM module, BN is batch normalization, Leaky ReLU is the activation function, Channel attention module is the channel attention submodule, Spatial attention module is the spatial attention submodule, MaxPool is max pooling, AvgPool is average pooling, Shared MLP is the shared multilayer perceptron, FC is the fully connected layer, sigmoid is the nonlinear activation function, and Channel-refined feature is the output feature of the channel attention submodule.
Fig. 3 is a diagram of the improved YOLOv3 overall network structure, where Res block is a residual block, Add is feature map addition, Concat is channel-wise concatenation, and Upsample is the upsampling operation.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but the embodiments of the present invention are not limited thereto.
The embodiment provides a power transmission channel smoke detection method based on improved YOLOv3, the flow of the method is divided into three parts, namely data preparation, network design and model detection, and the specific steps are described as follows:
s1, amplifying a collected sample set by using a data enhancement means, labeling smoke to be detected of each image in the amplified sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a testing set;
collecting a sample set of images containing smoke to be detected: collecting images containing smoke to be detected in a scene of a plurality of power transmission channels to form an image sample set;
The sample set is expanded by data enhancement, specifically: copying and pasting the smoke to be detected within a sample to enhance its positional diversity, thereby increasing the number of anchor boxes matched to the smoke and raising its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation, and saturation adjustment are applied, improving the robustness and generalization capability of the model;
labeling the smoke to be detected in each image by using image labeling software labelImg, wherein a labeled area is a positive sample, an unlabeled area is a negative sample, and corresponding category and position information are stored in an xml file; and dividing the whole sample set into a training set and a testing set according to a certain proportion.
S2, clustering the training set with the improved k-means++ clustering algorithm to obtain prior boxes; setting prior boxes of different scales raises the probability that some prior box matches the target well, making the model easier to learn. Unlike the k-means algorithm, which selects k cluster centers at random, the improved k-means++ algorithm selects cluster centers on the principle: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that the point is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
The anchor box sizes in the training set are clustered with the improved k-means++ clustering algorithm; the basic steps for obtaining the prior boxes are as follows:
s21, randomly selecting a point from the training set as a first clustering center;
S22, for each point in the training set, calculating the shortest distance between the point and the existing cluster centers (namely the distance to the nearest cluster center); the squared Euclidean distance is used as the distance between samples, i.e.

$d(x_i, x_j) = \|x_i - x_j\|^2$

where $x_i$ and $x_j$ denote the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the point selected as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
S26, moving each cluster center to the center of its cluster, the new center being computed directly as the mean of the coordinates of the cluster's samples;
and S27, repeating S25 and S26 until the cluster center does not move any more.
S3, training on the training set with the improved YOLOv3, learning the model parameters by gradient descent from the deviation between the prior boxes and the labeled ground-truth boxes to obtain an optimal model. The improved YOLOv3 comprises: first, a focus module is designed and added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; second, a Dilated Attention Module (DAM) combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network to further strengthen its feature extraction capability, and the features are sent to the FPN for multi-scale fusion; then the detection network of YOLOv3 is improved by adding a YOLO detection head with a 104 × 104 scale for feature fusion, and an SPP is added behind each YOLO detection head so that the feature extraction effect does not degrade as the network depth increases;
The training set is trained with the improved YOLOv3; the basic flow for obtaining the optimal model is as follows:
the backbone network of YOLOv3 is improved, a focus module is designed to be added into an original backbone network Darknet-53, and as shown in figure 1, the basic principle is as follows: the Darknet-53 of Yolov3 was downsampled five times to obtain three feature images for training. However, the down-sampling operation may cause the morpheme feature value of the image to be lost, so that the detection of the target becomes difficult. Therefore, in order not to lose information, the focus module introduced by the present invention performs a slicing operation on the image before the image enters the backbone network. The specific operation is similar to the approach downsampling, every other pixel in one picture takes one value, so that four complementary pictures can be obtained, the complementary pictures are stacked in the channel direction, namely, the width and height information of the image is concentrated into the channel space, the input channel is expanded by four times, the spliced pictures are changed into twelve channels compared with the original RGB three-channel mode, the obtained new pictures are subjected to convolution operation, and finally the double downsampling characteristic diagram is obtained on the premise of no information loss.
To further improve the feature extraction capability of YOLOv3, an attention module DAM combined with dilated convolution is designed as a feature enhancement module and integrated into the backbone network, as shown in FIG. 2. The DAM mainly comprises three parts, namely a dilated-convolution module, a channel attention submodule and a spatial attention submodule. The basic implementation flow is as follows:
two consecutive dilated blocks form the dilated-convolution module. Each dilated block is similar in structure to a bottleneck layer, consisting of three convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 respectively. The difference is that a dilation rate n is set for the 3 × 3 convolution layer, representing the spacing between values in the convolution kernel, i.e. zeros are inserted between consecutive filter values at interval n, which enlarges the effective kernel size without increasing the number of parameters or the computational cost. Specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with a kernel size of 3 + 2 × (n − 1). Between the convolution layers there are, in turn, a Batch Normalization (BN) layer and an activation function (Leaky ReLU);
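A minimal sketch of one such dilated block, assuming PyTorch and illustrative channel counts (in PyTorch, `padding=dilation` for a 3 × 3 kernel keeps the spatial size unchanged):

```python
import torch
import torch.nn as nn

def dilated_block(ch, mid, dilation):
    """Bottleneck-style dilated block: 1x1 -> dilated 3x3 -> 1x1, with
    BN + Leaky ReLU between the convolution layers. The 3x3 layer's
    receptive field grows to 3 + 2*(dilation - 1) at no extra
    parameter cost."""
    return nn.Sequential(
        nn.Conv2d(ch, mid, 1, bias=False),
        nn.BatchNorm2d(mid), nn.LeakyReLU(0.1, inplace=True),
        nn.Conv2d(mid, mid, 3, padding=dilation, dilation=dilation,
                  bias=False),
        nn.BatchNorm2d(mid), nn.LeakyReLU(0.1, inplace=True),
        nn.Conv2d(mid, ch, 1, bias=False),
    )
```

The block preserves both the channel count and the spatial resolution, so two of them can be chained to build the dilated-convolution module.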
the channel attention submodule is implemented as follows: global max pooling and global average pooling over the width and height are applied to the input feature map F (H × W × C), yielding two 1 × 1 × C feature maps; these are then fed into a two-layer MLP, where the first layer has C/r neurons (r is the reduction ratio) and the second layer has C neurons, the parameters of the two-layer MLP being shared. The two features output by the MLP are then added element-wise, and a sigmoid activation function generates the weight coefficients Mc; finally, Mc is multiplied element-wise with the input feature map F to produce the input features required by the spatial attention submodule;
the spatial attention submodule is implemented as follows: the feature map output by the channel attention submodule serves as its input feature map. First, channel-wise global max pooling and global average pooling produce two H × W × 1 feature maps; the two feature maps are then channel-spliced (concat); a 7 × 7 convolution then reduces the result back to one channel, i.e. H × W × 1; a sigmoid activation function then generates the weight coefficients Ms; finally, Ms is multiplied element-wise with the input feature map of the spatial attention submodule to obtain the final features;
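A matching PyTorch sketch (the class name and the choice of a bias-free convolution are illustrative assumptions):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Channel-wise max and mean give two H x W x 1 maps; concat them,
    reduce back to one channel with a 7 x 7 convolution, and apply a
    sigmoid to obtain the spatial weights Ms."""
    def __init__(self, k=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, k, padding=k // 2, bias=False)

    def forward(self, x):
        mx, _ = x.max(dim=1, keepdim=True)   # H x W x 1, max over channels
        avg = x.mean(dim=1, keepdim=True)    # H x W x 1, mean over channels
        ms = torch.sigmoid(self.conv(torch.cat([mx, avg], dim=1)))
        return x * ms  # element-wise reweighting of spatial positions
```

Applying `ChannelAttention` followed by `SpatialAttention` after the dilated-convolution module gives the DAM ordering described in the next paragraph.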
the dilated-convolution module, the channel attention submodule and the spatial attention submodule are integrated into the DAM. The dilated convolution enlarges the receptive field without losing resolution; by adjusting the dilation rate for objects of different scales, the size of the receptive field can be adapted, which addresses problems caused by scale variation. The channel attention and spatial attention submodules are applied in turn, telling the network what to attend to along the channel axis and where to attend along the spatial axis, respectively. The integrated attention mechanism helps the model learn to suppress irrelevant regions while highlighting salient features useful for target detection.
The basic flow for improving the detection network of YOLOv3 is as follows:
a residual network can effectively alleviate the convergence difficulty caused by an overly deep network, avoiding problems such as gradient attenuation and network degradation in deep learning models. Multiple experiments verified that adding a two-layer residual network achieves a good effect. Therefore, the improved YOLOv3 adds a two-layer residual network before the detection network;
Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26 and 52 × 52, during its successive downsampling, and fusing these three scales gives the model the ability to detect targets of different sizes. To enhance the model's feature extraction capability for the target, the invention adds a feature map at the 104 × 104 scale to the original detection output layers, i.e. the feature scales become 13 × 13, 26 × 26, 52 × 52 and 104 × 104. The 52 × 52 feature map first passes through a 1 × 1 convolution, its output is then upsampled to 104 × 104, and finally channel splicing (concatenate) with the newly added feature scale makes effective use of both shallow high-resolution information and deep high-semantic information;
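The 104 × 104 branch can be sketched as follows; the channel counts (128 for the 52 × 52 feature, 64 for the shallow backbone feature) are illustrative assumptions, not values given in the text:

```python
import torch
import torch.nn as nn

# hypothetical inputs: a 52x52 detection-branch feature and a shallow
# 104x104 backbone feature (channel counts assumed for illustration)
p_52 = torch.rand(1, 128, 52, 52)
c_104 = torch.rand(1, 64, 104, 104)

reduce = nn.Conv2d(128, 64, 1)                    # 1x1 convolution first
up = nn.Upsample(scale_factor=2, mode='nearest')  # 52 -> 104
# channel splicing of the upsampled deep feature with the shallow one
fused = torch.cat([up(reduce(p_52)), c_104], dim=1)
```

The fused map keeps the 104 × 104 resolution while combining deep semantic channels with shallow high-resolution channels.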
a Spatial Pyramid Pooling (SPP) block is added between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent its effect from degrading as the network depth increases. The SPP consists of four parallel branches: max pooling with kernels of 5 × 5, 9 × 9 and 13 × 13, plus a skip connection. That is, the feature map is passed through 5 × 5, 9 × 9 and 13 × 13 max pooling operations with a stride of 1, and the pooled feature maps together with the SPP input feature map undergo a concatenate operation: the outputs of the 4 branches are spliced along the channel dimension to obtain a new feature map. Finally, the new feature map passes through one convolution layer to restore the original number of channels.
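A minimal PyTorch sketch of this SPP block (the 1 × 1 restoring convolution is an assumption consistent with "one convolution layer to restore the original channels"):

```python
import torch
import torch.nn as nn

class SPP(nn.Module):
    """Four parallel branches: a skip connection plus stride-1 max
    pooling with 5x5, 9x9 and 13x13 kernels, spliced on the channel
    dimension; a final 1x1 convolution restores the original channel
    count so the block is plug-and-play."""
    def __init__(self, c, kernels=(5, 9, 13)):
        super().__init__()
        self.pools = nn.ModuleList(
            nn.MaxPool2d(k, stride=1, padding=k // 2) for k in kernels)
        self.conv = nn.Conv2d(c * (len(kernels) + 1), c, 1, bias=False)

    def forward(self, x):
        # padding k//2 with stride 1 keeps every branch at H x W
        return self.conv(torch.cat([x] + [p(x) for p in self.pools],
                                   dim=1))
```

Because input and output shapes match exactly, the block can be inserted anywhere in the network, as the next paragraph notes.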
The SPP is designed to be plug-and-play, so keeping the dimensions unchanged is important: it guarantees that the SPP can be inserted anywhere in the network without error. In addition, multi-scale local region features can be extracted from the feature maps and fused into the subsequent global features, yielding a richer feature representation and improving detection accuracy.
The hardware configuration, parameter settings and specific steps for training the augmented, labeled training set with the improved YOLOv3, as shown in fig. 3, are:
operating system: Ubuntu 16.04; runtime environment: Python 3.8 + PyTorch 1.3.1; GPU: NVIDIA GeForce GTX 1080Ti; GPU acceleration libraries: CUDA 10.0 + cuDNN 7.4.1;
input image size: 416 × 416; initial learning rate: 0.001, with the learning rate adjusted by exponential decay; momentum: 0.9; weight decay regularization term: 0.0005; training epochs: 2000;
the augmented and labeled training set is sent into the improved YOLOv3 for training; during training the model parameters are learned by gradient descent from the deviation between the prior frames and the labeled real frames. When the loss function converges, i.e. keeps fluctuating around a certain value, training stops, and the network structure and weight file at that moment constitute the optimal model.
And S4, inputting the test set into the optimal model, and marking the smoke of the power transmission channel to be detected in a rectangular frame form.
The above embodiments are preferred embodiments of the present invention, but the present invention is not limited thereto; any change, modification, substitution, combination or simplification that does not depart from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (4)

1. A power transmission channel smoke detection method based on improved YOLOv3 is characterized by comprising the following steps:
s1, augmenting the collected sample set by data enhancement, labeling the smoke to be detected in each image of the augmented sample set to obtain a labeled sample set, and dividing the labeled sample set into a training set and a test set;
s2, clustering the training set by using an improved k-means++ clustering algorithm to obtain prior frames, and setting prior frames of different scales so that, with high probability, a prior frame with a good matching degree to the target appears, making the model easy to learn; the principle by which the improved k-means++ clustering algorithm selects cluster centers is: the larger the shortest distance between a point and the existing cluster centers, the larger the probability that it is selected as a cluster center, which solves the instability caused by randomly selecting the initial cluster centers;
s3, training the training set by using the improved YOLOv3, and learning the model parameters by gradient descent from the deviation between the prior frames and the labeled real frames to obtain an optimal model; wherein the improved YOLOv3 comprises: firstly, designing a focus module added to the original backbone network Darknet-53 to alleviate the loss of target features caused by repeated downsampling; secondly, designing an attention module combined with dilated convolution (DAM) as a feature enhancement module integrated into the backbone network to further enhance its feature extraction capability, and sending the features into an FPN for multi-scale fusion; then improving the detection network of YOLOv3 by adding a YOLO detection head at the 104 × 104 scale for feature fusion and adding an SPP in front of each YOLO detection head, so that the feature extraction effect does not degrade as the network depth increases;
and S4, inputting the test set into the optimal model, and marking the smoke of the power transmission channel to be detected in a rectangular frame form.
2. The method for detecting smoke in a power transmission channel based on improved YOLOv3 as claimed in claim 1, wherein said step S1 comprises:
s11, collecting an image sample set containing the smoke to be detected: collecting images containing the smoke to be detected in multiple power transmission channel scenes to form the image sample set;
s12, expanding the sample set by a data enhancement method, specifically: copying and pasting the smoke to be detected within a sample to enhance the positional diversity of the smoke, thereby increasing the number of anchor frames matched to the smoke and increasing its training weight; in addition, angle rotation, horizontal flipping, Gaussian noise, brightness transformation and saturation adjustment are also included;
s13, labeling the smoke to be detected in each image using the image labeling software labelImg, wherein labeled regions are positive samples and unlabeled regions are negative samples, and the corresponding category and position information is stored in xml files; the whole sample set is then divided into a training set and a test set in a set proportion.
3. The improved YOLOv3-based power transmission channel smoke detection method as claimed in claim 1, wherein in step S2, the anchor frame sizes in the training set are clustered using an improved k-means++ clustering algorithm, and the steps of obtaining the prior frames are:
s21, randomly selecting a point from the training set as a first clustering center;
s22, calculating the shortest distance between each point in the training set and the existing cluster centers, i.e. the distance to the nearest cluster center, using the squared Euclidean distance as the distance d(x_i, x_j) between samples, i.e.

d(x_i, x_j) = ‖x_i − x_j‖²

wherein x_i, x_j represent the i-th and j-th samples;
s23, selecting a new data point as a new clustering center, wherein the selection principle is as follows: the larger the shortest distance between the point and the current existing clustering center is, the larger the probability of the selected point as the clustering center is;
s24, repeating S22 and S23 until k clustering centers are selected;
s25, calculating the distance between the residual samples and each clustering center, finding the clustering center with the closest sample distance, and dividing the sample into corresponding clusters;
s26, moving the clustering center to the center of the cluster group belonging to the clustering center, wherein the method for calculating the center is to directly determine the center as the average value of each coordinate of the cluster group;
and S27, repeating S25 and S26 until the cluster center does not move any more.
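The seeding steps S21 to S24 above correspond to k-means++ initialization; a minimal Python sketch follows, clustering hypothetical (width, height) pairs with the squared Euclidean distance from the claim. `random.choices` draws each new center with probability proportional to the shortest distance to the existing centers; steps S25 to S27 (the standard k-means iterations) would then follow:

```python
import random

def kmeanspp_init(points, k, dist):
    """k-means++ seeding: the first center is random; each subsequent
    center is drawn with probability proportional to its shortest
    (squared) distance to the existing centers, so far-away points are
    more likely to be picked (steps S21-S24)."""
    centers = [random.choice(points)]
    while len(centers) < k:
        # shortest distance from each point to the existing centers
        weights = [min(dist(p, c) for c in centers) for p in points]
        centers.append(random.choices(points, weights=weights)[0])
    return centers

def sq_euclidean(a, b):
    """Squared Euclidean distance d(x_i, x_j) = ||x_i - x_j||^2."""
    return sum((u - v) ** 2 for u, v in zip(a, b))
```

In the actual method the samples are the labeled anchor box sizes, and the returned seeds initialize the iterative clustering of steps S25 to S27.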
4. The method for detecting smoke in a power transmission channel based on improved YOLOv3 as claimed in claim 1, wherein in step S3, the training set is trained by using improved YOLOv3, and the procedure for obtaining the optimal model is as follows:
s31, improving the backbone network of YOLOv3 by adding a designed focus module to the original backbone network Darknet-53, on the following principle: the backbone network Darknet-53 of YOLOv3 is downsampled five times to obtain three feature maps for training; however, the downsampling operations cause fine-grained feature information of the image to be lost, making the target difficult to detect; in order not to lose information, the focus module performs a slicing operation on the image before it enters the backbone network, the specific operation being to take one value from every other pixel of the image, thereby obtaining four complementary images; the complementary images are stacked in the channel direction, i.e. the width and height information of the image is concentrated into the channel space, expanding the input channels by a factor of four, so the spliced image has twelve channels instead of the original three RGB channels; a convolution operation is then applied to the new image, and a two-fold downsampled feature map is finally obtained without any loss of information;
s32, improving the feature extraction capability of YOLOv3 by designing an attention module DAM combined with dilated convolution as a feature enhancement module integrated into the backbone network, the DAM mainly comprising three parts: a dilated-convolution module, a channel attention submodule and a spatial attention submodule, implemented as follows:
s321, forming the dilated-convolution module from two consecutive dilated blocks, each dilated block being similar in structure to a bottleneck layer and consisting of three convolution layers with kernel sizes of 1 × 1, 3 × 3 and 1 × 1 respectively; the difference is that a dilation rate n is set for the 3 × 3 convolution layer, representing the spacing between values in the convolution kernel, i.e. zeros are inserted between consecutive filter values at interval n, which enlarges the effective kernel size without increasing the number of parameters or the computational cost; specifically, the dilated 3 × 3 convolution has the same receptive field as a standard convolution with a kernel size of 3 + 2 × (n − 1); between the convolution layers there are, in turn, a batch normalization layer and an activation function;
s322, implementing the channel attention submodule as follows: performing global max pooling and global average pooling on the input feature map F of size H × W × C to obtain two 1 × 1 × C feature maps; feeding them into a two-layer MLP, the first layer having C/r neurons, r being the reduction ratio, and the second layer having C neurons, with the parameters of the two-layer MLP shared; adding the two features output by the MLP element-wise and generating the weight coefficients Mc through a sigmoid activation function; finally, multiplying Mc element-wise with the input feature map F to generate the input features required by the spatial attention submodule;
s323, implementing the spatial attention submodule as follows: taking the feature map output by the channel attention submodule as its input feature map; first performing channel-wise global max pooling and global average pooling to obtain two H × W × 1 feature maps; then channel-splicing the two feature maps; then reducing the result to one channel, i.e. H × W × 1, through a 7 × 7 convolution; then generating the weight coefficients Ms through a sigmoid activation function; finally, multiplying Ms element-wise with the input feature map of the spatial attention submodule to obtain the final features;
s324, integrating the dilated-convolution module, the channel attention submodule and the spatial attention submodule into the DAM, and integrating the DAM into the backbone network as a feature enhancement module;
s33, improving a detection network of YOLOv3, wherein the flow is as follows:
s331, adding a two-layer residual network before the detection network of the improved YOLOv3; the residual network can effectively alleviate the convergence difficulty caused by an overly deep network and avoids gradient attenuation and network degradation in deep learning models;
s332, Darknet-53 obtains feature maps at three scales, 13 × 13, 26 × 26 and 52 × 52, during its successive downsampling, and fusing the three scales gives the model the ability to detect targets of different sizes; to enhance the model's feature extraction capability for the target, a feature map at the 104 × 104 scale is added to the original detection output layers, i.e. the feature scales become 13 × 13, 26 × 26, 52 × 52 and 104 × 104; the 52 × 52 feature map first passes through a 1 × 1 convolution, its output is then upsampled to 104 × 104, and finally channel splicing and fusion with the newly added feature scale make effective use of shallow high-resolution information and deep high-semantic information;
s333, adding a spatial pyramid pooling SPP between the fifth and sixth convolution layers in front of each YOLO detection head to optimize feature extraction and prevent its effect from degrading as the network depth increases; the SPP consists of four parallel branches, namely max pooling with kernels of 5 × 5, 9 × 9 and 13 × 13 and a skip connection; that is, the feature map is passed through 5 × 5, 9 × 9 and 13 × 13 max pooling operations with a stride of 1, the pooled feature maps undergo a concatenate operation with the SPP input feature map, the outputs of the 4 branches are spliced along the channel dimension to obtain a new feature map, and the new feature map finally passes through one convolution layer to restore the original number of channels;
s34, training the augmented and labeled training set with the improved YOLOv3, specifically: sending the augmented and labeled training set into the improved YOLOv3 for training, learning the model parameters by gradient descent from the deviation between the prior frames and the labeled real frames during training; when the loss function converges, i.e. keeps fluctuating around a certain value, training stops, and the network structure and weight file at that moment constitute the optimal model.
CN202210519144.2A 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3 Pending CN115187921A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519144.2A CN115187921A (en) 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3

Publications (1)

Publication Number Publication Date
CN115187921A true CN115187921A (en) 2022-10-14

Family

ID=83512820

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519144.2A Pending CN115187921A (en) 2022-05-13 2022-05-13 Power transmission channel smoke detection method based on improved YOLOv3

Country Status (1)

Country Link
CN (1) CN115187921A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152720A (en) * 2023-04-17 2023-05-23 山东科技大学 Smoke detection method
CN116612336B (en) * 2023-07-19 2023-10-03 浙江华诺康科技有限公司 Method, apparatus, computer device and storage medium for classifying smoke in endoscopic image
CN116994287A (en) * 2023-07-04 2023-11-03 北京市农林科学院 Animal counting method and device and animal counting equipment
CN117040983A (en) * 2023-09-28 2023-11-10 联通(江苏)产业互联网有限公司 Data sharing method and system based on big data analysis
CN117689731A (en) * 2024-02-02 2024-03-12 陕西德创数字工业智能科技有限公司 Lightweight new energy heavy-duty truck battery pack identification method based on improved YOLOv5 model



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination