CN116129111A - Power line semantic segmentation method based on an improved DeepLabv3+ model - Google Patents
- Publication number
- CN116129111A (application CN202211658461.9A)
- Authority
- CN
- China
- Prior art keywords
- convolution
- power line
- network
- feature map
- training
- Prior art date: 2022-12-22
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06V10/26 — Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region; detection of occlusion
- G06N3/02 — Neural networks
- G06N3/08 — Learning methods
- G06V10/40 — Extraction of image or video features
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature-extraction or classification level
- G06V10/82 — Image or video recognition or understanding using neural networks
- Y04S10/50 — Systems or methods supporting power network operation or management, involving a certain degree of interaction with the load-side end user applications
Abstract
The invention discloses a power line semantic segmentation method based on an improved DeepLabv3+ model. The original DeepLabv3+ backbone network Xception is replaced with the lightweight PP-LCNet, which effectively reduces the parameter count and improves prediction speed. An atrous convolution branch and cascaded convolutions are added to the atrous spatial pyramid pooling module to obtain multi-scale features with larger receptive fields and thereby reduce missed segmentation, and the atrous convolution branches are further given a bottleneck structure to reduce the parameter count. Three layers of shallow features are fused again in the decoder to recover the detail features and spatial information lost during downsampling. Finally, a bottleneck attention module is introduced to reduce mis-segmentation of power lines. An experimental environment is configured, training parameters are set according to the device performance and the characteristics of power lines, and the model is then trained, validated and tested on the divided dataset.
Description
Technical Field
The invention belongs to the field of power line detection, and relates to a power line semantic segmentation method based on an improved DeepLabv3+ model.
Background
Transmission line inspection is an important part of daily power grid maintenance and plays an important role in guaranteeing stable operation of the power system. Because transmission lines are erected in complex and changeable environments, manual inspection is inefficient and dangerous, and cannot meet inspection requirements. With the rapid development of unmanned aerial vehicles (UAVs) and high-resolution camera technology in recent years, UAV-based intelligent power inspection has been widely applied. However, UAVs are prone to accidents such as colliding with power lines or being caught by wind during inspection, which poses great safety hazards to the stable operation of transmission lines. Power line segmentation is a key technology for realizing automatic obstacle avoidance and guaranteeing safe low-altitude UAV flight, so developing a power line segmentation algorithm with high accuracy and good real-time performance is of great significance.
Existing power line segmentation methods can be divided into traditional image processing methods and deep-learning-based semantic segmentation methods. Traditional methods can be further divided into extraction algorithms based on edge detection operators and extraction algorithms based on joint features. The former typically extracts power lines with a combination of edge detection operators and line detectors that introduce prior knowledge; the latter extracts power lines with a line detector combined with global auxiliary objects or context information. In traditional methods the extraction accuracy is strongly affected by the prior knowledge and auxiliary objects, and false and missed detections occur easily against complex backgrounds, so these methods are only suitable for certain specific scenes.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention, and therefore it may contain information that does not form the prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides a power line semantic segmentation method based on an improved DeepLabv3+ model, which has high accuracy and good real-time performance and provides a reliable basis for realizing automatic obstacle avoidance of UAVs and guaranteeing safe low-altitude UAV flight.
The aim of the invention is achieved by the following technical scheme:
The power line semantic segmentation method based on the improved DeepLabv3+ model comprises the following steps:
Step 1: in the encoder of the DeepLabv3+ model, replace the original backbone network Xception with a PP-LCNet network;
Step 2: add an atrous convolution branch and cascaded convolutions to the atrous spatial pyramid pooling module to obtain multi-scale features;
Step 3: introduce 3 layers of shallow features into the decoder to recover the detail features and spatial information lost during downsampling;
Step 4: introduce the bottleneck attention module to reduce mis-segmentation of power lines;
Step 5: select power line images from public power line datasets to produce the dataset, and divide it into a training set, a validation set and a test set;
Step 6: configure the experimental environment and set training parameters according to the device performance and the characteristics of power lines;
Step 7: train and validate the model using the training set and validation set of the dataset;
Step 8: test the model using the test set of the dataset and check the segmentation effect.
In the method, in step 1, the PP-LCNet network replaces standard convolutions with depthwise separable convolutions, in which the ReLU activation function is replaced by H-Swish; the PP-LCNet network replaces the 3×3 convolution with a 5×5 convolution at the end of the network, and squeeze-and-excitation network modules are added to the last two depthwise separable convolution blocks to weight the network channels.
In the method, in step 2, the atrous spatial pyramid pooling module with the dilation rate combination 6, 12 and 18 is extended into a bottleneck cascade atrous spatial pyramid pooling module that extracts multi-scale features by adding an atrous convolution branch and cascaded convolutions.
In the method, in step 2, an atrous convolution branch is added to the structure of the bottleneck cascade atrous spatial pyramid pooling module and the dilation rate combination is modified to 3, 6, 9 and 12; two cascaded 3×3 convolutions are then applied to the middle 4 convolution branches to extract multi-scale features. When the dilation rate of an atrous convolution branch is r and the convolution kernel size is k, the receptive field size is:
R = (r-1)×(k-1)+k,
and the receptive field when two atrous convolution layers are cascaded is:
R = R1+R2-1,
where R1 and R2 are the receptive fields provided by the two atrous convolution layers respectively. The middle 4 convolution branches are first reduced to 64 channels with a 1×1 convolution, then processed by the two 3×3 convolutions, and then expanded back to 256 channels with a 1×1 convolution.
In the method, in step 3, 3 shallow feature maps with downsampling factors of 1/4, 1/8 and 1/16 are introduced into the decoder from the PP-LCNet, and their channel numbers are adjusted to 48, 32 and 16 respectively using 1×1 convolutions.
In the method, in step 4, the bottleneck attention module is composed of a channel attention network and a spatial attention network in parallel. The input feature map F is processed by the two parallel networks to obtain the attention map M(F); M(F) is then multiplied element-wise with F to highlight important features, and the resulting feature map is added to F to output the attention feature map F′. The expressions are as follows:
M(F) = σ(Mc(F)+Ms(F)),
F′ = F+F⊙M(F),
where σ is the sigmoid function, ⊙ denotes element-wise (point-by-point) multiplication of corresponding matrix elements, and Mc(F) and Ms(F) denote the output feature maps of the channel attention network and the spatial attention network respectively.
In the method, in step 4, in the channel attention network, the features in each channel are first aggregated by global average pooling to generate a channel vector Fc; a multi-layer perceptron then estimates the attention from Fc, and the channel attention feature map is obtained via the fully connected layer.
In the method, in step 4, in the spatial attention network, the dimension of the feature map is first compressed by a 1×1 convolution, context information is then acquired by two 3×3 convolutions with dilation rate 4, and finally the spatial attention feature map is output after the dimension is further compressed by a 1×1 convolution.
In the method, in step 5, 415 and 437 images containing power lines are selected from the public power line datasets TTPLA and WireDataset respectively, and the 852 images are expanded to 5000 images by rotation, horizontal flipping, zooming in, zooming out, cropping and brightness changes; 10% of the dataset is used as the test set, and the remainder is divided into training and validation sets at a ratio of 9:1.
In the method, in step 6, the operating system used in the experiments is Windows 10, the processor is an Intel(R) Xeon(R) Gold 6230, the graphics card is an NVIDIA Tesla V100-PCIE-16GB, the RAM size is 320 GB, the network framework is PyTorch, the Python version is 3.7.11, and the initial learning rate is 5e-4. The experiments use mean pixel accuracy and mean intersection over union as the evaluation criteria for segmentation accuracy.
In the method, in step 7, the training input images are 512×512 pixels, the batch size used in training is 8, the number of iterations is 100, and the combination of the CE Loss and Dice Loss functions serves as the training loss; after each training epoch, the model is evaluated on the validation set to prevent overfitting.
In the method, in step 8, the model is tested on the test set partition of the dataset; the mean pixel accuracy and mean intersection over union of its segmentation are measured, and these metrics and the actual segmentation effect on power line images are compared with other models.
Advantageous effects
The invention provides a power line semantic segmentation method based on an improved DeepLabv3+ model. Replacing the original backbone network Xception with the lightweight PP-LCNet effectively reduces the parameter count and improves the prediction speed. Adding an atrous convolution branch and cascaded convolutions to the ASPP module strengthens the extraction of power line detail features and reduces missed segmentation, and giving the atrous convolution branches a bottleneck structure reduces the parameter count. Introducing 3 layers of shallow features into the decoder part makes fuller use of the features extracted at different levels of the backbone network to recover the detail features and spatial information lost during downsampling. The bottleneck attention module captures more of the channel and spatial feature information of interest, effectively reduces interference from the background and other factors, and is more conducive to power line feature extraction.
Drawings
Various other advantages and benefits of the present invention will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for the purpose of illustrating the preferred embodiments and are not to be construed as limiting the invention. Obviously, the figures described below represent only some embodiments of the invention, and a person skilled in the art can obtain other figures from them without inventive effort. Like reference numerals designate like parts throughout the figures.
In the drawings:
FIG. 1 is a flow chart of the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 2 is a schematic structural diagram of the DeepLabv3+ model in the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 3 is a schematic diagram of the PP-LCNet structure in the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 4 is a schematic diagram of the BC-ASPP module in the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 5 is a structural diagram of the bottleneck attention module in the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 6 is a complete structural schematic diagram of the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention;
FIG. 7 is an effect diagram of power line image segmentation by the power line semantic segmentation method based on the improved DeepLabv3+ model according to one embodiment of the present invention.
The invention is further explained below with reference to the drawings and examples.
Detailed Description
Specific embodiments of the present invention will be described in more detail below with reference to fig. 1 to 7. While specific embodiments of the invention are shown in the drawings, it should be understood that the invention may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.
It should be noted that certain terms are used throughout the description and claims to refer to particular components. Those skilled in the art will understand that the same component may be referred to by different names; the description and claims distinguish components not by differences in name but by differences in function. As used throughout the specification and claims, the terms "include" and "comprise" are open-ended and should be interpreted as "including, but not limited to". The description hereinafter sets forth preferred embodiments for practicing the invention, but is intended to illustrate the general principles of the invention rather than to limit its scope; the scope of the invention is defined by the appended claims.
To facilitate an understanding of the embodiments of the present invention, further explanation is given below by way of the specific examples illustrated in the accompanying drawings.
For better understanding, as shown in figs. 1 to 6, the power line semantic segmentation method based on the improved DeepLabv3+ model includes the following steps:
Step 1: aiming at the low prediction speed of the DeepLabv3+ model, the original backbone network Xception is replaced with the lightweight PP-LCNet in the encoder part, which improves the prediction speed and reduces the parameter count, thereby improving the real-time performance of power line segmentation;
Step 2: to strengthen feature extraction for the elongated power lines, the atrous spatial pyramid pooling (ASPP) module is improved by adding an atrous convolution branch and cascaded convolutions, obtaining multi-scale features with larger receptive fields and thereby reducing missed segmentation. In addition, a bottleneck structure that first reduces and then restores the dimension is adopted for the atrous convolution branches to reduce computation;
Step 3: to further improve segmentation accuracy, 3 layers of shallow features are introduced into the decoder part, making fuller use of the features extracted at different levels of the backbone network to recover the detail features and spatial information lost during downsampling;
Step 4: aiming at the susceptibility of power line segmentation to interference from background objects, bottleneck attention modules (BAM) are introduced into the encoder and decoder respectively to strengthen the extraction of power line features and reduce mis-segmentation of background objects.
Step 5: 852 images are selected from public power line datasets and expanded to 5000 images to form the dataset; 10% of the dataset is used as the test set, and the remainder is divided into training and validation sets at a ratio of 9:1.
Step 6: the operating system used in the experiments is Windows 10, the processor is an Intel(R) Xeon(R) Gold 6230, the graphics card is an NVIDIA Tesla V100-PCIE-16GB, the RAM size is 320 GB, the network framework is PyTorch, the Python version is 3.7.11, and the initial learning rate is 5e-4. The experiments use mean pixel accuracy and mean intersection over union as the evaluation criteria for segmentation accuracy.
Step 7: the training input images are 512×512 pixels, the batch size is 8, the number of training iterations is 100, and the combination of the CE Loss and Dice Loss functions serves as the training loss; after each training epoch the model is evaluated on the validation set to guard against overfitting.
Step 8: the model is tested on the test set partition of the dataset; the mean pixel accuracy and mean intersection over union of its segmentation are measured, and these metrics and the actual segmentation effect on power line images are compared with other models.
In one embodiment, the method comprises the steps of:
Step 1: the lightweight PP-LCNet replaces the original DeepLabv3+ backbone network Xception, reducing the parameter count and improving the prediction speed;
The PP-LCNet structure is shown in fig. 3. The core of PP-LCNet is to replace standard convolutions with depthwise separable convolutions and to avoid operations such as shortcut connections, which effectively reduces the parameter count and increases running speed. In addition, the ReLU activation function in the depthwise separable convolution blocks is replaced by H-Swish, further improving performance.
Balancing speed and accuracy, PP-LCNet replaces the 3×3 convolution with a 5×5 convolution at the end of the network, effectively improving the feature extraction capability. Squeeze-and-excitation network (SENet) modules are added to the last two depthwise separable convolution blocks to weight the network channels, strengthening the extraction of important information and improving network performance without affecting speed.
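To make the block structure concrete, the following is a minimal PyTorch sketch of a PP-LCNet-style depthwise separable block with H-Swish activation and an optional squeeze-and-excitation module. The class names, channel counts and reduction ratio are illustrative assumptions, not taken from the patent:

```python
import torch
import torch.nn as nn

class SEModule(nn.Module):
    """Squeeze-and-excitation: global pooling + two 1x1 convs -> channel weights."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Hardsigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))  # reweight channels

class DepthSepBlock(nn.Module):
    """Depthwise conv + pointwise conv, each followed by BN and H-Swish."""
    def __init__(self, in_ch, out_ch, kernel_size=3, stride=1, use_se=False):
        super().__init__()
        self.dw = nn.Sequential(
            nn.Conv2d(in_ch, in_ch, kernel_size, stride,
                      padding=kernel_size // 2, groups=in_ch, bias=False),
            nn.BatchNorm2d(in_ch),
            nn.Hardswish(),  # H-Swish replaces ReLU
        )
        self.se = SEModule(in_ch) if use_se else nn.Identity()
        self.pw = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch),
            nn.Hardswish(),
        )

    def forward(self, x):
        return self.pw(self.se(self.dw(x)))

# A late-stage block as described: 5x5 depthwise kernel plus SE weighting.
block = DepthSepBlock(256, 512, kernel_size=5, stride=2, use_se=True)
out = block(torch.randn(1, 256, 32, 32))  # -> (1, 512, 16, 16)
```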
Step 2: adding a cavity convolution branch and cascade convolution in the ASPP module to obtain multi-scale characteristics so as to reduce the missing segmentation phenomenon;
hole convolution: hole convolution is also called dilation convolution or dilation convolution, which is simply a process of adding some spaces (zeros) between elements of a convolution kernel to enlarge the convolution kernel.
Cascaded convolution: concatenation is herein a concatenation of 1×1 convolutions with 1×3 convolutions in the original hole space pyramid pooling changed to 23×3 convolutions and 2 bottleneck structures.
The DeepLabv3+ model uses an ASPP module with the dilation rate combination 6, 12 and 18 to extract multi-scale features. A larger dilation rate yields a larger receptive field, but brings problems such as insufficient extraction of detail features and poor extraction of small targets. To improve the ASPP module, a bottleneck cascade atrous spatial pyramid pooling (BC-ASPP) module is proposed, which strengthens the extraction of power line detail features by adding an atrous convolution branch and cascaded atrous convolutions, thereby reducing missed segmentation.
The structure of the BC-ASPP module is shown in fig. 4. First, an atrous convolution branch is added and the dilation rate combination is modified to 3, 6, 9 and 12, which facilitates the extraction of small targets and detail features. Then two cascaded 3×3 convolutions are applied to the middle 4 convolution branches, extracting multi-scale features with larger receptive fields and effectively reducing missed segmentation. When the dilation rate of an atrous convolution is r and the convolution kernel size is k, the receptive field size is:
R = (r-1)×(k-1)+k (1)
and the receptive field when two atrous convolution layers are cascaded is:
R = R1+R2-1 (2)
where R1 and R2 are the receptive fields provided by the two atrous convolution layers respectively; cascading two atrous convolutions thus nearly doubles the receptive field. Finally, to reduce the parameters of the BC-ASPP module, the middle 4 convolution branches are first reduced to 64 channels with a 1×1 convolution, then processed by the two 3×3 convolutions, and then expanded back to 256 channels with a 1×1 convolution; this bottleneck structure effectively reduces the parameter count.
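As a quick numerical check of equations (1) and (2), the following small Python helper (purely illustrative, not part of the patented method) computes these receptive fields:

```python
def rf_atrous(r, k=3):
    """Receptive field of one atrous convolution, eq. (1): R = (r-1)(k-1) + k."""
    return (r - 1) * (k - 1) + k

def rf_cascade(r1, r2, k=3):
    """Receptive field of two cascaded atrous convolutions, eq. (2)."""
    return rf_atrous(r1, k) + rf_atrous(r2, k) - 1

print(rf_atrous(12))       # 25: a single 3x3 convolution with dilation rate 12
print(rf_cascade(12, 12))  # 49: cascading nearly doubles the receptive field
```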
Step 3: fusing 3 layers of shallow layer features in a decoder to recover the detail features and the spatial information lost in the down-sampling process;
detail characteristics: the detail features of the image are mainly divided into color features, character features, texture features and spatial relationship features.
Spatial information: the spatial position information can be generally classified into relative spatial position information and absolute spatial position information. The former relationship emphasizes the relative situation between the targets, such as the up-down-left-right relationship, etc., and the latter relationship emphasizes the distance magnitude and orientation between the targets.
In the decoder part, shallow feature maps with downsampling factors of 1/4, 1/8 and 1/16 are introduced from the PP-LCNet. The channel numbers of the 3 shallow feature maps are adjusted to 48, 32 and 16 respectively with 1×1 convolutions, preventing excessive shallow semantic information from overwhelming the deep semantic information output by the encoder. This multi-scale fusion process adds only 2 extra 1×1 convolution operations, a small number of parameters compared with the feature fusion of the original DeepLabv3+ network, yet it effectively improves segmentation accuracy.
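A minimal sketch of this decoder-side fusion is given below; the input channel counts of the shallow maps are assumptions for illustration, while the output channel counts 48, 32 and 16 follow the text:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShallowFusion(nn.Module):
    def __init__(self, chs_in=(64, 128, 256), chs_out=(48, 32, 16)):
        super().__init__()
        # chs_in are assumed backbone channel counts, not specified in the patent
        self.adjust = nn.ModuleList(
            nn.Conv2d(ci, co, kernel_size=1) for ci, co in zip(chs_in, chs_out)
        )

    def forward(self, shallow, deep):
        # shallow: feature maps at 1/4, 1/8 and 1/16 resolution; deep: encoder output
        target = shallow[0].shape[-2:]  # fuse everything at 1/4 resolution
        adjusted = [conv(f) for conv, f in zip(self.adjust, shallow)]
        adjusted = [F.interpolate(f, size=target, mode='bilinear',
                                  align_corners=False) for f in adjusted]
        deep = F.interpolate(deep, size=target, mode='bilinear',
                             align_corners=False)
        return torch.cat(adjusted + [deep], dim=1)
```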
Step 4: the bottleneck attention module is introduced to reduce the phenomenon of power line mis-segmentation;
the bottleneck attention module is a mixed attention mechanism network, and is composed of a parallel channel attention network and a spatial attention network, and the structure diagram of the bottleneck attention module is shown in fig. 5. The input feature map F is processed by two parallel networks to obtain a feature map M (F). M (F) is then multiplied by F point by point to highlight important features, and the resulting feature map is added to F to output an attention feature map F'. The expressions of the feature maps M (F) and F are as follows:
M(F)=σ(M c (F)+M s (F)) (3)
F′=F+F⊙M(F) (4)
wherein sigma is a sigmoid function, and by multiplying corresponding elements of the matrix point by point, M C (F) And M s (F) The channel attention network and the spatial attention network output feature diagrams are respectively represented.
In the channel attention network, the features in each channel are first aggregated by global average pooling to generate a channel vector Fc; a multi-layer perceptron then estimates the attention from Fc, and the channel attention feature map is obtained via the fully connected layer. In the spatial attention network, the feature map dimension is first compressed with a 1×1 convolution, context information is then obtained with two 3×3 convolutions with dilation rate 4, and finally the spatial attention feature map is output after the dimension is further compressed by a 1×1 convolution.
Bottleneck attention modules are introduced into the encoder and the decoder respectively, giving higher attention to power line features and reducing mis-segmentation of background objects.
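A minimal PyTorch sketch of such a bottleneck attention module, following equations (3) and (4), is shown below; the channel reduction ratio is an assumption, since the text does not specify it:

```python
import torch
import torch.nn as nn

class BAM(nn.Module):
    def __init__(self, channels, reduction=16, dilation=4):
        super().__init__()
        # Channel branch: global average pooling followed by an MLP (as 1x1 convs)
        self.channel = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        # Spatial branch: 1x1 compression, two dilated 3x3 convs (rate 4), 1x1 to one map
        self.spatial = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels // reduction, 3,
                      padding=dilation, dilation=dilation),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 1, 1),
        )

    def forward(self, f):
        # Eq. (3): M(F) = sigmoid(Mc(F) + Ms(F)), broadcast over (B, C, H, W)
        m = torch.sigmoid(self.channel(f) + self.spatial(f))
        # Eq. (4): F' = F + F (.) M(F)
        return f + f * m
```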
Step 5: selecting a power line image from the disclosed power line data set to perform data set manufacturing, and dividing the data set into a training set, a verification set and a test set;
the method comprises the steps of selecting 415 images of a 437 power line and 437 Zhang Baohan images from a disclosed power line data set TTPLA and a wire dataset respectively, expanding 852 images to 5000 images as a data set in a manner of rotating, turning left and right, amplifying, shrinking, cutting and changing brightness, taking 10% of the data set as a test set, and taking the rest of the data set as 9: the scale of 1 is divided into training and validation sets.
Step 6: configuring an experimental environment, and setting training parameters according to the equipment performance and the characteristics of a power line;
the operating system used in the experiment is Windows 10, the processor is Intel (R) Xeon (R) Gold 6230, the display card is NVIDIA Tesla V100-PCIE-16GB, the RAM size is 320G, the network framework used in the experiment is pytorch, the python version is 3.7.11, and the initial learning rate is 5e. The experiment uses the average pixel precision and the average intersection ratio as evaluation criteria for the segmentation precision.
Step 7: training and verifying the model by using a training set and a testing set in the data set;
the size of the training input image is 512 multiplied by 512 pixels, the Batchsize adopted in the training is 8, the iteration number is 100, two Loss functions of CE Loss and Dice Loss are combined to be used as the Loss functions of the training, and after each training, the effect of the model is verified by a verification set, so that the problem of overfitting is prevented.
Step 8: and testing the model by using a test set in the data set, and testing the segmentation effect.
The model is tested on the test set partition of the dataset; the mean pixel accuracy and mean intersection over union of its segmentation are 91.9% and 82.07% respectively, and these metrics and the actual segmentation effect on power line images are compared with other models.
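For reference, the two reported metrics can be computed from a confusion matrix as in the sketch below (the binary power line/background case is assumed):

```python
import numpy as np

def confusion_matrix(pred, gt, num_classes=2):
    """Accumulate a num_classes x num_classes confusion matrix from label arrays."""
    mask = (gt >= 0) & (gt < num_classes)
    idx = num_classes * gt[mask].astype(int) + pred[mask].astype(int)
    return np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)

def mpa_miou(cm):
    tp = np.diag(cm).astype(float)
    mpa = np.nanmean(tp / cm.sum(axis=1))                 # mean pixel accuracy
    miou = np.nanmean(tp / (cm.sum(1) + cm.sum(0) - tp))  # mean intersection over union
    return mpa, miou
```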
Although the embodiments of the present invention have been described above with reference to the accompanying drawings, the present invention is not limited to the specific embodiments and application fields described above, which are merely illustrative and not restrictive. Those skilled in the art, having the benefit of this disclosure, may devise numerous other forms of the invention without departing from the scope of the claimed invention.
Claims (12)
1. A power line semantic segmentation method based on an improved DeepLabv3+ model, characterized by comprising the following steps:
Step 1: in the encoder of the DeepLabv3+ model, replacing the original backbone network Xception with a PP-LCNet network;
Step 2: adding an atrous convolution branch and cascaded convolutions to the atrous spatial pyramid pooling module to obtain multi-scale features;
Step 3: introducing 3 layers of shallow features into the decoder to recover the detail features and spatial information lost during downsampling;
Step 4: introducing the bottleneck attention module to reduce mis-segmentation of power lines;
Step 5: selecting power line images based on public power line datasets to produce the dataset, and dividing it into a training set, a validation set and a test set;
Step 6: configuring the experimental environment and setting training parameters according to the device performance and the characteristics of power lines;
Step 7: training and validating the model using the training set and validation set of the dataset;
Step 8: testing the model using the test set of the dataset and checking the segmentation effect.
2. The method according to claim 1, characterized in that in step 1 the PP-LCNet network replaces standard convolutions with depthwise separable convolutions, in which the ReLU activation function is replaced by H-Swish; the PP-LCNet network replaces the 3×3 convolution with a 5×5 convolution at the end of the network, and the network channels are weighted by adding squeeze-and-excitation network modules to the last two depthwise separable convolution blocks.
3. The method of claim 1, wherein in step 2 the atrous spatial pyramid pooling module with the dilation rate combination 6, 12 and 18 is extended into a bottleneck cascade atrous spatial pyramid pooling module that extracts multi-scale features by adding an atrous convolution branch and cascaded convolutions.
4. The method according to claim 3, wherein in step 2 an atrous convolution branch is added to the structure of the bottleneck cascade atrous spatial pyramid pooling module and the dilation rates are modified to 3, 6, 9 and 12; two cascaded 3×3 convolutions are then applied to the middle 4 convolution branches to extract multi-scale features, and when the dilation rate of an atrous convolution branch is r and the convolution kernel size is k, the receptive field size is:
R = (r-1)×(k-1)+k,
and the receptive field when two atrous convolution layers are cascaded is:
R = R1+R2-1,
where R1 and R2 are the receptive fields provided by the two atrous convolution layers respectively; the middle 4 convolution branches are first reduced to 64 channels with a 1×1 convolution, then processed by the two 3×3 convolutions, and then expanded back to 256 channels with a 1×1 convolution.
5. The method as claimed in claim 4, wherein in step 3, 3 shallow feature maps with downsampling factors of 1/4, 1/8 and 1/16 are introduced from the PP-LCNet, and their channel numbers are adjusted to 48, 32 and 16 respectively by 1×1 convolutions.
6. The method according to claim 1, wherein in step 4 the bottleneck attention module is composed of a channel attention network and a spatial attention network in parallel; the input feature map F is processed by the two parallel networks to obtain the attention map M(F), M(F) is multiplied element-wise with F to highlight important features, and the resulting feature map is added to F to output the attention feature map F′, the expressions being as follows:
M(F) = σ(Mc(F)+Ms(F)),
F′ = F+F⊙M(F),
where F is the input feature map, M(F) is the attention map obtained by processing F through the two parallel networks, σ is the sigmoid function, ⊙ denotes element-wise (point-by-point) multiplication of corresponding matrix elements, Mc(F) and Ms(F) denote the output feature maps of the channel attention network and the spatial attention network respectively, and F′ is the attention feature map output by adding F to the element-wise product of F and M(F).
7. The method as claimed in claim 6, wherein in step 4, in the channel attention network, the features in each channel are first aggregated by global average pooling to generate a channel vector Fc, a multi-layer perceptron then estimates the attention from Fc, and the channel attention feature map is obtained via the fully connected layer.
8. The method as claimed in claim 6, wherein in step 4, in the spatial attention network, the feature map dimension is first compressed by a 1×1 convolution, context information is then acquired by two 3×3 convolutions with dilation rate 4, and finally the spatial attention feature map is output after the dimension is further compressed by a 1×1 convolution.
9. The method according to claim 1, wherein in step 5, 415 and 437 images containing power lines are selected from the power line datasets TTPLA and WireDataset respectively, the 852 images are expanded to 5000 images by rotation, horizontal flipping, zooming in, zooming out, cropping and brightness changes, 10% of the dataset is used as the test set, and the remainder is divided into training and validation sets at a ratio of 9:1.
10. The method according to claim 1, wherein in step 6 the network framework used for the experiment is PyTorch and the initial learning rate is 5e-4, and the experiment uses the mean pixel accuracy MPA and the mean intersection over union MIoU as the evaluation criteria for segmentation accuracy.
11. The method according to claim 1, wherein in step 7 the training input images are 512×512 pixels, the batch size used in training is 8, the number of iterations is 100, the combination of the CE Loss and Dice Loss functions serves as the training loss, and the model is evaluated on the validation set after each training epoch to prevent overfitting.
12. The method according to claim 1, wherein in step 8 the model is tested on the test set partition of the dataset, the mean pixel accuracy MPA and the mean intersection over union MIoU of its segmentation are measured, and the MPA, MIoU and the actual segmentation effect on power line images are compared with other models.
Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202211658461.9A | 2022-12-22 | 2022-12-22 | Power line semantic segmentation method based on an improved DeepLabv3+ model
Publications (1)

Publication Number | Publication Date
---|---
CN116129111A | 2023-05-16
Family ID: 86300054

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202211658461.9A | Power line semantic segmentation method based on an improved DeepLabv3+ model | 2022-12-22 | 2022-12-22

Country Status (1)

Country | Link
---|---
CN | CN116129111A (en)
Cited By (4)

Publication number | Priority date | Publication date | Title
---|---|---|---
CN116343070A | 2023-05-22 | 2023-06-27 | Intelligent interpretation method for aerial survey image ground object elements
CN116343070B | 2023-05-22 | 2023-10-13 | Intelligent interpretation method for aerial survey image ground object elements
CN117237644A | 2023-11-10 | 2023-12-15 | Forest residual fire detection method and system based on infrared small target detection
CN117237644B | 2023-11-10 | 2024-02-13 | Forest residual fire detection method and system based on infrared small target detection
Legal Events

Date | Code | Title
---|---|---
 | PB01 | Publication
 | SE01 | Entry into force of request for substantive examination