CN115376024A - Semantic segmentation method for power accessory of power transmission line - Google Patents

Semantic segmentation method for power accessory of power transmission line Download PDF

Info

Publication number
CN115376024A
Authority
CN
China
Prior art keywords
visible light
infrared
module
feature
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210921552.0A
Other languages
Chinese (zh)
Inventor
成云朋
张庆富
王鑫
冯兴明
王永
张济韬
李建华
张学波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Original Assignee
Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd filed Critical Yancheng Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Priority to CN202210921552.0A priority Critical patent/CN115376024A/en
Publication of CN115376024A publication Critical patent/CN115376024A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/17Terrestrial scenes taken from planes or by drones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Remote Sensing (AREA)

Abstract

The invention provides a semantic segmentation method for power accessories of a power transmission line, which comprises the following steps: (1) Using an unmanned aerial vehicle carrying an infrared thermal imager and a visible light camera to fly around the power tower and shoot infrared-visible image pairs of the power accessories, obtaining an infrared-visible original sample set; (2) Processing the original sample set with an image registration module based on edge intensity to generate infrared-visible image registration results with a high degree of matching; (3) Obtaining a sample set of infrared and visible light image power accessories; (4) Performing the final semantic segmentation on the labeled infrared and visible light images; (5) Randomly dividing the infrared and visible light image sample set from step (3) into a training set and a testing set at a ratio of 4:1. The method can accurately identify power accessories such as power towers, insulators, strain clamps and vibration dampers in the power transmission line.

Description

Semantic segmentation method for power accessory of power transmission line
Technical Field
The invention belongs to the technical field of semantic segmentation of power accessories, and particularly relates to a semantic segmentation method for power accessories of a power transmission line based on cooperation of infrared and visible light images.
Background
Power accessory identification is an important basis for transmission-line scene understanding, autonomous flight of inspection unmanned aerial vehicles, intelligent detection of equipment defects and similar functions, and therefore has important research and practical significance. The key step of power accessory identification is to distinguish the power accessories from the background and to complete their classification by combining the extracted semantic information of the image with the result of that distinction. A semantic segmentation method assigns a semantic category label to every pixel of the image; when a deep-learning-based semantic segmentation method is trained with images of transmission-line power accessories, the positions of different power accessories can be located in the image and the equipment type can be identified, so that a fault diagnosis scheme can be determined according to the type of power accessory, improving the intelligence of unmanned aerial vehicle inspection to a certain extent.
However, a semantic segmentation method that relies only on visible light images is easily disturbed by rapid motion of the unmanned aerial vehicle and camera shake; at the same time, visible light imaging is affected by external conditions such as illumination and weather, and the resulting drop in visible image quality further degrades the performance of the semantic segmentation algorithm.
To address these problems, patent CN114612659A discloses a power accessory segmentation method and system based on fusion-mode contrastive learning, in which a fusion-mode feature encoder extracts the features of the infrared and visible image pairs separately, the features are fused and fed back to the feature extraction network to supervise the modality-specific features, and the fused result is finally sent to a decoder and a contrastive learning module to obtain the final fusion result. However, because the infrared and visible image pairs exhibit small registration differences, pixel mismatches arise between the infrared and visible images, which affects segmentation accuracy; moreover, fusing the infrared and visible images on equal terms ignores the specific attributes of each modality, reduces the efficiency of modality fusion and affects the segmentation effect.
Disclosure of Invention
In order to solve the above problems, the invention provides a semantic segmentation method for power transmission line power accessories in which infrared and visible light images work in coordination.
The invention particularly relates to a semantic segmentation method for an electric power accessory of an electric transmission line by cooperation of infrared and visible light images, which comprises the following steps:
step (1): an unmanned aerial vehicle is used for carrying an infrared thermal imager and a visible light camera to surround an electric power tower, and an infrared visible light image pair of an electric power accessory is shot to obtain an infrared visible light original sample set containing a transverse insulator, a vertical insulator, an insulating wire clamp, a vibration damper and a tower target;
step (2): processing the infrared visible light original sample set by adopting an image registration module based on edge intensity to generate an infrared visible light image registration result with high matching degree;
and (3): sending the registered infrared and visible light image to a sample set construction module to construct an infrared and visible light power accessory sample set, manually calibrating all visible lights and corresponding infrared images by using a labelme marking tool aiming at the five targets in the step (1), and obtaining an infrared and visible light image power accessory sample set by taking the rest unmarked areas as backgrounds;
and (4): inputting the marked infrared and visible light images into a feature fusion module for multi-level modal information feature extraction and weight adaptive learning so as to fully extract infrared and visible light data features with specificity, fully fusing the infrared and visible light data features by utilizing the complementarity of the infrared and visible light images, sending the fused result into a high-level feature activation module to enhance the differential features and eliminate the influence of interference noise, and finally sending the global feature map into a multi-level decoding module to obtain a final semantic segmentation result;
and (5): and (4) randomly dividing the infrared and visible light image sample set group in the step (3) into a training set and a testing set according to the proportion of 4.
In the step (2), the image registration module based on the edge intensity comprises an edge intensity detection module, a feature selection module and a descriptor matching module, wherein the edge intensity detection module comprises the following steps:
(1) The original infrared-visible image pair is sent into the module and the edge features of the infrared image are extracted. The erosion of the infrared image A by the structuring element B is recorded as
E = A ⊖ B = {z | B_z ⊆ A},
where B_z is the translation of B by the vector z, specifically B_z = {b + z | b ∈ B}, and E is the eroded image; similarly, the dilation of the infrared image A by the structuring element B is
F = A ⊕ B = {z | (B_s)_z ∩ A ≠ ∅},
where B_s is B reflected about the origin, i.e. the process can be regarded as rotating the structuring element B by 180° around the origin and then eroding the infrared image A with it;
(2) From (1), the inner edge of the infrared image, D_1 = A − (A ⊖ B), and the outer edge, D_2 = (A ⊕ B) − A, can be obtained, thereby giving the basic gradient edge of the infrared image D = (A ⊕ B) − (A ⊖ B).
The feature selection module performs feature selection based on the ORB (Oriented FAST and Rotated BRIEF) algorithm, and comprises the following steps:
(1) Take a pixel block p of size S × S; after smoothing, a key point q = (x, y)^T on the block is obtained, and a binary decision criterion Γ is defined:
Γ(p; q, r) = 1 if p(q) < p(r), and 0 otherwise,
where r = (x', y')^T is a selected threshold keypoint and p(r) is the gray value at r;
(2) Select n pairs of position coordinates (x_i, y_i) in the neighborhood of the key point q and compare them under the binary decision criterion Γ to obtain the n-dimensional feature descriptor f_n(p):
f_n(p) = Σ_{1≤i≤n} 2^(i−1) Γ(p; x_i, y_i);
(3) Operate on the 2 × n matrix M with the rotation matrix R_θ to obtain a new matrix M_θ:
M_θ = R_θ M,
where
M = (x_1 x_2 … x_n; y_1 y_2 … y_n),
and the rotation matrix R_θ corresponding to the principal direction θ of the key point q is expressed as
R_θ = (cos θ  −sin θ; sin θ  cos θ);
(4) From (3), the matrix M becomes oriented, that is, the feature descriptor has rotation invariance, and the binary descriptor is:
g_n(p, θ) = f_n(p) | (x_i, y_i) ∈ S_θ;
the descriptor matching module comprises the following steps:
(1) Taking the Hamming distance as the similarity measure between two points, and coarsely matching the feature points with a brute-force matching method;
(2) Solving a transformation matrix by a progressive consistent sampling method;
(3) Calculating a final transformation matrix H in the high-precision interior point set by using a RANSAC algorithm, and transforming the images according to the transformation matrix H to calculate a common region between the two images;
(4) Affine transformation is adopted as the solution model of the registration, and its mathematical expression is:
(x_2, y_2, 1)^T = H_1 (x_1, y_1, 1)^T, with H_1 = (a_00 a_01 a_02; a_10 a_11 a_12; 0 0 1),
where (x_1, y_1) are the original coordinates, (x_2, y_2) the transformed coordinates, H_1 the affine transformation matrix, and (a_02, a_12) the offset.
The step (4) comprises a sample set construction module, a multi-level modal information feature extraction module, a weight self-adaptive learning feature fusion module, a high-level feature activation module, a multi-level decoding module and a network training module, and comprises the following steps:
(1) Respectively inputting the marked infrared image and visible light image into two parallel encoders in corresponding modes, wherein the two parallel encoders use ResNet50 as a main network, the lower branch encoder changes the number of input channels of the first convolutional layer into 1 to match with the corresponding infrared gray image, and the upper branch is used for extracting the corresponding visible light image characteristics;
(2) From the second layer, sending the infrared image into a feature fusion module of weight adaptive learning positioned in front of each convolution layer of the visible light encoder, and taking a fusion result as the input of the next convolution layer of the visible light encoder to realize the auxiliary supervision effect of the infrared image on the visible light image;
(3) Sending the final fusion result of the step (2) to the high-level feature activation module to enhance the differential features and eliminate the influence of interference noise, and obtaining a global feature map;
(4) And sending the global feature map into the multilevel decoding module to obtain a final semantic segmentation result.
The weight adaptive learning feature fusion module comprises the following steps:
(1) Denote the feature of the input visible light image as R_{i-1} and the feature of the infrared image as T_{i-1}; after R_{i-1} and T_{i-1} are concatenated, they are fed into a 3 × 3 convolutional layer that reduces the number of channels to 1/4 of the original, denoted P_i;
(2) Process P_i with four max pooling layers of sizes 1, 5, 9 and 13 respectively, obtaining four feature maps for different target sizes; send them into two convolutional layers, each containing a 3 × 3 convolution kernel and a ReLU activation function, to obtain four feature maps v_j, j = 1, 2, 3, 4;
(3) Upsample the results of (2) until they are consistent with P_i so as to construct a residual structure; P_i, after its channel number is reduced, is summed with the upsampled v_j at the pixel level for fusion, obtaining the fusion map U;
(4) Input the result of (3) into two 3 × 3 convolutional layers followed by a softmax activation to obtain the two-channel map W_i;
(5) Separate the result of (4) into two weight maps W_i^RGB and W_i^T, and multiply them at the pixel level with the inputs R_{i-1} and T_{i-1} respectively to obtain the input of the next visible-light encoder convolutional layer:
R'_{i-1} = W_i^RGB ⊙ R_{i-1} + W_i^T ⊙ T_{i-1}, where ⊙ denotes pixel-level multiplication;
(6) At each intermediate stage of the feature extraction encoder, the weight adaptive learning feature fusion module is used to fuse the infrared image features into the visible light image features; the result is sent to the next multi-level modal information feature extraction module for feature extraction, and steps (1) to (5) are iterated continuously until the final fusion result R_5 is obtained.
The high-level feature activation module comprises the following steps:
(1) Split R_5 into two mutually independent descriptors M_avg and M_max by sending R_5 into an adaptive average pooling layer and an adaptive max pooling layer respectively, obtaining outputs of size 1 × 1:
M_avg = AvgPool_1(R_5), M_max = MaxPool_1(R_5);
(2) Send the two descriptors from (1) into a shared block, which is implemented by two 1 × 1 convolutional layers and a ReLU function; the first convolutional layer reduces the number of channels to half of the original, and after the ReLU function the second convolutional layer restores them. Then element-wise summation, a sigmoid function and element-wise multiplication are used to generate the channel attention feature map:
F_c = σ(SB(M_avg) + SB(M_max)) ⊗ R_5, where SB(·) denotes the shared block, σ the sigmoid function and ⊗ element-wise multiplication;
(3) Send the result F_c of (2) into an atrous spatial pyramid pooling (ASPP) module to enlarge the feature receptive field and capture multi-scale information, concatenate the upsampled results to obtain the adaptive global feature map G, and adjust the number of channels with a 1 × 1 convolutional layer.
The multi-level decoding module comprises the following steps:
(1) Taking the first decoder as an example, the global feature map G, the corresponding output feature C of the visible light image encoder and the output O of the previous decoding module are sent into the decoder; since the first decoder has no previous decoding module output, the global feature map G is used in its place;
(2) In this case C is R_4 (for the remaining decoding modules it is R_3 and R_2 respectively); the output O of the previous decoding module and the output feature C of the visible light image encoder are sent into an adaptive spatial channel attention module to strengthen the modality-specific features;
(3) Upsample the result of (2) until its resolution is consistent with feature C, and reduce the number of channels of the three feature maps to 1/4 of the channel number of feature map O;
(4) Carrying out pixel-level summation operation on the three results in the step (3), and reconstructing a summed feature map by using a convolution kernel of 3 multiplied by 3;
(5) Repeat the above steps until the last decoder; its output is sent into a 1 × 1 convolution kernel and then upsampled so that the number of channels becomes the number of segmentation target categories, thereby obtaining the final semantic segmentation result Y.
In the step (5), the labeled infrared and visible light image pairs are randomly divided into a training set and a test set at a ratio of 4:1, the training set is used to train the network, the prediction result Y = {Y(i), i = 1, 2, ..., m} is compared with the ground-truth map Y_label = {Y_label(i), i = 1, 2, ..., m}, and the cross-entropy loss function is taken as supervision to update the network parameters and obtain a better segmentation effect.
Compared with the prior art, the beneficial effects are: according to the semantic segmentation method for the power accessory of the power transmission line, an infrared and visible light image registration module is added before a semantic segmentation task is carried out, edge information is used for guiding, the accuracy of image registration is remarkably improved, and the distortion degree after affine transformation is reduced; according to the method, a multi-level feature extraction module is added in a semantic segmentation module of an electric power accessory, specific infrared and visible light data features are fully extracted, an asymmetric two-way coding network model is constructed, and the multi-level infrared image features are taken as supervision and fused with corresponding level visible light image features in a feature fusion module of weight adaptive learning; in addition, the method also uses a high-level feature activation module to eliminate noise interference, enhances modal difference features, and combines a multi-level decoding module to obtain a semantic segmentation result with clear edges and high precision; by training on the self-constructed and manually-calibrated infrared and visible light electric power accessory data set, a semantic segmentation model with good adaptability and strong generalization capability can be obtained, and the method has high engineering value.
Drawings
FIG. 1 is a system block diagram of a semantic segmentation method for power accessories of a power transmission line based on infrared and visible light image cooperation according to the present invention;
FIG. 2 is an operation diagram of the semantic segmentation method for the power accessory of the power transmission line based on the cooperation of the infrared image and the visible image;
FIG. 3 is a schematic diagram of an edge intensity detection module, wherein (a) is an image erosion operation and (b) is an image dilation operation;
FIG. 4 is a schematic diagram of a power accessory semantic segmentation module;
FIG. 5 is a schematic diagram of a residual network block;
FIG. 6 is a schematic diagram of a feature fusion module for weight adaptive learning;
FIG. 7 is a high level feature activation module diagram.
Detailed Description
The following describes in detail a specific embodiment of the infrared and visible light image collaborative power transmission line power accessory semantic segmentation method according to the present invention with reference to the accompanying drawings.
As shown in FIG. 1, the infrared and visible light image collaborative power accessory semantic segmentation method provided by the invention consists of an image registration module based on edge intensity and a power accessory semantic segmentation module. The image registration module covers the edge intensity detection, feature selection and descriptor matching algorithms. The power accessory semantic segmentation module mainly comprises a sample set construction module, a multi-level modal information feature extraction module, a weight adaptive learning feature fusion module, a high-level feature activation module, a multi-level decoding module and a network training module, where the sample set construction module and the network training module are used respectively to build complete infrared-visible image data set pairs and to train the network. Through the combination of these modules an asymmetric two-branch encoding network model is constructed, and stable image semantic information extracted from the infrared image is fused into the visible light image features, so that all-weather, high-efficiency semantic segmentation of power accessories is realized.
As shown in fig. 2, the specific operation flow of the infrared and visible light image collaborative power accessory semantic segmentation method of the present invention is as follows:
step (1): an unmanned aerial vehicle is used for carrying an infrared thermal imager and a visible light camera to surround an electric power tower, and an infrared visible light image pair of an electric power accessory is shot to obtain an infrared visible light original sample set containing a transverse insulator, a vertical insulator, an insulating wire clamp, a vibration damper and a tower target;
step (2): processing the infrared visible light original sample set shot in the step (1) by adopting an edge intensity-based image registration module, wherein the edge intensity-based image registration module comprises an edge intensity detection module, a feature selection module and a descriptor matching module; the specific process is as follows:
the edge strength detection module comprises the following steps:
(1) As shown in FIG. 3, the edge features of the infrared image are first extracted based on the morphological gradient. The erosion of the infrared image A by the structuring element B is recorded as
E = A ⊖ B = {z | B_z ⊆ A},
where every element of B takes the value 0 or 1 so that B can form a shape of any kind, here a 3 × 3 convolution kernel; B_z is the translation of B by the vector z, specifically B_z = {b + z | b ∈ B}, and E is the eroded image. Similarly, the dilation of the infrared image A by the structuring element B is
F = A ⊕ B = {z | (B_s)_z ∩ A ≠ ∅},
where B_s is B reflected about the origin and F is the dilated image; the process can be regarded as rotating the structuring element B by 180° around the origin and then eroding the infrared image A with it.
(2) From (1), the inner edge of the infrared image, D_1 = A − (A ⊖ B), and the outer edge, D_2 = (A ⊕ B) − A, can be obtained, and further the basic gradient edge of the infrared image D = (A ⊕ B) − (A ⊖ B).
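As an illustrative sketch only (not part of the original patent text), the morphological edge extraction above can be written in Python with OpenCV; the 3 × 3 all-ones structuring element and the function name are assumptions.

```python
import cv2
import numpy as np

def basic_gradient_edge(infrared_gray: np.ndarray):
    # structuring element B: 3 x 3 kernel whose elements are all 1
    B = np.ones((3, 3), np.uint8)
    eroded = cv2.erode(infrared_gray, B)                # E = A eroded by B
    dilated = cv2.dilate(infrared_gray, B)              # F = A dilated by B
    inner_edge = cv2.subtract(infrared_gray, eroded)    # D1 = A - (A erosion B)
    outer_edge = cv2.subtract(dilated, infrared_gray)   # D2 = (A dilation B) - A
    gradient_edge = cv2.subtract(dilated, eroded)       # basic gradient edge D = F - E
    return inner_edge, outer_edge, gradient_edge

# usage sketch: edges = basic_gradient_edge(cv2.imread("ir.png", cv2.IMREAD_GRAYSCALE))
```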
The feature selection module selects features based on the ORB (Oriented FAST and Rotated BRIEF) algorithm; the specific process is as follows:
(1) Take a pixel block p of size S × S; after smoothing, a key point q = (x, y)^T on the block is obtained, and a binary decision criterion Γ is defined:
Γ(p; q, r) = 1 if p(q) < p(r), and 0 otherwise,
where r = (x', y')^T is a selected threshold keypoint and p(r) is the gray value at r;
(2) Select n pairs of position coordinates (x_i, y_i) in the neighborhood of the key point q and compare them under the binary decision criterion Γ of step (1) to obtain the n-dimensional feature descriptor f_n(p):
f_n(p) = Σ_{1≤i≤n} 2^(i−1) Γ(p; x_i, y_i);
(3) Operate on the 2 × n matrix M with the rotation matrix R_θ to obtain a new matrix M_θ:
M_θ = R_θ M,
where
M = (x_1 x_2 … x_n; y_1 y_2 … y_n),
and the rotation matrix R_θ corresponding to the principal direction θ of the key point q is expressed as
R_θ = (cos θ  −sin θ; sin θ  cos θ);
(4) From (3), the matrix M becomes oriented, that is, the feature descriptor has rotation invariance, and the binary descriptor is:
g_n(p, θ) = f_n(p) | (x_i, y_i) ∈ S_θ (8);
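A minimal sketch (Python/OpenCV, not taken from the patent) of running the ORB feature selection on the edge images; OpenCV's built-in ORB already implements the oriented FAST and rotated BRIEF steps summarized above, and the parameter values shown are assumptions.

```python
import cv2

def orb_features(edge_image, n_features=1000, patch_size=31):
    # oriented FAST keypoints + rotated BRIEF binary descriptors g_n(p, theta)
    orb = cv2.ORB_create(nfeatures=n_features, patchSize=patch_size)
    keypoints, descriptors = orb.detectAndCompute(edge_image, None)
    return keypoints, descriptors  # descriptors: N x 32 uint8 binary strings
```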
the descriptor matching module performs descriptor matching to realize infrared and visible light image registration, and the specific process is as follows:
(1) The Hamming distance is used as the similarity measure between two points, and a brute-force matching method is adopted to coarsely match the feature points;
(2) Randomly select a pair of matching points and, according to their image position information, select a neighborhood of fixed size 4 × 4; the binary mutual information I(A, B), with a threshold of 0.4, is used to screen out erroneous feature points, where
I(A, B) = H(A) + H(B) − H(A, B) (9)
is the mutual information of image A and image B, and H(A) and H(B) denote the information entropies of image A and image B respectively. Taking image A as an example,
H(A) = −Σ_i p_i log p_i,
where p_i is the proportion of pixels with gray value i in image A, and H(B) is defined in the same way;
H(A, B) = −Σ_{x,y} p(x, y) log p(x, y)
is the joint entropy of image A and image B, where x and y are the gray values of image A and image B respectively and p(x, y) is their joint gray-level probability distribution;
(3) Solving the transformation matrix by a progressive consistent sampling method, comprising the following steps of:
a. initializing parameters, and recording v as a threshold value of the number of interior points, T as a threshold value of an error of the interior points, N as the maximum iteration number of the algorithm, wherein the initial value of v is set to be 4, the initial value of T is set to be 0.02, and N is set to be 2000;
b. From the ratio of the shortest Euclidean distance to the second-shortest Euclidean distance between matching point pairs, the quality factor of the feature point α = r_min1 / r_min2 is obtained, and the quality factor of each pair of matching points is β = 1/α · r_min1, where r_min1 is the shortest Euclidean distance and r_min2 the second-shortest Euclidean distance;
c. Sort the matching points in descending order of β and store them in the set D_inner; take the first M pairs of feature points as the preset inlier set, and randomly select four pairs of feature points from it to compute the initial homography matrix H;
d. The other matching point pairs are tested against the transformation matrix, and the projection error E that satisfies the model is computed as
E = ‖(u, v, 1)^T − H · (x, y, 1)^T‖,
where (x, y)^T are the coordinates of the feature point and (u, v)^T the coordinates of its matching point; if another matching point pair satisfies E < T it is classified as an inlier, and the number of inliers k is recorded;
e. If k < v, repeat steps a to d; otherwise store the set satisfying the condition in the preferred set; once the number of iterations exceeds N, take the consensus set C with the largest number of inliers among the preferred sets and its transformation matrix H;
(4) Select a matching point pair g from the inlier set C and then randomly select another 4 pairs of matching points; using the transformation matrix H, compute the distances a_1, a_2, a_3, a_4 between g and the other 4 matching points in registration image A, and the distances b_1, b_2, b_3, b_4 between their transformed positions in image B; if |b_i − a_i| < ε the matching point pairs are kept, otherwise they are removed, where ε is a preset threshold and i = 1, 2, 3, 4;
(5) Calculating a final transformation matrix H in the high-precision interior point set by using a RANSAC algorithm, and transforming the images according to the transformation matrix H to calculate a common region between the two images;
(6) Affine transformation is adopted as the solution model of the registration, and its mathematical expression is:
(x_2, y_2, 1)^T = H_1 (x_1, y_1, 1)^T, with H_1 = (a_00 a_01 a_02; a_10 a_11 a_12; 0 0 1),
where (x_1, y_1) are the original coordinates, (x_2, y_2) the transformed coordinates, H_1 the affine transformation matrix, and (a_02, a_12) the offset;
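A sketch (Python/OpenCV) of the matching and registration stage, with a ratio test standing in for the progressive consistent sampling step and cv2.estimateAffine2D providing the RANSAC affine solution; the threshold values are assumptions, not the patent's.

```python
import cv2
import numpy as np

def register_pair(kp_ir, desc_ir, kp_vis, desc_vis, ratio=0.75):
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)               # brute-force matching with Hamming distance
    knn = matcher.knnMatch(desc_ir, desc_vis, k=2)
    good = [m[0] for m in knn if len(m) == 2 and m[0].distance < ratio * m[1].distance]
    src = np.float32([kp_ir[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_vis[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # RANSAC estimates the 2 x 3 affine matrix [[a00 a01 a02], [a10 a11 a12]]; (a02, a12) is the offset
    H1, inlier_mask = cv2.estimateAffine2D(src, dst, method=cv2.RANSAC, ransacReprojThreshold=3.0)
    return H1, inlier_mask

# usage sketch: warped_ir = cv2.warpAffine(ir_image, H1, (vis_image.shape[1], vis_image.shape[0]))
```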
and (3): a power accessory semantic segmentation module shown in FIG. 4 is used for training a sample set formed by registering infrared and visible light images and then labeling, so that a semantic segmentation task of infrared and visible light image cooperation is realized, and the method specifically comprises the following steps:
sending the registered infrared and visible light images into a sample set construction module to construct an infrared and visible light power accessory sample set, manually calibrating all visible lights and corresponding infrared images by using a labelme marking tool, taking five targets, namely a transverse insulator, a vertical insulator, an insulating wire clamp, a vibration damper and a tower as objects, and taking the rest unmarked areas as backgrounds to obtain an infrared and visible light image power accessory sample set;
and (4): the system comprises a sample set construction module, a multi-level modal information feature extraction module, a weight self-adaptive learning feature fusion module, a high-level feature activation module, a multi-level decoding module and a network training module;
the method comprises the following steps of inputting labeled infrared and visible light images into a feature fusion module for multi-level modal information feature extraction and weight adaptive learning so as to fully extract infrared and visible light data features with specificity, fully fusing the infrared and visible light data features by utilizing the complementarity of the infrared and visible light images, sending fused results into a high-level feature activation module to enhance the differential features and eliminate the influence of interference noise, and finally sending a global feature map into a multi-level decoding module to obtain a final semantic segmentation result, wherein the specific steps are as follows:
(1) Respectively inputting the marked infrared image and visible light image into two parallel encoders in corresponding modes, wherein the two parallel encoders use ResNet50 as a main network, the lower branch encoder changes the number of input channels of the first convolutional layer into 1 to match with the corresponding infrared gray image, and the upper branch is used for extracting the corresponding visible light image characteristics;
(2) From (1), five pairs of feature maps at different levels can be obtained, namely R_i and T_i, where i = 1, 2, 3, 4, 5;
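A minimal PyTorch sketch of the asymmetric two-branch encoder, assuming torchvision's ResNet50 as backbone; the infrared branch rebuilds the first convolution with one input channel as described, and in practice the intermediate stage outputs R_i, T_i would be tapped rather than only the deepest one.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet50

def make_encoder(in_channels):
    net = resnet50(weights=None)                 # backbone; pretraining is an implementation choice
    if in_channels != 3:                         # infrared branch: single-channel gray input
        net.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2, padding=3, bias=False)
    return nn.Sequential(net.conv1, net.bn1, net.relu, net.maxpool,
                         net.layer1, net.layer2, net.layer3, net.layer4)

visible_encoder, infrared_encoder = make_encoder(3), make_encoder(1)
rgb, ir = torch.randn(1, 3, 480, 640), torch.randn(1, 1, 480, 640)
R5, T5 = visible_encoder(rgb), infrared_encoder(ir)   # deepest feature maps of the two branches
```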
(3) Starting from the second layer, the infrared image is sent to a feature fusion module for weight adaptive learning located in front of each convolution layer of the visible light encoder, and the fusion result is used as the input of the next convolution layer of the visible light encoder to realize the auxiliary supervision effect of the infrared image on the visible light image, as shown in fig. 5 and 6, the specific fusion steps are as follows:
a. Denote the feature of the input visible light image as R_{i-1} and the feature of the infrared image as T_{i-1}; after R_{i-1} and T_{i-1} are concatenated, they are fed into a 3 × 3 convolutional layer that reduces the number of channels to 1/4 of the original, denoted P_i;
b. Process P_i with four max pooling layers of sizes 1, 5, 9 and 13 respectively, obtaining four feature maps for different target sizes; send them into two convolutional layers, each containing a 3 × 3 convolution kernel and a ReLU activation function, to obtain four feature maps v_j, j = 1, 2, 3, 4;
c. Upsample the results of b until they are consistent with P_i so as to form a residual structure; P_i, after its channel number is reduced, is summed with the upsampled v_j at the pixel level for fusion, obtaining the fusion map U;
d. Input the result of c into two 3 × 3 convolutional layers followed by a softmax activation to obtain the two-channel map W_i;
e. Separate the result of d into two weight maps W_i^RGB and W_i^T, and multiply them at the pixel level with the inputs R_{i-1} and T_{i-1} respectively to obtain the input of the next visible-light encoder convolutional layer:
R'_{i-1} = W_i^RGB ⊙ R_{i-1} + W_i^T ⊙ T_{i-1}, where ⊙ denotes pixel-level multiplication;
f. At each intermediate stage of the feature extraction encoder, the weight adaptive learning feature fusion module is used to fuse the infrared image features into the visible light image features; the result is sent to the next multi-level modal information feature extraction module for feature extraction, and steps a to e are iterated continuously until the final fusion result R_5 is obtained.
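A PyTorch sketch of the weight adaptive learning feature fusion module as read from steps a–e; pooling settings and channel bookkeeping are assumptions where the text is ambiguous.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightAdaptiveFusion(nn.Module):
    def __init__(self, channels):
        super().__init__()
        mid = channels // 4
        self.reduce = nn.Conv2d(2 * channels, mid, 3, padding=1)             # P_i: channels / 4
        self.pools = nn.ModuleList(nn.MaxPool2d(k, stride=k) for k in (1, 5, 9, 13))
        self.branches = nn.ModuleList(
            nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True),
                          nn.Conv2d(mid, mid, 3, padding=1), nn.ReLU(inplace=True))
            for _ in range(4))
        self.to_weights = nn.Sequential(nn.Conv2d(mid, mid, 3, padding=1),
                                        nn.Conv2d(mid, 2, 3, padding=1))     # two-channel map W_i

    def forward(self, R, T):                     # R: visible feature R_{i-1}, T: infrared T_{i-1}
        P = self.reduce(torch.cat([R, T], dim=1))
        v = [F.interpolate(b(p(P)), size=P.shape[2:], mode="bilinear", align_corners=False)
             for p, b in zip(self.pools, self.branches)]                     # v_1..v_4
        U = P + sum(v)                                                       # residual pixel-level fusion
        W = torch.softmax(self.to_weights(U), dim=1)                         # W_i^RGB, W_i^T
        return W[:, 0:1] * R + W[:, 1:2] * T     # weighted input for the next visible-encoder layer
```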
(4) The final fusion result R_5 from (3) is then sent into the high-level feature activation module to enhance the difference features and eliminate the influence of interference noise, as shown in FIG. 7; the specific steps are as follows:
a. Split R_5 into two mutually independent descriptors M_avg and M_max by sending R_5 into an adaptive average pooling layer and an adaptive max pooling layer respectively, obtaining outputs of size 1 × 1:
M_avg = AvgPool_1(R_5), M_max = MaxPool_1(R_5) (12);
b. Send the two descriptors from a into a shared block, which is implemented by two 1 × 1 convolutional layers and a ReLU function; the first convolutional layer reduces the number of channels to half of the original, and after the ReLU function the second convolutional layer restores them. Then element-wise summation, a sigmoid function and element-wise multiplication are used to generate the channel attention feature map:
F_c = σ(SB(M_avg) + SB(M_max)) ⊗ R_5, where SB(·) denotes the shared block, σ the sigmoid function and ⊗ element-wise multiplication;
c. The result F_c of b is sent into an atrous spatial pyramid pooling (ASPP) module to enlarge the feature receptive field and capture multi-scale information; the specific steps are as follows:
c1. Feed F_c into 3 × 3 convolutional layers with dilation rates of 1, 5, 9 and 13 respectively, generating four multi-scale region features V_i, i = 1, 2, 3, 4;
c2. Feed F_c into an adaptive average pooling layer and two 3 × 3 convolutional layers, and upsample the result by bilinear interpolation to obtain the adaptive region feature V_5;
c3. Concatenate the feature maps of c1 and c2 to obtain the adaptive global feature map G, and adjust the number of channels with a 1 × 1 convolutional layer;
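A PyTorch sketch of the high-level feature activation module: channel attention built from the shared block, followed by an ASPP-style block with dilation rates 1, 5, 9, 13; out_channels and other unstated sizes are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HighLevelFeatureActivation(nn.Module):
    def __init__(self, channels, out_channels=256):
        super().__init__()
        self.shared = nn.Sequential(nn.Conv2d(channels, channels // 2, 1), nn.ReLU(inplace=True),
                                    nn.Conv2d(channels // 2, channels, 1))      # shared block
        self.aspp = nn.ModuleList(nn.Conv2d(channels, out_channels, 3, padding=d, dilation=d)
                                  for d in (1, 5, 9, 13))                       # V_1..V_4
        self.image_branch = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                          nn.Conv2d(channels, out_channels, 3, padding=1),
                                          nn.Conv2d(out_channels, out_channels, 3, padding=1))
        self.project = nn.Conv2d(5 * out_channels, out_channels, 1)             # adjust channel number

    def forward(self, R5):
        m_avg, m_max = F.adaptive_avg_pool2d(R5, 1), F.adaptive_max_pool2d(R5, 1)   # M_avg, M_max
        Fc = torch.sigmoid(self.shared(m_avg) + self.shared(m_max)) * R5            # channel attention
        feats = [conv(Fc) for conv in self.aspp]
        V5 = F.interpolate(self.image_branch(Fc), size=Fc.shape[2:], mode="bilinear",
                           align_corners=False)                                     # adaptive region feature
        return self.project(torch.cat(feats + [V5], dim=1))                         # global feature map G
```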
(5) And finally, sending the global feature map G into a multilevel decoding module to obtain a final semantic segmentation result, wherein the specific steps are as follows:
a. Taking the first decoder as an example, the global feature map G, the corresponding output feature C of the visible light image encoder and the output O of the previous decoding module are sent into the decoder; since the first decoder has no previous decoding module output, the global feature map G is used in its place;
b. In this case C is R_4 (in the remaining modules it is R_3 and R_2 respectively); the output O of the previous decoding module and the output feature C of the visible light image encoder are sent into an adaptive spatial channel attention module to strengthen the modality-specific features;
c. Upsample the result of b until its resolution is consistent with feature C, and reduce the number of channels of the three feature maps to 1/4 of the channel number of feature map O;
d. Perform a pixel-level summation of the three results of c and reconstruct the summed feature map with a 3 × 3 convolution kernel;
e. Repeat the above steps until the last decoder; its output is sent into a 1 × 1 convolution kernel and then upsampled so that the number of channels becomes the number of segmentation target categories, thus obtaining the final semantic segmentation result Y;
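A sketch of one multi-level decoder block in PyTorch; the adaptive spatial channel attention is approximated here by a simple sigmoid gate, which is an assumption, not the patent's exact module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderBlock(nn.Module):
    def __init__(self, c_prev, c_enc, c_glob, c_out):
        super().__init__()
        self.gate = nn.Conv2d(c_prev + c_enc, 1, 7, padding=3)   # stand-in for the attention module
        self.reduce_prev = nn.Conv2d(c_prev, c_out // 4, 1)
        self.reduce_enc = nn.Conv2d(c_enc, c_out // 4, 1)
        self.reduce_glob = nn.Conv2d(c_glob, c_out // 4, 1)
        self.rebuild = nn.Conv2d(c_out // 4, c_out, 3, padding=1)

    def forward(self, O, C, G):                                  # O: previous decoder, C: encoder, G: global map
        O = F.interpolate(O, size=C.shape[2:], mode="bilinear", align_corners=False)
        C = C * torch.sigmoid(self.gate(torch.cat([O, C], dim=1)))   # strengthen modality-specific features
        G = F.interpolate(G, size=C.shape[2:], mode="bilinear", align_corners=False)
        fused = self.reduce_prev(O) + self.reduce_enc(C) + self.reduce_glob(G)   # pixel-level summation
        return self.rebuild(fused)                               # 3 x 3 reconstruction of the summed map
```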
and (5): randomly dividing the labeled infrared and visible light image group into a training set and a testing set according to a ratio of 4 to 1 according to the infrared and visible light power accessory data set obtained in the step (4), wherein the infrared and visible light image size is 640 × 480, and training the network by using the training set, wherein the prediction result Y = { Y (i), i =1, 2.. The., m } and the truth diagram Y = { Y (i) } is given to the prediction result I =1, 2.. The.., m } and the truth diagram Y = label ={Y label (i) I =1, 2.. M }, and updates the network parameters to obtain better segmentation effect with the cross entropy loss function as supervision:
L_ce = −(1/m) Σ_{i=1}^{m} Σ_{c} Y_label^c(i) · log Y^c(i),
where c indexes the segmentation target categories.
the optimization device adopts an Adam optimization strategy, the attenuation parameter value is set to be 0.99, the initial learning rate is 0.001, and training iteration is 8000 times.
Finally, it should be noted that the above embodiments are only used for illustrating the technical solutions of the present invention and not for limiting the same. It will be understood by those skilled in the art that various modifications and equivalents may be made to the embodiments of the invention as described herein, and such modifications and variations are intended to be within the scope of the claims appended hereto.

Claims (10)

1. A semantic segmentation method for power accessories of a power transmission line is characterized by comprising the following steps:
step (1): an unmanned aerial vehicle is used for carrying an infrared thermal imager and a visible light camera to surround an electric power tower, and an infrared visible light image pair of an electric power accessory is shot to obtain an infrared visible light original sample set containing a transverse insulator, a vertical insulator, an insulating wire clamp, a vibration damper and a tower target;
step (2): processing the infrared visible light original sample set by adopting an image registration module based on edge intensity to generate an infrared visible light image registration result with high matching degree;
and (3): sending the registered infrared and visible light images into a sample set construction module to construct an infrared and visible light power accessory sample set, manually calibrating all visible lights and corresponding infrared images by using a labelme marking tool aiming at the five types of targets in the step (1), and taking the rest unmarked areas as backgrounds to obtain an infrared and visible light image power accessory sample set;
and (4): inputting the marked infrared and visible light images into a feature fusion module for multi-level modal information feature extraction and weight adaptive learning so as to fully extract infrared and visible light data features with specificity, fully fusing the infrared and visible light data features by utilizing the complementarity of the infrared and visible light images, sending the fused result into a high-level feature activation module to enhance the differential features and eliminate the influence of interference noise, and finally sending a global feature map into a multi-level decoding module to obtain a final semantic segmentation result;
and (5): and (4) randomly dividing the infrared and visible light image sample set group in the step (3) into a training set and a testing set according to the proportion of 4.
2. The semantic segmentation method for the power accessory of the power transmission line according to claim 1, wherein in the step (2), the image registration module based on the edge strength comprises an edge strength detection module, a feature selection module and a descriptor matching module.
3. The semantic segmentation method for the power accessory of the power transmission line according to claim 2, wherein the edge strength detection module comprises the following steps:
(1) The original infrared-visible light image is sent in and the edge features of the infrared image are extracted. The erosion of the infrared image A by the structuring element B is recorded as
E = A ⊖ B = {z | B_z ⊆ A},
where B_z is the translation of B by the vector z, specifically B_z = {b + z | b ∈ B}, and E is the eroded image; similarly, the dilation of the infrared image A by the structuring element B is
F = A ⊕ B = {z | (B_s)_z ∩ A ≠ ∅},
where B_s is B reflected about the origin, i.e. the process can be regarded as rotating the structuring element B by 180° around the origin and then eroding the infrared image A with it;
(2) From (1), the inner edge of the infrared image, D_1 = A − (A ⊖ B), and the outer edge, D_2 = (A ⊕ B) − A, can be obtained, and further the basic gradient edge of the infrared image D = (A ⊕ B) − (A ⊖ B).
4. The semantic segmentation method for the power accessory of the power transmission line according to claim 2, wherein the feature selection module performs feature selection based on an ORB algorithm, and comprises the following steps:
(1) Take a pixel block p of size S × S; after smoothing, a key point q = (x, y)^T on the block is obtained, and a binary criterion Γ is defined:
Γ(p; q, r) = 1 if p(q) < p(r), and 0 otherwise,
where r = (x', y')^T is a selected threshold keypoint and p(r) is the gray value at r;
(2) Select n pairs of position coordinates (x_i, y_i) in the neighborhood of the key point q and compare them under the binary criterion Γ to obtain the n-dimensional feature descriptor f_n(p):
f_n(p) = Σ_{1≤i≤n} 2^(i−1) Γ(p; x_i, y_i);
(3) Operate on the 2 × n matrix M with the rotation matrix R_θ to obtain a new matrix M_θ:
M_θ = R_θ M,
where
M = (x_1 x_2 … x_n; y_1 y_2 … y_n),
and the rotation matrix R_θ corresponding to the principal direction θ of the key point q is expressed as
R_θ = (cos θ  −sin θ; sin θ  cos θ);
(4) From (3), the matrix M becomes oriented, i.e. the feature descriptor has rotation invariance, and the binary descriptor is:
g_n(p, θ) = f_n(p) | (x_i, y_i) ∈ S_θ.
5. the semantic segmentation method for the power accessory of the power transmission line according to claim 2, wherein the descriptor matching module comprises the following steps:
(1) Taking the Hamming distance as the similarity measure between two points, and coarsely matching the feature points with a brute-force matching method;
(2) Solving a transformation matrix by a progressive consistent sampling method;
(3) Calculating a final transformation matrix H in the high-precision interior point set by using a RANSAC algorithm, and transforming the images according to the transformation matrix H to calculate a common region between the two images;
(4) Affine transformation is adopted as the solution model of the registration, and its mathematical expression is:
(x_2, y_2, 1)^T = H_1 (x_1, y_1, 1)^T, with H_1 = (a_00 a_01 a_02; a_10 a_11 a_12; 0 0 1),
where (x_1, y_1) are the original coordinates, (x_2, y_2) the transformed coordinates, H_1 the affine transformation matrix, and (a_02, a_12) the offset.
6. The semantic segmentation method for the power accessory of the power transmission line according to claim 1, wherein the step (4) includes the sample set construction module, the multi-level modal information feature extraction module, the weight adaptive learning feature fusion module, the high-level feature activation module, the multi-level decoding module and a network training module, and includes the following steps:
(1) Respectively inputting the marked infrared image and visible light image into two parallel encoders in corresponding modes, wherein the two parallel encoders use ResNet50 as a main network, the lower branch encoder changes the number of the first convolutional layer input channel into 1 to match with a corresponding infrared gray image, and the upper branch is used for extracting corresponding visible light image characteristics;
(2) From the second layer, the infrared image is sent to a feature fusion module of weight self-adaptive learning positioned in front of each convolution layer of the visible light encoder, and the fusion result is used as the input of the next convolution layer of the visible light encoder so as to realize the auxiliary supervision effect of the infrared image on the visible light image;
(3) Sending the final fusion result of the step (2) to the high-level feature activation module to enhance the differential features and eliminate the influence of interference noise, and obtaining a global feature map;
(4) And sending the global feature map into the multilevel decoding module to obtain a final semantic segmentation result.
7. The semantic segmentation method for the power accessory of the power transmission line according to claim 6, wherein the feature fusion module for weight adaptive learning comprises the following steps:
(1) Denote the feature of the input visible light image as R_{i-1} and the feature of the infrared image as T_{i-1}; after R_{i-1} and T_{i-1} are concatenated, they are fed into a 3 × 3 convolutional layer that reduces the number of channels to 1/4 of the original, denoted P_i;
(2) Process P_i with four max pooling layers of sizes 1, 5, 9 and 13 respectively, obtaining four feature maps of different types and target sizes; send them into two convolutional layers, each containing a 3 × 3 convolution kernel and a ReLU activation function, to obtain four feature maps v_j, j = 1, 2, 3, 4;
(3) Upsample the results of (2) until they are consistent with P_i so as to form a residual structure; P_i, after its channel number is reduced, is summed with the upsampled v_j at the pixel level for fusion, obtaining the fusion map U;
(4) Input the result of (3) into two 3 × 3 convolutional layers followed by a softmax activation to obtain the dual-channel map W_i;
(5) Separate the result of (4) into two weight maps W_i^RGB and W_i^T, and multiply them at the pixel level with the inputs R_{i-1} and T_{i-1} respectively to obtain the input of the next visible-light encoder convolutional layer:
R'_{i-1} = W_i^RGB ⊙ R_{i-1} + W_i^T ⊙ T_{i-1}, where ⊙ denotes pixel-level multiplication;
(6) At each intermediate stage of the feature extraction encoder, the weight adaptive learning feature fusion module is used to fuse the infrared image features into the visible light image features; the result is sent to the next multi-level modal information feature extraction module for feature extraction, and steps (1) to (5) are iterated continuously until the final fusion result R_5 is obtained.
8. The semantic segmentation method for the power accessory of the power transmission line according to claim 6, wherein the high-level feature activation module comprises the following steps:
(1) Split R_5 into two mutually independent descriptors M_avg and M_max by sending R_5 into the adaptive average pooling layer and the adaptive max pooling layer respectively, obtaining outputs of size 1 × 1, wherein
M_avg = AvgPool_1(R_5), M_max = MaxPool_1(R_5);
(2) Send the two descriptors in (1) into a shared block, which is implemented by two 1 × 1 convolutional layers and a ReLU function; the first convolutional layer reduces the number of channels to half of the original, and after the ReLU function the second convolutional layer restores them; then element-wise summation, a sigmoid function and element-wise multiplication are used to generate the channel attention feature map:
F_c = σ(SB(M_avg) + SB(M_max)) ⊗ R_5, where SB(·) denotes the shared block, σ the sigmoid function and ⊗ element-wise multiplication;
(3) Feed the result F_c of (2) into an atrous spatial pyramid pooling (ASPP) module to enlarge the feature receptive field and capture multi-scale information, concatenate the upsampled results to obtain the adaptive global feature map G, and adjust the number of channels with a 1 × 1 convolutional layer.
9. The semantic segmentation method for the power accessory of the power transmission line according to claim 6, wherein the multilevel decoding module comprises the following steps:
(1) Taking the first decoder as an example, the global feature map G, the corresponding output feature C of the visible light image encoder and the output O of the previous decoding module are sent into the decoder; since the first decoder has no previous decoding module output, the global feature map G is used in its place;
(2) In this case C is R_4 (for the remaining modules it is R_3 and R_2 respectively); the output O of the previous decoding module and the output feature C of the visible light image encoder are sent into the adaptive spatial channel attention module to strengthen the modality-specific features;
(3) Upsample the result of step (2) until its resolution is consistent with feature C, and reduce the number of channels of the three feature maps to 1/4 of the channel number of feature map O;
(4) Carrying out pixel-level summation operation on the three results in the step (3), and reconstructing a summed feature map by using a convolution kernel of 3 multiplied by 3;
(5) Repeat the above steps until the last decoder; its output is sent into a 1 × 1 convolution kernel and then upsampled so that the number of channels becomes the number of segmentation target categories, so as to obtain the final semantic segmentation result Y.
10. The method according to claim 1, wherein in step (5), the labeled infrared and visible light image pairs are randomly divided into the training set and the testing set at a ratio of 4:1, the training set is used to train the network, the prediction result Y = {Y(i), i = 1, 2, ..., m} is compared with the ground-truth map Y_label = {Y_label(i), i = 1, 2, ..., m}, and the cross-entropy loss function is used as supervision to update the network parameters and obtain a better segmentation effect.
CN202210921552.0A 2022-08-02 2022-08-02 Semantic segmentation method for power accessory of power transmission line Pending CN115376024A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210921552.0A CN115376024A (en) 2022-08-02 2022-08-02 Semantic segmentation method for power accessory of power transmission line

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210921552.0A CN115376024A (en) 2022-08-02 2022-08-02 Semantic segmentation method for power accessory of power transmission line

Publications (1)

Publication Number Publication Date
CN115376024A true CN115376024A (en) 2022-11-22

Family

ID=84064700

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210921552.0A Pending CN115376024A (en) 2022-08-02 2022-08-02 Semantic segmentation method for power accessory of power transmission line

Country Status (1)

Country Link
CN (1) CN115376024A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116129353A (en) * 2023-02-07 2023-05-16 佛山市顺德区福禄康电器科技有限公司 Method and system for intelligent monitoring based on image recognition
CN116129353B (en) * 2023-02-07 2024-05-07 广州融赋数智技术服务有限公司 Method and system for intelligent monitoring based on image recognition
CN116935063A (en) * 2023-07-24 2023-10-24 北京中科睿途科技有限公司 Method for generating driver state text in intelligent cabin environment and related equipment
CN116935063B (en) * 2023-07-24 2024-03-08 北京中科睿途科技有限公司 Method for generating driver state text in intelligent cabin environment and related equipment
CN116912649A (en) * 2023-09-14 2023-10-20 武汉大学 Infrared and visible light image fusion method and system based on relevant attention guidance
CN116912649B (en) * 2023-09-14 2023-11-28 武汉大学 Infrared and visible light image fusion method and system based on relevant attention guidance
CN117470859A (en) * 2023-12-25 2024-01-30 广州中科智巡科技有限公司 Insulator internal defect detection method and device
CN117470859B (en) * 2023-12-25 2024-03-22 广州中科智巡科技有限公司 Insulator internal defect detection method and device

Similar Documents

Publication Publication Date Title
CN111325797B (en) Pose estimation method based on self-supervision learning
CN108399419B (en) Method for recognizing Chinese text in natural scene image based on two-dimensional recursive network
CN109360171B (en) Real-time deblurring method for video image based on neural network
CN115376024A (en) Semantic segmentation method for power accessory of power transmission line
CN111401384B (en) Transformer equipment defect image matching method
CN113657388B (en) Image semantic segmentation method for super-resolution reconstruction of fused image
CN109903299B (en) Registration method and device for heterogenous remote sensing image of conditional generation countermeasure network
Liao et al. Model-free distortion rectification framework bridged by distortion distribution map
CN113159466B (en) Short-time photovoltaic power generation prediction system and method
CN111861880B (en) Image super-fusion method based on regional information enhancement and block self-attention
CN112633220B (en) Human body posture estimation method based on bidirectional serialization modeling
CN109376641B (en) Moving vehicle detection method based on unmanned aerial vehicle aerial video
CN112560865B (en) Semantic segmentation method for point cloud under outdoor large scene
CN111553845B (en) Quick image stitching method based on optimized three-dimensional reconstruction
CN113887349A (en) Road area image identification method based on image and point cloud fusion network
CN113284251A (en) Cascade network three-dimensional reconstruction method and system with self-adaptive view angle
CN113486894A (en) Semantic segmentation method for satellite image feature component
CN114663880A (en) Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism
CN113627481A (en) Multi-model combined unmanned aerial vehicle garbage classification method for smart gardens
CN115330935A (en) Three-dimensional reconstruction method and system based on deep learning
CN115439669A (en) Feature point detection network based on deep learning and cross-resolution image matching method
CN114708315A (en) Point cloud registration method and system based on depth virtual corresponding point generation
CN113160291A (en) Change detection method based on image registration
CN111274893A (en) Aircraft image fine-grained identification method based on component segmentation and feature fusion
CN111460941A (en) Visual navigation feature point extraction and matching method in wearable navigation equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination