CN117218345A - Semantic segmentation method for electric power inspection image
- Publication number: CN117218345A
- Application number: CN202311183380.2A
- Authority: CN
- Prior art keywords: cross-modal, feature map
- Legal status: Pending
Abstract
The invention discloses a semantic segmentation method for electric power inspection images, comprising the following steps: S1, acquiring an RGB image, a thermodynamic diagram and a depth map from power inspection; S2, extracting features from the RGB image, the thermodynamic diagram and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map; S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse segmentation instance regions; S4, extracting pixel-level-to-instance-level feature association information from the cross-modal feature map based on the coarse segmentation instance regions; and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information to obtain the predicted semantic segmentation result map. Compared with traditional methods and CNN-based image segmentation methods, the proposed method represents the contextual relations and global semantic information among features more fully; compared with attention-based image segmentation methods, it has fewer parameters and faster inference.
Description
Technical Field
The invention belongs to the technical field of power equipment inspection, and particularly relates to a semantic segmentation method for an electric power inspection image.
Background
In power inspection tasks, image semantic segmentation is an important technology that helps automatically identify power equipment, defects and other key factors, thereby improving inspection efficiency and accuracy. Existing research methods mainly include the following: (1) Semantic segmentation based on traditional computer vision methods. Such methods rely mainly on low-level visual features of the image; common techniques include graph cut, clustering and edge detection. (2) Semantic segmentation based on convolutional neural networks (CNNs). Such models adopt an encoder-decoder architecture, in which the encoder extracts image features and the decoder maps these features back to pixel-level segmentation results. In the encoder, common network structures include VGGNet and ResNet; the decoder typically uses transposed convolution layers for up-sampling and restoring resolution. (3) Semantic segmentation methods based on attention mechanisms. Such methods let the model automatically learn contextual dependencies between different locations in the image, adaptively focus on regions relevant to the segmentation task, and dynamically assign weights to different regions according to image content.
Semantic segmentation based on traditional computer vision methods often depends on hand-designed low-level image features. It cannot characterize high-level semantic information well, cannot effectively learn contextual relations between pixels, struggles to perform semantic understanding of the power-grid panorama, and performs poorly in power inspection scenes with complex and changeable environments. Semantic segmentation based on convolutional neural networks can only extract local features; it has difficulty capturing long-range dependencies between pixels, cannot handle overlapping and occluded objects well in complex scenes, is sensitive to appearance changes of power equipment caused by factors such as environment and illumination, and generalizes poorly across scenes. Semantic segmentation methods based on attention mechanisms typically require computing correlation weights between each location and every other location; for large images or high-resolution feature maps, this significantly increases model computational complexity and thus the time cost of training and inference, making it difficult to meet the low-compute, high-real-time requirements of scenarios such as unmanned aerial vehicle inspection.
Disclosure of Invention
Aiming at the defects in the prior art, the semantic segmentation method for power inspection images provided by the invention solves the problem that existing semantic segmentation models struggle to segment semantics in complex scenes with occlusion and appearance changes when detecting power inspection defects.
In order to achieve the above aim, the invention adopts the following technical scheme. A semantic segmentation method for an electric power inspection image comprises the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermodynamic diagram and a depth map;
S2, extracting features from the RGB image, the thermodynamic diagram and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse segmentation instance regions;
S4, extracting pixel-level-to-instance-level feature association information from the cross-modal feature map based on the coarse segmentation instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information to obtain the predicted semantic segmentation result map.
Further, in the step S2, the power-image visual features of the RGB image are extracted through a MobileNet model to obtain a visual feature map XF;
the pixel heat-intensity change features in the thermodynamic diagram are extracted through a ShuffleNet model to obtain a thermal feature map XT;
and the structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain a depth feature map XD.
Further, in the step S2, a bi-directional attention mechanism is adopted to perform cross-modal feature fusion, so as to obtain a cross-modal feature map X, where the expression is as follows:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
in the formula, XF ' is a visual feature map fused with thermal information and depth information, XD ' is a depth feature map fused with visual information and thermal information, XT ' is a thermal feature map fused with visual information and depth information, attention (·) is an Attention mechanism, and Concat (·) is a splicing operation.
Further, the step S3 specifically includes:
S31, performing convolution operations on the cross-modal feature map X using dilated (hole) convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for dimension raising, and concatenation on the convolved cross-modal feature maps X′ of the different dilation rates to realize multi-scale information fusion and obtain a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing up-sampling decoding on the encoded features using transposed convolution to obtain the coarse segmentation instance regions corresponding to all instances.
Further, in the step S31, the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:

$$X'_i = \sum_{k} X_{i + r \cdot k} \, w_k$$

where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map;
in the step S34, the coarse segmentation instance region is:
M=Deconv(X+X″)
where X is a cross-modal feature map and Deconv (·) is a transpose convolution operation.
Further, the step S4 specifically includes:
S41, performing weighted summation of the pixel-level representations within the coarse segmentation instance region of the same class to obtain the instance-level representations;
S42, extracting the feature association information between each pixel-level representation and the instance-level representations using the similarity between the pixel-level representation and the instance-level representation of the same class.
Further, in the step S41, the instance-level representation f_k is:

$$f_k = \sum_{i \in I} M_{ki} \, X_i$$

where X_i is the i-th pixel-level representation, M_{ki} is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set;

in the step S42, the feature association information w_{ik} is:

$$w_{ik} = \frac{e^{\kappa(X_i,\, f_k)}}{\sum_{j=1}^{K} e^{\kappa(X_i,\, f_j)}}$$

where κ(·,·) is the relation function; the softmax over the K regions normalizes the weights.
Further, the step S5 specifically includes:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain association features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the association features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
Further, in the step S51, the association feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:

$$Y_i = \rho\Big(\sum_{k=1}^{K} w_{ik}\, \delta(f_k)\Big)$$

where ρ(·) and δ(·) are both transformation functions, w_{ik} is the feature association information, and f_k is the instance-level representation;
in the step S52, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
where Concat(·) is the concatenation operation and X is the cross-modal feature map;
in the step S53, the semantic segmentation result map is expressed as:
R=Deconv(Z)
wherein R is a semantic label represented by each pixel in the semantic segmentation result graph, and Deconv (·) is a transpose convolution operation.
The beneficial effects of the invention are as follows:
(1) The invention fully considers the multi-modal and multi-scale information in the power inspection scene. For input images of different modalities, different types of lightweight backbone networks are adopted to extract key features.
(2) The invention further extracts semantic features using multi-rate dilated convolution, and obtains well-formed coarse segmentation instance regions through transposed-convolution up-sampling.
(3) The invention aggregates the dependency information between pixels and object regions into the pixel representation using the pixel-to-instance-region feature association information, so that each pixel representation better approximates the abstract representation of the instance it belongs to, yielding a refined semantic segmentation result.
(4) Compared with traditional methods and CNN-based image segmentation methods, the invention represents the contextual relations and global semantic information among features more fully; compared with attention-based image segmentation methods, it has fewer parameters and faster inference.
Drawings
Fig. 1 is a flowchart of a semantic segmentation method of a power inspection image provided by the invention.
Fig. 2 is a schematic diagram of cross-modal feature diagram construction provided by the present invention.
FIG. 3 is a schematic diagram of pixel-level to instance-level feature association and feature enhancement provided by the present invention.
Detailed Description
The following description of the embodiments of the present invention is provided to help those skilled in the art understand the invention, but it should be understood that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall under protection as long as they remain within the spirit and scope of the invention as defined and determined by the appended claims.
Example 1:
The embodiment of the invention provides a semantic segmentation method for a power inspection image, as shown in fig. 1, comprising the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermodynamic diagram and a depth map;
S2, extracting features from the RGB image, the thermodynamic diagram and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse segmentation instance regions;
S4, extracting pixel-level-to-instance-level feature association information from the cross-modal feature map based on the coarse segmentation instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information to obtain the predicted semantic segmentation result map.
In step S1 of the embodiment of the present invention, during power inspection, multi-modal image data are collected by multiple devices such as a high-definition color camera, a thermal imager and a laser range finder, including an RGB image, a thermodynamic diagram and a depth map. The RGB high-definition images of power lines and equipment shot by the high-definition camera provide visual information; the thermodynamic diagram of the heat distribution over the surfaces of electrical facilities and their surrounding areas, detected by the thermal imager, helps discover possible overload problems of power lines and equipment; and the depth map obtained by 3D structural scanning with the laser range finder provides structural information such as the three-dimensional spatial position and size of lines and equipment.
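For concreteness, one such multi-modal sample can be held in a simple container. The following Python sketch is illustrative only: the field names and tensor shapes are assumptions, since the patent specifies only that RGB, thermodynamic and depth images are collected.

```python
# A minimal sketch of a container for one multi-modal inspection sample.
from dataclasses import dataclass
import torch

@dataclass
class InspectionSample:
    rgb: torch.Tensor      # (3, H, W) high-definition color image
    thermal: torch.Tensor  # (1, H, W) heat-distribution map from the thermal imager
    depth: torch.Tensor    # (1, H, W) depth map from the laser range finder

sample = InspectionSample(
    rgb=torch.rand(3, 480, 640),
    thermal=torch.rand(1, 480, 640),
    depth=torch.rand(1, 480, 640),
)
```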
In step S2 of the embodiment of the present invention, as shown in fig. 2, a parallel multi-branch structure is designed to learn feature maps from images of the different modalities, specifically:
the RGB image Xf mainly represents visual information such as edges, textures and the like, so that a depth separable convolution-based MobileNet model is adopted to extract the visual characteristics of the power image of the RGB image, and a visual characteristic diagram XF is obtained; the method comprises the steps of taking a MobileNet model as a light backbone network, efficiently extracting visual characteristics of an electric image, enabling depth separable convolution to be composed of two steps of depth convolution and point-by-point convolution, firstly, performing independent convolution operation on each channel of an input feature map, and then performing dimension lifting or dimension reduction by applying 1*1 convolution operation on the basis.
The thermodynamic diagram Xt mainly expresses pixel heat-intensity change information and focuses on positional relations and local patterns of the equipment, so the pixel heat-intensity change features in the thermodynamic diagram are extracted through a ShuffleNet model to obtain the thermal feature map XT. ShuffleNet serves as a lightweight backbone network that reduces computational complexity through group convolution while applying a channel-shuffle operation to improve the expressiveness of the features. This approach is well suited to characterizing the thermal-change laws and local patterns within the image.
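The channel-shuffle operation mentioned above can be sketched as follows; the group count is an illustrative assumption:

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int) -> torch.Tensor:
    n, c, h, w = x.shape
    # reshape to (n, groups, c // groups, h, w), swap the two channel axes,
    # then flatten back so channels from different groups are interleaved
    return x.view(n, groups, c // groups, h, w).transpose(1, 2).reshape(n, c, h, w)

xt = channel_shuffle(torch.rand(1, 64, 120, 160), groups=4)
```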
The depth map Xd mainly contains structural information such as the three-dimensional spatial position and size of lines and equipment, so the structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain the depth feature map XD. PointNet serves as a lightweight backbone network: independent features are learned for each point through a shared multi-layer perceptron, max-pooling is then performed over all point features to take the maximum response value as the global feature, and the depth feature map XD is finally obtained.
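A minimal sketch of this shared per-point MLP followed by max-pooling; the layer widths and the conversion of the depth map into a point set are assumptions not detailed in the patent:

```python
import torch
import torch.nn as nn

class PointFeatureExtractor(nn.Module):
    def __init__(self, in_dim: int = 3, feat_dim: int = 64):
        super().__init__()
        # shared MLP applied independently to every point
        self.mlp = nn.Sequential(nn.Linear(in_dim, 32), nn.ReLU(), nn.Linear(32, feat_dim))

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (N, in_dim) -> per-point features (N, feat_dim)
        feats = self.mlp(points)
        # max over all points keeps the strongest response per feature channel
        return feats.max(dim=0).values  # (feat_dim,) global structure descriptor

xd_global = PointFeatureExtractor()(torch.rand(1024, 3))
```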
In step S2 of the embodiment of the present invention, cross-modal feature fusion is performed on the extracted visual feature map XF, depth feature map XD, and thermal feature map XT by using a bidirectional attention mechanism, so as to obtain a cross-modal feature map X, where the expression is as follows:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
in the formula, XF ' is a visual feature map fused with thermal information and depth information, XD ' is a depth feature map fused with visual information and thermal information, XT ' is a thermal feature map fused with visual information and depth information, attention (·) is an Attention mechanism, and Concat (·) is a splicing operation.
Through the above procedure, this embodiment obtains the cross-modal feature map: deep cross-modal fusion is realized while the independent information of each modality is preserved, and the relevance and complementarity among the three modal features are fully exploited.
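A minimal sketch of this bidirectional fusion, assuming Attention(A, B) is scaled dot-product cross-attention with A providing the queries and B the keys and values; the patent does not fix this choice, and the shared projection weights and dimensions below are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossAttention(nn.Module):
    """Scaled dot-product attention: queries from a, keys/values from b."""
    def __init__(self, dim: int):
        super().__init__()
        self.q, self.k, self.v = nn.Linear(dim, dim), nn.Linear(dim, dim), nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, a: torch.Tensor, b: torch.Tensor) -> torch.Tensor:
        # a, b: (num_positions, dim) feature maps flattened over space
        attn = F.softmax((self.q(a) @ self.k(b).t()) * self.scale, dim=-1)
        return attn @ self.v(b)

dim, n = 64, 24 * 32  # small spatial grid keeps the attention matrix manageable
xf, xd, xt = (torch.rand(n, dim) for _ in range(3))
att = CrossAttention(dim)  # one shared module for all pairs, a simplification
xf_p = xf + att(xf, xd) + att(xf, xt)      # XF'
xd_p = xd + att(xd, xf) + att(xd, xt)      # XD'
xt_p = xt + att(xt, xd) + att(xt, xf)      # XT'
x = torch.cat([xf_p, xd_p, xt_p], dim=-1)  # cross-modal feature map X
```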
In step S3 of the embodiment of the present invention, based on the cross-modal feature map X, dilated convolution kernels w with different dilation rates are adopted to extract multi-scale context information and encode the feature map, and the encoded features are then up-sampled and decoded to obtain the final coarse segmentation instance regions. Based on this, step S3 of the embodiment of the present invention specifically includes:
S31, performing convolution operations on the cross-modal feature map X using dilated (hole) convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for dimension raising, and concatenation on the convolved cross-modal feature maps X′ of the different dilation rates to realize multi-scale information fusion and obtain a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing up-sampling decoding on the encoded features using transposed convolution to obtain the coarse segmentation instance regions corresponding to all instances.
In step S31 of the present embodiment, the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:

$$X'_i = \sum_{k} X_{i + r \cdot k} \, w_k$$

where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map. Here r can be understood as the stride with which elements of X are sampled, so the receptive field size can be adjusted by adjusting the dilation rate.
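A short sketch of multi-rate dilated convolution; the channel counts and dilation rates are illustrative assumptions:

```python
import torch
import torch.nn as nn

x = torch.rand(1, 192, 120, 160)  # cross-modal feature map X (illustrative size)
for r in (1, 6, 12, 18):          # dilation ("hole") rates, illustrative choices
    conv = nn.Conv2d(192, 64, kernel_size=3, padding=r, dilation=r)
    y = conv(x)                   # padding=r keeps the spatial size unchanged
    effective = (3 - 1) * r + 1   # extent spanned by one 3x3 kernel at rate r
    print(r, tuple(y.shape), effective)
```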
In step S32 of the present embodiment, global context information is characterized by global averaging pooling.
In step S33 of this embodiment, since a great amount of detail information is lost from the feature map during convolutional feature extraction, deep semantic features with strong abstraction capability are fused with detail-rich shallow semantic features through a skip connection.
In step S34 of the present embodiment, based on the above method, the coarse segmentation instance region M obtained by supervised learning is:
M=Deconv(X+X″)
where X is a cross-modal feature map and Deconv (·) is a transpose convolution operation.
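Putting steps S31–S34 together, a minimal sketch of the coarse segmentation head under the assumptions above (channel counts, dilation rates and the number of classes K are illustrative):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseSegHead(nn.Module):
    def __init__(self, in_ch: int = 192, mid_ch: int = 48, num_classes: int = 8):
        super().__init__()
        # S31: multi-rate dilated convolutions
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, mid_ch, 3, padding=r, dilation=r) for r in (1, 6, 12, 18)
        )
        self.pool_proj = nn.Conv2d(in_ch, mid_ch, 1)  # S32: 1x1 dim-raise after pooling
        self.fuse = nn.Conv2d(5 * mid_ch, in_ch, 1)   # back to in_ch for the skip sum
        self.decode = nn.ConvTranspose2d(in_ch, num_classes, 4, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]
        # S32: global average pooling branch, lifted back to (h, w)
        g = F.interpolate(self.pool_proj(F.adaptive_avg_pool2d(x, 1)), size=(h, w))
        x2 = self.fuse(torch.cat(feats + [g], dim=1))  # multi-scale X''
        # S33 skip connection + S34 transposed-convolution decoding
        return self.decode(x + x2)                     # M = Deconv(X + X'')

m = CoarseSegHead()(torch.rand(1, 192, 120, 160))  # (1, K, 240, 320) coarse regions
```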
In step S4 of the embodiment of the present invention, instance-level feature representations can be obtained from the coarse segmentation result above, from which the feature association information is derived; specifically, as shown in fig. 3, step S4 includes:
S41, performing weighted summation of the pixel-level representations within the coarse segmentation instance region of the same class to obtain the instance-level representations;
S42, extracting the feature association information between each pixel-level representation and the instance-level representations using the similarity between the pixel-level representation and the instance-level representation of the same class.
In step S41 of the present embodiment, suppose there are K−1 classes of power devices, giving K segmentation targets in total. Each coarse object region M_k is a two-dimensional map associated with class k, in which the value at each position indicates the probability that the pixel at that position belongs to class k. The instance-level representation f_k is obtained by weighted aggregation of the pixel-level representations:

$$f_k = \sum_{i \in I} M_{ki} \, X_i$$

where X_i is the i-th pixel-level representation, M_{ki} is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set.
In step S42 of this embodiment, the feature association information w_{ik} is:

$$w_{ik} = \frac{e^{\kappa(X_i,\, f_k)}}{\sum_{j=1}^{K} e^{\kappa(X_i,\, f_j)}}$$

where κ(·,·) is the relation function; the softmax over the K regions normalizes the weights.
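A minimal sketch of steps S41–S42 under the reconstructed formulas above; the spatial-softmax normalization of M_k and the dot-product relation κ are assumptions consistent with, but not fixed by, the text:

```python
import torch
import torch.nn.functional as F

n, c, k = 120 * 160, 192, 8
x = torch.rand(n, c)          # pixel-level representations X_i
m = torch.rand(k, n)          # coarse region maps M_k, one row per class

m_norm = F.softmax(m, dim=1)  # spatial softmax: normalize each M_k over pixels i
f = m_norm @ x                # f_k = sum_i M_ki X_i            -> (k, c)

kappa = x @ f.t()             # relation kappa(X_i, f_k) as a dot product -> (n, k)
w = F.softmax(kappa, dim=1)   # w_ik, normalized over the K instance regions
```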
The step S5 of the embodiment of the invention specifically comprises the following steps:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain association features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the association features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
In step S51 of this embodiment, the association feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:

$$Y_i = \rho\Big(\sum_{k=1}^{K} w_{ik}\, \delta(f_k)\Big)$$

where ρ(·) and δ(·) are both transformation functions, w_{ik} is the feature association information, and f_k is the instance-level representation. Both ρ and δ can be realized by the operation 1×1 Conv → BN → ReLU, where Conv denotes a convolution, BN denotes batch normalization, and ReLU denotes the rectified linear activation function.
In step S52 of the present embodiment, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
where Concat(·) is the concatenation operation and X is the cross-modal feature map;
in step S53 of the present embodiment, the semantic division result map is expressed as:
R=Deconv(Z)
wherein R is a semantic label represented by each pixel in the semantic segmentation result graph, and Deconv (·) is a transpose convolution operation.
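A minimal sketch of steps S51–S53, with ρ and δ realized by the 1×1 Conv → BN → ReLU recipe given above; tensor sizes are illustrative:

```python
import torch
import torch.nn as nn

c, k, h, w_hw = 192, 8, 120, 160
n = h * w_hw
x = torch.rand(n, c)                        # pixel-level representations X_i
f = torch.rand(k, c)                        # instance-level representations f_k
w = torch.softmax(torch.rand(n, k), dim=1)  # association weights w_ik

def conv_bn_relu(ch: int) -> nn.Sequential:
    # the 1x1 Conv -> BN -> ReLU recipe given above for rho and delta
    return nn.Sequential(nn.Conv2d(ch, ch, 1), nn.BatchNorm2d(ch), nn.ReLU())

delta, rho = conv_bn_relu(c), conv_bn_relu(c)

f_t = delta(f.t().reshape(1, c, k, 1)).reshape(c, k).t()     # delta(f_k) -> (k, c)
y = rho((w @ f_t).t().reshape(1, c, h, w_hw))                # Y = rho(sum_k w_ik delta(f_k))
z = torch.cat([x.t().reshape(1, c, h, w_hw), y], dim=1)      # Z = Concat(X, Y)
r = nn.ConvTranspose2d(2 * c, k, 4, stride=2, padding=1)(z)  # R = Deconv(Z)
```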
In the embodiment of the invention, parameter training is required before performing image semantic segmentation with this method. The mean intersection-over-union mIoU is adopted as the evaluation index of the supervised learning: for each class of instances, the ratio of the intersection to the union of the predicted pixels and the ground-truth labeled pixels is computed, and the mean over all classes is then taken:

$$\mathrm{mIoU} = \frac{1}{K} \sum_{i=1}^{K} \frac{p_{ii}}{\sum_{j} p_{ij} + \sum_{j} p_{ji} - p_{ii}}$$

where p_{ii} denotes the number of correctly predicted pixels of class i in R, and p_{ij} denotes the pixels in R whose true class is i but which are predicted as class j.
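A minimal sketch of this mIoU evaluation, computed from a confusion matrix whose entry p[i, j] counts pixels of true class i predicted as class j:

```python
import torch

def mean_iou(pred: torch.Tensor, target: torch.Tensor, num_classes: int) -> torch.Tensor:
    # build the confusion matrix p via bincount over combined class indices
    idx = target.flatten() * num_classes + pred.flatten()
    p = torch.bincount(idx, minlength=num_classes ** 2).view(num_classes, num_classes)
    inter = p.diag().float()                          # p_ii
    union = (p.sum(1) + p.sum(0) - p.diag()).float()  # sum_j p_ij + sum_j p_ji - p_ii
    return (inter / union.clamp(min=1)).mean()        # average IoU over classes

score = mean_iou(torch.randint(0, 8, (240, 320)), torch.randint(0, 8, (240, 320)), 8)
```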
The principles and embodiments of the present invention have been described above with reference to specific examples, which are provided only to help understand the method and core ideas of the present invention. Meanwhile, since those skilled in the art may make changes to the specific embodiments and application scope in accordance with the ideas of the present invention, the contents of this description should not be construed as limiting the present invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and it should be understood that the scope of the invention is not limited to such specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the scope of the present disclosure.
Claims (9)
1. A semantic segmentation method for a power inspection image, characterized by comprising the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermodynamic diagram and a depth map;
S2, extracting features from the RGB image, the thermodynamic diagram and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse segmentation instance regions;
S4, extracting pixel-level-to-instance-level feature association information from the cross-modal feature map based on the coarse segmentation instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information to obtain the predicted semantic segmentation result map.
2. The semantic segmentation method of the power inspection image according to claim 1, wherein in the step S2, the power-image visual features of the RGB image are extracted through a MobileNet model to obtain a visual feature map XF;
the pixel heat-intensity change features in the thermodynamic diagram are extracted through a ShuffleNet model to obtain a thermal feature map XT;
and the structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain a depth feature map XD.
3. The semantic segmentation method of the power inspection image according to claim 2, wherein in the step S2, a bi-directional attention mechanism is adopted to perform cross-modal feature fusion, so as to obtain a cross-modal feature map X, and the expression is as follows:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
in the formula, XF ' is a visual feature map fused with thermal information and depth information, XD ' is a depth feature map fused with visual information and thermal information, XT ' is a thermal feature map fused with visual information and depth information, attention (·) is an Attention mechanism, and Concat (·) is a splicing operation.
4. The semantic segmentation method of the power inspection image according to claim 1, wherein the step S3 specifically comprises:
S31, performing convolution operations on the cross-modal feature map X using dilated (hole) convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for dimension raising, and concatenation on the convolved cross-modal feature maps X′ of the different dilation rates to realize multi-scale information fusion and obtain a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing up-sampling decoding on the encoded features using transposed convolution to obtain the coarse segmentation instance regions corresponding to all instances.
5. The method for semantic segmentation of a power inspection image according to claim 4, wherein in the step S31, the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:

$$X'_i = \sum_{k} X_{i + r \cdot k} \, w_k$$

where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map;
in the step S34, the coarse segmentation instance region is:

M=Deconv(X+X″)
where X is a cross-modal feature map and Deconv (·) is a transpose convolution operation.
6. The semantic segmentation method of the power inspection image according to claim 1, wherein the step S4 specifically comprises:
S41, performing weighted summation of the pixel-level representations within the coarse segmentation instance region of the same class to obtain the instance-level representations;
S42, extracting the feature association information between each pixel-level representation and the instance-level representations using the similarity between the pixel-level representation and the instance-level representation of the same class.
7. The method according to claim 6, wherein in the step S41, the instance-level representation f_k is:

$$f_k = \sum_{i \in I} M_{ki} \, X_i$$

where X_i is the i-th pixel-level representation, M_{ki} is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set;

in the step S42, the feature association information w_{ik} is:

$$w_{ik} = \frac{e^{\kappa(X_i,\, f_k)}}{\sum_{j=1}^{K} e^{\kappa(X_i,\, f_j)}}$$

where κ(·,·) is the relation function; the softmax over the K regions normalizes the weights.
8. The semantic segmentation method of the power inspection image according to claim 6, wherein the step S5 specifically comprises:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain association features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the association features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
9. The method for semantic segmentation of the power inspection image according to claim 8, wherein in the step S51, the association feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:

$$Y_i = \rho\Big(\sum_{k=1}^{K} w_{ik}\, \delta(f_k)\Big)$$

where ρ(·) and δ(·) are both transformation functions, w_{ik} is the feature association information, and f_k is the instance-level representation;
in the step S52, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
where Concat(·) is the concatenation operation and X is the cross-modal feature map;
in the step S53, the semantic segmentation result map is expressed as:
R=Deconv(Z)
wherein R is a semantic label represented by each pixel in the semantic segmentation result graph, and Deconv (·) is a transpose convolution operation.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311183380.2A | 2023-09-13 | 2023-09-13 | Semantic segmentation method for electric power inspection image
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311183380.2A | 2023-09-13 | 2023-09-13 | Semantic segmentation method for electric power inspection image
Publications (1)
Publication Number | Publication Date |
---|---|
CN117218345A true CN117218345A (en) | 2023-12-12 |
Family
ID=89036578
Family Applications (1)

Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202311183380.2A (pending) | Semantic segmentation method for electric power inspection image | 2023-09-13 | 2023-09-13
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117218345A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118052977A (en) * | 2024-02-02 | 2024-05-17 | 北京中成康富科技股份有限公司 | Antenna system and method for millimeter wave therapeutic apparatus |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication |
| SE01 | Entry into force of request for substantive examination |