CN117218345A - Semantic segmentation method for electric power inspection image

Info

Publication number: CN117218345A
Application number: CN202311183380.2A
Authority: CN
Other languages: Chinese (zh)
Legal status: Pending
Inventors: 黄飞虎, 战鹏祥, 廖思睿, 周子堃, 彭舰, 徐文政, 弋沛玉, 王金策
Applicant/Assignee: Sichuan University
Filing date: 2023-09-13
Publication date: 2023-12-12
Classification: Image Analysis (AREA)
Abstract

The invention discloses a semantic segmentation method for electric power inspection images, comprising the following steps: S1, acquiring an RGB image, a thermal map and a depth map from power inspection; S2, extracting features from the RGB image, the thermal map and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map; S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse instance regions; S4, extracting pixel-level-to-instance-level feature association information in the cross-modal feature map based on the coarse instance regions; and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information to obtain a predicted semantic segmentation result map. Compared with traditional methods and CNN-based image segmentation methods, the method represents the contextual relations and global semantic information among features more fully; compared with attention-based image segmentation methods, it has fewer parameters and faster inference.

Description

Semantic segmentation method for electric power inspection image
Technical Field
The invention belongs to the technical field of power equipment inspection, and particularly relates to a semantic segmentation method for an electric power inspection image.
Background
In power inspection tasks, image semantic segmentation is an important technology that helps automatically identify power equipment, defects and other key elements, thereby improving inspection efficiency and accuracy. Existing research methods mainly comprise: (1) semantic segmentation based on traditional computer vision methods. Such methods rely mainly on low-level visual features of the image; common techniques include graph cut, clustering and edge detection. (2) Semantic segmentation based on convolutional neural networks (CNNs). Such models adopt an encoder-decoder architecture, where the encoder extracts image features and the decoder maps these features back to pixel-level segmentation results. In the encoder, common network structures include VGGNet and ResNet; the decoder typically uses transposed convolution layers for upsampling and resolution restoration. (3) Semantic segmentation methods based on attention mechanisms. Such methods let the model automatically learn contextual dependencies between different locations in the image, adaptively focus on the regions relevant to the segmentation task, and dynamically assign weights to different regions according to image content.
Semantic segmentation based on traditional computer vision methods often depends on hand-designed low-level image features. It cannot characterize high-level semantic information well, cannot effectively learn contextual relations among pixels, struggles to semantically understand the power-grid panorama, and performs poorly in power inspection scenes with complex and changeable environments. Semantic segmentation based on convolutional neural networks can only extract local features; it has difficulty capturing long-range dependencies among pixels, cannot handle overlapping and occluded objects in complex scenes well, is sensitive to appearance changes of power equipment caused by environment, illumination and other factors, and generalizes poorly across scenes. Semantic segmentation methods based on attention mechanisms typically require computing correlation weights between each location and all other locations; for large images or high-resolution feature maps this significantly increases the model's computational complexity and thus the time cost of training and inference, making it hard to meet the low-compute, high-real-time requirements of scenarios such as unmanned-aerial-vehicle inspection.
Disclosure of Invention
Aiming at the above defects in the prior art, the semantic segmentation method for power inspection images provided by the invention solves the problem that, in power inspection defect detection, existing semantic segmentation models struggle to segment semantics in complex scenes with occlusion and appearance changes.
To achieve the aim of the invention, the invention adopts the following technical scheme: a semantic segmentation method for electric power inspection images, comprising the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermal map and a depth map;
S2, extracting features from the RGB image, the thermal map and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse instance regions;
S4, extracting pixel-level-to-instance-level feature association information in the cross-modal feature map based on the coarse instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information, thereby obtaining a predicted semantic segmentation result map.
Further, in step S2, the power-image visual features of the RGB image are extracted through a MobileNet model to obtain a visual feature map XF;
the pixel heat intensity variation features in the thermal map are extracted through a ShuffleNet model to obtain a thermal feature map XT;
and structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain a depth feature map XD.
Further, in the step S2, a bi-directional attention mechanism is adopted to perform cross-modal feature fusion, so as to obtain a cross-modal feature map X, where the expression is as follows:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
where XF′ is the visual feature map fused with thermal and depth information, XD′ is the depth feature map fused with visual and thermal information, XT′ is the thermal feature map fused with visual and depth information, Attention(·,·) is the attention mechanism, and Concat(·) is the concatenation operation.
Further, step S3 specifically comprises:
S31, performing convolution operations on the cross-modal feature map X using dilated convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for channel expansion, and concatenation on the convolved cross-modal feature maps X′ of different dilation rates to realize multi-scale information fusion, obtaining a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing upsampling decoding on the encoded features using transposed convolution to obtain the coarse instance region corresponding to each instance.
Further, in step S31, the value of the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:
X′_i = Σ_k X_(i+r·k)·w_k
where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map;
in step S34, the coarse segmentation instance region is:
M = Deconv(X + X″)
where X is the cross-modal feature map and Deconv(·) is the transposed convolution operation.
Further, step S4 specifically comprises:
S41, performing weighted summation of the pixel-level representations within the coarse instance region corresponding to each class of instance to obtain instance-level representations;
S42, extracting the feature association information between each pixel-level representation and each instance-level representation using the similarity between the pixel-level representations and the instance-level representation of the corresponding class.
Further, in step S41, the instance-level representation f_k is:
f_k = Σ_(i∈I) M_ki·X_i
where X_i is the i-th pixel-level representation, M_ki is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set;
in step S42, the feature association information w_ik is:
w_ik = exp(κ(X_i, f_k)) / Σ_(j=1..K) exp(κ(X_i, f_j))
where κ(·,·) is the normalized relation function.
Further, step S5 specifically comprises:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain associated features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the associated features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
Further, in step S51, the associated feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:
Y_i = ρ(Σ_(k=1..K) w_ik·δ(f_k))
where ρ(·) and δ(·) are both transformation functions, w_ik is the feature association information, and f_k is the instance-level representation;
in the step S52, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
in the formula, concat (&) is a splicing operation, and X is a cross-mode characteristic diagram;
in the step S53, the semantic segmentation result map is expressed as:
R=Deconv(Z)
where R gives the semantic label of each pixel in the semantic segmentation result map, and Deconv(·) is the transposed convolution operation.
The beneficial effects of the invention are as follows:
(1) The invention fully considers the multi-modal and multi-scale information in power inspection scenes. For input images of different modalities, different types of lightweight backbone networks are adopted to extract the key features.
(2) The invention further extracts semantic features using dilated convolutions with multiple dilation rates, and obtains effective coarse instance regions through transposed-convolution upsampling.
(3) The invention aggregates the dependency information between pixels and object regions into the pixel representations through the pixel-to-instance-region feature association information, so that each pixel representation is drawn closer to the abstract representation of the instance it belongs to, yielding a refined semantic segmentation result.
(4) Compared with traditional methods and CNN-based image segmentation methods, the method represents the contextual relations and global semantic information among features more fully; compared with attention-based image segmentation methods, it has fewer parameters and faster inference.
Drawings
Fig. 1 is a flowchart of a semantic segmentation method of a power inspection image provided by the invention.
Fig. 2 is a schematic diagram of cross-modal feature diagram construction provided by the present invention.
FIG. 3 is a schematic diagram of pixel-level to example-level feature association and feature enhancement provided by the present invention.
Detailed Description
The following description of embodiments of the present invention is provided to facilitate understanding of the invention by those skilled in the art. It should be understood, however, that the invention is not limited to the scope of these embodiments; to those skilled in the art, all inventions that make use of the inventive concept fall within the protection scope defined by the appended claims.
Example 1:
The embodiment of the invention provides a semantic segmentation method for power inspection images, as shown in fig. 1, comprising the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermal map and a depth map;
S2, extracting features from the RGB image, the thermal map and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse instance regions;
S4, extracting pixel-level-to-instance-level feature association information in the cross-modal feature map based on the coarse instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information, thereby obtaining a predicted semantic segmentation result map.
In step S1 of the embodiment of the present invention, during power inspection, multi-modal image data are collected by multiple devices such as a high-definition color camera, a thermal imager and a laser range finder, including an RGB image, a thermal map and a depth map. The RGB high-definition images of power lines and equipment captured by the high-definition camera provide visual information; the thermal map of the heat distribution over the surface and surrounding areas of electrical facilities, detected by the thermal imager, helps discover possible overload problems of power lines and equipment; and a 3D structural scan by the laser range finder yields the depth map, providing structural information such as the three-dimensional spatial position and size of lines and equipment.
In step S2 of the embodiment of the present invention, as shown in fig. 2, a parallel multi-branch structure is designed to learn the feature maps of the images of different modalities, specifically:
The RGB image Xf mainly expresses visual information such as edges and textures, so a MobileNet model based on depthwise-separable convolution is adopted to extract the power-image visual features of the RGB image, obtaining the visual feature map XF. With MobileNet as a lightweight backbone network, the visual features of the power image are extracted efficiently. A depthwise-separable convolution consists of two steps, a depthwise convolution and a pointwise convolution: an independent convolution is first applied to each channel of the input feature map, and a 1×1 convolution is then applied on that basis to expand or reduce the channel dimension.
The thermal map Xt mainly expresses the heat intensity variation of pixels and focuses on the positional relations and local patterns of equipment, so the pixel heat intensity variation features in the thermal map are extracted through a ShuffleNet model to obtain the thermal feature map XT. With ShuffleNet as a lightweight backbone network, the computational complexity is reduced through group convolution while the expressiveness of the features is improved by the channel shuffle operation. This approach is well suited to characterizing the thermal variation patterns and local patterns within the image.
The depth map Xd mainly contains structural information such as the three-dimensional spatial position and size of lines and equipment, so the structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain the depth feature map XD. With PointNet as a lightweight backbone network, independent features are learned for each point through a multi-layer perceptron, max pooling is then applied over all point features to take the maximum response as a global feature, and the depth feature map XD is finally obtained.
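For concreteness, the following is a minimal PyTorch sketch of this parallel three-branch extraction. It is only an illustration under assumptions: the torchvision backbones stand in for the MobileNet/ShuffleNet models described above, and the common channel width of 256, the bilinear resizing of branches to a common resolution, the thermal input replicated to three channels, and the per-pixel PointNet-style depth branch (a shared 1×1-conv MLP plus a max-pooled global feature broadcast back to every position) are choices not fixed by the text.

```python
# Minimal sketch of the parallel three-branch feature extraction in S2.
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2, shufflenet_v2_x1_0

class MultiModalExtractor(nn.Module):
    def __init__(self, dim=256):
        super().__init__()
        # Visual branch: depthwise-separable MobileNet features for the RGB image.
        self.rgb = mobilenet_v2(weights=None).features              # -> 1280 channels
        # Thermal branch: ShuffleNet stages (group convolution + channel shuffle).
        s = shufflenet_v2_x1_0(weights=None)
        self.thermal = nn.Sequential(s.conv1, s.maxpool, s.stage2,
                                     s.stage3, s.stage4, s.conv5)   # -> 1024 channels
        # Depth branch: PointNet-style shared MLP per pixel (1x1 convs), with a
        # max-pooled global feature broadcast back to each position (assumption).
        self.point_mlp = nn.Sequential(nn.Conv2d(1, 64, 1), nn.ReLU(),
                                       nn.Conv2d(64, dim, 1), nn.ReLU())
        # 1x1 projections to a common channel width.
        self.pf = nn.Conv2d(1280, dim, 1)
        self.pt = nn.Conv2d(1024, dim, 1)
        self.pd = nn.Conv2d(2 * dim, dim, 1)

    def forward(self, rgb, thermal, depth):
        # rgb: (B,3,H,W); thermal: (B,3,H,W), e.g. a 1-channel map repeated to 3
        # channels; depth: (B,1,H,W).
        xf = self.pf(self.rgb(rgb))
        xt = self.pt(self.thermal(thermal))
        pts = self.point_mlp(depth)                                 # per-point features
        glob = F.adaptive_max_pool2d(pts, 1).expand_as(pts)         # global max feature
        xd = self.pd(torch.cat([pts, glob], dim=1))
        # Resize all branches to the visual branch's spatial size.
        size = xf.shape[-2:]
        xt = F.interpolate(xt, size=size, mode='bilinear', align_corners=False)
        xd = F.interpolate(xd, size=size, mode='bilinear', align_corners=False)
        return xf, xt, xd                                           # XF, XT, XD
```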
In step S2 of the embodiment of the present invention, cross-modal feature fusion is performed on the extracted visual feature map XF, depth feature map XD, and thermal feature map XT by using a bidirectional attention mechanism, so as to obtain a cross-modal feature map X, where the expression is as follows:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
where XF′ is the visual feature map fused with thermal and depth information, XD′ is the depth feature map fused with visual and thermal information, XT′ is the thermal feature map fused with visual and depth information, Attention(·,·) is the attention mechanism, and Concat(·) is the concatenation operation.
In this way the cross-modal feature map is obtained: deep cross-modal fusion is achieved while the independent information of each modality is preserved, and the correlation and complementarity among the features of the three modalities are fully exploited.
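A sketch of this bidirectional fusion is given below. The patent does not specify the form of Attention(·,·); using one nn.MultiheadAttention module per ordered modality pair (the query attends to the key/value modality) and flattening feature maps to token sequences are assumptions of this illustration.

```python
# Minimal sketch of the bidirectional cross-modal attention fusion.
import torch
import torch.nn as nn

class CrossModalFusion(nn.Module):
    def __init__(self, dim=256, heads=4):
        super().__init__()
        # One attention module per ordered modality pair, matching the equations.
        self.att = nn.ModuleDict(
            {p: nn.MultiheadAttention(dim, heads, batch_first=True)
             for p in ['FD', 'FT', 'DF', 'DT', 'TD', 'TF']})

    def _a(self, name, q, kv):
        out, _ = self.att[name](q, kv, kv)   # Attention(q, kv): q attends to kv
        return out

    def forward(self, xf, xd, xt):
        b, c, h, w = xf.shape
        # (B,C,H,W) -> (B, H*W, C) token sequences
        f, d, t = (x.flatten(2).transpose(1, 2) for x in (xf, xd, xt))
        f2 = f + self._a('FD', f, d) + self._a('FT', f, t)  # XF' = XF + Att + Att
        d2 = d + self._a('DF', d, f) + self._a('DT', d, t)  # XD'
        t2 = t + self._a('TD', t, d) + self._a('TF', t, f)  # XT'
        x = torch.cat([f2, d2, t2], dim=-1)                 # X = Concat(XF',XD',XT')
        return x.transpose(1, 2).reshape(b, 3 * c, h, w)    # back to map form
```

Giving each of the six directional terms its own parameter set keeps the pairwise interactions independent, which matches the six distinct attention terms in the equations above.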
In step S3 of the embodiment of the present invention, based on the cross-modal feature map X, dilated convolution kernels w with different dilation rates are adopted to extract multi-scale context information and encode the feature map, and the encoded features are upsampled and decoded to obtain the final coarse instance regions. On this basis, step S3 of the embodiment of the present invention specifically comprises:
S31, performing convolution operations on the cross-modal feature map X using dilated convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for channel expansion, and concatenation on the convolved cross-modal feature maps X′ of different dilation rates to realize multi-scale information fusion, obtaining a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing upsampling decoding on the encoded features using transposed convolution to obtain the coarse instance region corresponding to each instance.
In step S31 of this embodiment, the value of the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:
X′_i = Σ_k X_(i+r·k)·w_k
where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map. Here r can be understood as the stride with which elements of X are sampled, so the receptive field size can be adjusted by adjusting the dilation rate.
In step S32 of this embodiment, global context information is captured by global average pooling.
In step S33 of this embodiment, since a large amount of detail information is lost from the feature map during convolutional feature extraction, the deep semantic features with strong abstraction capability are fused with the detail-rich shallow semantic features through a skip connection.
In step S34 of this embodiment, based on the above method, the coarse segmentation instance region M obtained by supervised learning is:
M = Deconv(X + X″)
where X is the cross-modal feature map and Deconv(·) is the transposed convolution operation.
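The following sketch wires steps S31 to S34 together. The dilation rates (1, 6, 12, 18), the channel sizes (768 matching a 3×256 concatenation from the fusion stage), the number of classes, the softmax that turns decoded logits into the soft region maps M_k, and the 4× upsampling factor of the transposed convolution are all assumptions; the text fixes only the overall structure.

```python
# Minimal sketch of the coarse-segmentation head of S3: parallel dilated
# convolutions, global average pooling with 1x1 projection, concatenation,
# a skip connection to the initial map X, and transposed-conv decoding.
import torch
import torch.nn as nn
import torch.nn.functional as F

class CoarseSegHead(nn.Module):
    def __init__(self, in_ch=768, num_classes=5, rates=(1, 6, 12, 18)):
        super().__init__()
        # S31: dilated convolution branches with different dilation rates.
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, in_ch, 3, padding=r, dilation=r) for r in rates)
        # S32: global average pooling followed by a 1x1 convolution.
        self.gap = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(in_ch, in_ch, 1))
        # Fuse the concatenated multi-scale maps back to in_ch channels.
        self.fuse = nn.Conv2d(in_ch * (len(rates) + 1), in_ch, 1)
        # S34: transposed convolution for upsampling decoding (assumed 4x).
        self.deconv = nn.ConvTranspose2d(in_ch, num_classes, 4, stride=4)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats = [b(x) for b in self.branches]                  # X' per dilation rate
        g = F.interpolate(self.gap(x), size=(h, w), mode='bilinear',
                          align_corners=False)
        x2 = self.fuse(torch.cat(feats + [g], dim=1))          # X'' (multi-scale)
        enc = x + x2                                           # S33: skip connection
        m = self.deconv(enc)                                   # M = Deconv(X + X'')
        return m.softmax(dim=1)                                # soft per-class regions
```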
In step S4 of the embodiment of the present invention, instance-level feature representations can be obtained from the coarse segmentation result obtained above, and the feature association information is then derived; specifically, as shown in fig. 3, step S4 comprises:
S41, performing weighted summation of the pixel-level representations within the coarse instance region corresponding to each class of instance to obtain instance-level representations;
S42, extracting the feature association information between each pixel-level representation and each instance-level representation using the similarity between the pixel-level representations and the instance-level representation of the corresponding class.
In step S41 of this embodiment, suppose there are K-1 classes of power devices in total, giving K kinds of segmentation targets altogether (the remaining class corresponding to the background). Each coarse object region M_k is a two-dimensional map associated with class k, and the value at each position of the map indicates the probability that the pixel at that position belongs to class k. The instance-level representation f_k is obtained by weighted aggregation of the pixel-level representations:
f_k = Σ_(i∈I) M_ki·X_i
where X_i is the i-th pixel-level representation, M_ki is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set.
In step S42 of this embodiment, the feature association information w_ik is:
w_ik = exp(κ(X_i, f_k)) / Σ_(j=1..K) exp(κ(X_i, f_j))
where κ(·,·) is the normalized relation function.
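Steps S41 and S42 can be sketched as follows. The spatial normalization of M_ki over i, and the relation function κ(X_i, f_k) = φ(X_i)ᵀψ(f_k) followed by a softmax over the K regions, are assumptions consistent with the "normalized" wording above; φ and ψ are small learned transforms supplied by the caller.

```python
# Minimal sketch of S41-S42: instance-level aggregation and pixel-to-instance
# association weights.
import torch
import torch.nn.functional as F

def pixel_to_instance(x, m, phi, psi):
    # x: (B, C, H, W) cross-modal feature map; m: (B, K, H, W) coarse soft regions.
    xs = x.flatten(2)                                     # (B, C, N), N = H*W pixels
    ms = m.flatten(2)                                     # (B, K, N)
    # Normalize M_ki over pixels i so each f_k is a weighted mean (assumption).
    ms = ms / ms.sum(dim=-1, keepdim=True).clamp(min=1e-6)
    f = torch.einsum('bkn,bcn->bkc', ms, xs)              # f_k = sum_i M_ki * X_i
    # Relation scores kappa(X_i, f_k) = phi(X_i)^T psi(f_k), softmax over K.
    q = phi(xs.transpose(1, 2))                           # (B, N, C') per-pixel
    r = psi(f)                                            # (B, K, C') per-instance
    wik = F.softmax(torch.einsum('bnc,bkc->bnk', q, r), dim=-1)   # (B, N, K)
    return f, wik
```

For example, with 768-channel features, phi = psi = torch.nn.Linear(768, 256) would give 256-dimensional relation embeddings; the projection width is likewise an illustrative choice.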
Step S5 of the embodiment of the invention specifically comprises:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain associated features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the associated features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
In step S51 of this embodiment, the associated feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:
Y_i = ρ(Σ_(k=1..K) w_ik·δ(f_k))
where ρ(·) and δ(·) are both transformation functions, w_ik is the feature association information, and f_k is the instance-level representation. Both ρ and δ can be implemented by the operation 1×1 Conv → BN → ReLU, where Conv denotes convolution, BN denotes batch normalization, and ReLU denotes the rectified linear activation function.
In step S52 of the present embodiment, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
in the formula, concat (&) is a splicing operation, and X is a cross-mode characteristic diagram;
in step S53 of the present embodiment, the semantic division result map is expressed as:
R=Deconv(Z)
where R gives the semantic label of each pixel in the semantic segmentation result map, and Deconv(·) is the transposed convolution operation.
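A sketch of step S5, reusing the f and wik produced by the S41-S42 sketch above: the 1×1 Conv → BN → ReLU form of ρ and δ follows the text, while the channel sizes and the 4× transposed-convolution decoder are illustrative assumptions.

```python
# Minimal sketch of S51-S53: aggregate associated features Y with w_ik as
# weights, concatenate with X, and decode with a transposed convolution.
import torch
import torch.nn as nn

def conv_bn_relu(cin, cout):
    # The 1x1 Conv -> BN -> ReLU operation used for both rho and delta.
    return nn.Sequential(nn.Conv2d(cin, cout, 1), nn.BatchNorm2d(cout), nn.ReLU())

class PixelEnhanceDecoder(nn.Module):
    def __init__(self, c=768, num_classes=5):
        super().__init__()
        self.delta = conv_bn_relu(c, c)   # delta(f_k), applied to instance features
        self.rho = conv_bn_relu(c, c)     # rho(.), applied after aggregation
        self.deconv = nn.ConvTranspose2d(2 * c, num_classes, 4, stride=4)

    def forward(self, x, f, wik):
        # x: (B,C,H,W); f: (B,K,C) instance-level reps; wik: (B,N,K), N = H*W.
        b, c, h, w = x.shape
        fd = self.delta(f.transpose(1, 2).unsqueeze(-1))     # treat K as spatial dim
        fd = fd.squeeze(-1).transpose(1, 2)                  # back to (B,K,C)
        y = torch.einsum('bnk,bkc->bnc', wik, fd)            # sum_k w_ik * delta(f_k)
        y = self.rho(y.transpose(1, 2).reshape(b, c, h, w))  # Y_i = rho(...)
        z = torch.cat([x, y], dim=1)                         # Z = Concat(X, Y)
        return self.deconv(z)                                # R = Deconv(Z)
```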
In the embodiment of the present invention, parameter training is required before performing image semantic segmentation with the above method. The mean intersection-over-union (mIoU) is adopted as the evaluation index for supervised learning: for each class of instances, the ratio of the intersection to the union of the predicted pixels and the ground-truth labeled pixels is computed, and the average over classes is then taken:
mIoU = (1/K)·Σ_(i=1..K) p_ii / (Σ_(j=1..K) p_ij + Σ_(j=1..K) p_ji - p_ii)
where p_ii denotes the number of correctly predicted pixels of class i in R, and p_ij denotes the number of pixels of class i predicted as class j.
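As a reference, a minimal computation of this mIoU from integer label maps (assuming K classes indexed 0..K-1) is:

```python
# Minimal mIoU computation via a confusion matrix, matching the formula above.
import torch

def mean_iou(pred, target, k):
    # pred, target: integer label maps of identical shape, values in [0, k).
    idx = target.flatten() * k + pred.flatten()
    conf = torch.bincount(idx, minlength=k * k).reshape(k, k)    # conf[i][j]: true i, pred j
    inter = conf.diag().float()                                  # p_ii
    union = conf.sum(0).float() + conf.sum(1).float() - inter    # sum_j p_ij + p_ji - p_ii
    return (inter / union.clamp(min=1)).mean().item()
```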
The principles and embodiments of the present invention have been described in detail above with reference to specific examples, which are provided only to facilitate understanding of the method and its core ideas. Since those skilled in the art may vary the specific embodiments and the application scope in accordance with the ideas of the present invention, this description should not be construed as limiting the invention.
Those of ordinary skill in the art will recognize that the embodiments described herein are intended to help the reader understand the principles of the present invention, and that the protection scope of the invention is not limited to these specific statements and embodiments. Those of ordinary skill in the art can make various other specific modifications and combinations based on the teachings of the present disclosure without departing from its spirit, and such modifications and combinations remain within the protection scope of the present disclosure.

Claims (9)

1. A semantic segmentation method for a power inspection image, characterized by comprising the following steps:
S1, acquiring multi-modal image data of power inspection;
wherein the multi-modal image data includes an RGB image, a thermal map and a depth map;
S2, extracting features from the RGB image, the thermal map and the depth map respectively, and performing cross-modal feature fusion on the feature maps to obtain a cross-modal feature map;
S3, performing coarse segmentation of instance regions on the cross-modal feature map to obtain coarse instance regions;
S4, extracting pixel-level-to-instance-level feature association information in the cross-modal feature map based on the coarse instance regions;
and S5, performing pixel-level feature enhancement in the cross-modal feature map based on the feature association information, thereby obtaining a predicted semantic segmentation result map.
2. The semantic segmentation method for a power inspection image according to claim 1, characterized in that in step S2, the power-image visual features of the RGB image are extracted through a MobileNet model to obtain a visual feature map XF;
the pixel heat intensity variation features in the thermal map are extracted through a ShuffleNet model to obtain a thermal feature map XT;
and structural features of lines and equipment are extracted from the depth map through a PointNet model to obtain a depth feature map XD.
3. The semantic segmentation method for a power inspection image according to claim 2, characterized in that in step S2, a bidirectional attention mechanism is adopted for cross-modal feature fusion to obtain a cross-modal feature map X, with the following expressions:
XF′=XF+Attention(XF,XD)+Attention(XF,XT)
XD′=XD+Attention(XD,XF)+Attention(XD,XT)
XT′=XT+Attention(XT,XD)+Attention(XT,XF)
X=Concat(XF′,XD′,XT′)
where XF′ is the visual feature map fused with thermal and depth information, XD′ is the depth feature map fused with visual and thermal information, XT′ is the thermal feature map fused with visual and depth information, Attention(·,·) is the attention mechanism, and Concat(·) is the concatenation operation.
4. The semantic segmentation method for a power inspection image according to claim 1, characterized in that step S3 specifically comprises:
S31, performing convolution operations on the cross-modal feature map X using dilated convolution kernels w with different dilation rates to obtain convolved cross-modal feature maps X′;
S32, sequentially performing global average pooling, 1×1 convolution for channel expansion, and concatenation on the convolved cross-modal feature maps X′ of different dilation rates to realize multi-scale information fusion, obtaining a multi-scale cross-modal feature map X″;
S33, merging the deep semantic features of the multi-scale cross-modal feature map X″ with the shallow semantic features of the initial cross-modal feature map X through a skip connection to obtain encoded features;
S34, performing upsampling decoding on the encoded features using transposed convolution to obtain the coarse instance region corresponding to each instance.
5. The semantic segmentation method for a power inspection image according to claim 4, characterized in that in step S31, the value of the convolved cross-modal feature map X′ at an arbitrary position i is expressed as:
X′_i = Σ_k X_(i+r·k)·w_k
where k denotes a position on the convolution kernel, r denotes the dilation rate of the dilated convolution, w is the dilated convolution kernel, and X is the cross-modal feature map;
in step S34, the coarse segmentation instance region is:
M = Deconv(X + X″)
where X is the cross-modal feature map and Deconv(·) is the transposed convolution operation.
6. The semantic segmentation method for a power inspection image according to claim 1, characterized in that step S4 specifically comprises:
S41, performing weighted summation of the pixel-level representations within the coarse instance region corresponding to each class of instance to obtain instance-level representations;
S42, extracting the feature association information between each pixel-level representation and each instance-level representation using the similarity between the pixel-level representations and the instance-level representation of the corresponding class.
7. The method according to claim 6, characterized in that in step S41, the instance-level representation f_k is:
f_k = Σ_(i∈I) M_ki·X_i
where X_i is the i-th pixel-level representation, M_ki is the normalized probability that the i-th pixel belongs to class k, i.e. the value of M_k at the i-th position, and I is the pixel set;
in step S42, the feature association information w_ik is:
w_ik = exp(κ(X_i, f_k)) / Σ_(j=1..K) exp(κ(X_i, f_j))
where κ(·,·) is the normalized relation function.
8. The semantic segmentation method for a power inspection image according to claim 6, characterized in that step S5 specifically comprises:
S51, using the feature association information as weights, weighting and aggregating the instance-level representations of the K regions to obtain associated features;
S52, enhancing each pixel-level representation in the cross-modal feature map with the associated features to obtain enhanced pixel-level feature representations;
S53, performing a transposed convolution operation on the enhanced pixel-level feature representations to obtain the final predicted semantic segmentation result map.
9. The semantic segmentation method for a power inspection image according to claim 8, characterized in that in step S51, the associated feature Y_i obtained by using the feature association information of the i-th pixel-level representation as weights is:
Y_i = ρ(Σ_(k=1..K) w_ik·δ(f_k))
where ρ(·) and δ(·) are both transformation functions, w_ik is the feature association information, and f_k is the instance-level representation;
in the step S52, the enhanced pixel level feature representation Z is:
Z=Concat(X,Y)
in the formula, concat (&) is a splicing operation, and X is a cross-mode characteristic diagram;
in the step S53, the semantic segmentation result map is expressed as:
R=Deconv(Z)
where R gives the semantic label of each pixel in the semantic segmentation result map, and Deconv(·) is the transposed convolution operation.

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination