CN113205526B - Distribution line accurate semantic segmentation method based on multi-source information fusion

Info

Publication number
CN113205526B
Authority
CN
China
Prior art keywords
semantic segmentation
mask
rcnn
distribution line
network
Prior art date
Legal status
Active
Application number
CN202110355431.XA
Other languages
Chinese (zh)
Other versions
CN113205526A (en)
Inventor
张冬
高明
刘灵光
卢健
盛晓翔
顾礼峰
Current Assignee
Huaianhongneng Group Co ltd
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Yijiahe Technology Co Ltd
Original Assignee
Huaianhongneng Group Co ltd
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Yijiahe Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Huaianhongneng Group Co ltd, HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd, and Yijiahe Technology Co Ltd
Priority to CN202110355431.XA
Publication of CN113205526A
Application granted
Publication of CN113205526B
Status: Active

Classifications

    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T7/11 Region-based segmentation
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to the technical field of computer vision and image processing, and discloses a distribution line accurate semantic segmentation method based on multi-source information fusion. The method acquires the 3D point cloud of a lidar and the RGB image of a high-precision vision camera and fuses the two; improves the Mask-RCNN network and constructs an improved Mask-RCNN semantic segmentation model; improves the loss function; collects distribution line pictures on site to build a data set, divided into a training set and a test set; preprocesses the data set and trains and tests the improved Mask-RCNN semantic segmentation model with the training and test sets; and feeds the fused data into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation. Compared with the prior art, the method achieves accurate, high-speed semantic segmentation of distribution lines based on the improved Mask-RCNN semantic segmentation model.

Description

Distribution line accurate semantic segmentation method based on multi-source information fusion
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to a distribution line accurate semantic segmentation method based on multi-source information fusion.
Background
With the vigorous development of China's economy, electricity has become indispensable to social production and people's daily life, which places higher demands on power supply departments: they must guarantee not only a sufficient supply of power but also high supply reliability. Live-line work on distribution lines was developed to carry out maintenance, inspection, testing, and related operations on power supply equipment and lines while power delivery continues.
However, operators face high risks when performing live-line work on distribution lines, so an accurate safety early-warning system is essential. Accurate semantic segmentation of the distribution line is one of the core technologies of live-line work safety early warning, and the segmentation accuracy directly determines the reliability of the warning. Distribution lines are erected in complex environments with densely arranged facilities, so line information collected by a single sensor is easily disturbed by the surrounding environment, the acquired data become inaccurate, and the reliability of the safety warning drops. Moreover, most existing distribution line semantic segmentation methods suffer from low accuracy and low warning reliability.
Image semantic segmentation partitions an image at the pixel level into regions expressing different semantic categories, and is one of the core technologies of image processing. With the arrival of the artificial intelligence era, image semantic segmentation has gradually become a research hotspot in cutting-edge fields such as autonomous driving and indoor navigation.
In the field of image semantic segmentation, machine learning methods represented by deep learning keep achieving better results and are gradually replacing traditional segmentation methods. Compared with traditional methods, deep learning based segmentation builds a deep network that autonomously learns and extracts image features, enabling end-to-end classification learning and effectively improving both the speed and the accuracy of semantic segmentation.
In 2015 the Fully Convolutional Network (FCN) was proposed, applying deep learning to semantic segmentation for the first time: it converts the fully connected layers used for image classification in a convolutional neural network into convolutional layers and introduces deconvolution layers and skip connections, ensuring the stability and robustness of the network. With the advent of the FCN, deep learning formally entered the field of image semantic segmentation.
U-Net, the most commonly used model in medical image segmentation, is known for its typical U-shaped symmetric structure, whose two sides perform downsampling and upsampling respectively. Downsampling captures the contextual information of the image, while upsampling enables accurate localization of segmentation boundaries, so the model segments well even when trained on little data. In the same year the SegNet semantic segmentation model appeared; it adopts an encoder-decoder structure for semantic segmentation and upsamples using the max-pooling indices, saving memory in the network model.
The DeepLab series of semantic segmentation models from the Google team has also kept advancing the field. DeepLabv1 combines a deep convolutional neural network (DCNN) with a fully connected conditional random field (CRF), effectively alleviating the inaccurate localization of deep convolutional networks. DeepLabv2 innovates on DeepLabv1 by integrating an atrous spatial pyramid pooling (ASPP) module into the model structure, which effectively improves the network's segmentation ability. The improved DeepLabv3 appeared in the same year; its core idea is to refine the ASPP structure and introduce batch normalization layers, improving segmentation accuracy. The latest DeepLabv3+ adds an encoder-decoder structure and an Xception backbone on top of DeepLabv3, improving both the speed and the accuracy of network semantic segmentation.
In addition, the PSPNet semantic segmentation model proposed by Zhao et al. introduces a pyramid pooling module, improving the segmentation network's ability to capture the global context of the image. The Mask-RCNN semantic segmentation model proposed by He et al. mainly extends Faster-RCNN: it adds a network branch for the segmentation task, replaces the RoIPooling of Faster-RCNN with RoIAlign, and combines a residual network with a feature pyramid network (FPN) for image feature extraction, so the network achieves high-quality segmentation while detecting targets.
Numerous experiments show that deep learning based image semantic segmentation algorithms perform well on image semantic segmentation tasks. However, the live-line work environment is complex and demands high segmentation accuracy, which conventional semantic segmentation models cannot meet.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a distribution line accurate semantic segmentation method based on multi-source information fusion.
The technical scheme is as follows: the invention provides a distribution line accurate semantic segmentation method based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system. After acquiring the lidar and high-precision vision camera information, the system realizes semantic segmentation through the following steps:
Step 1: acquire the 3D point cloud of the lidar and the RGB image of the high-precision vision camera, and register and fuse the two;
Step 2: improve the Mask-RCNN network: modify the downsampling structure of ResNet, decompose the large-kernel convolutions of the ResNet network and replace them with multiple layers of small convolutions, propose a new network structure, and construct an improved Mask-RCNN semantic segmentation model;
Step 3: improve the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape;
Step 4: collect relevant distribution line pictures at the live-line work site to build a data set, and divide the data set into a training set and a test set;
Step 5: preprocess the data set, and train and test the improved Mask-RCNN semantic segmentation model with the training set and the test set;
Step 6: feed the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
Further, the improved Mask-RCNN semantic segmentation model modifies the candidate region selection of Mask-RCNN as follows:
first, candidate regions are extracted from the picture obtained in Step 1 using a Hough line (arc) detection algorithm, regions containing no line (arc) are discarded directly, and the original 2000 candidate regions are reduced to 100;
then, the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained earlier is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are then extracted through the RoI layer and fed into the fully connected layer.
Further, the modified new network structure uses ResNet50 as the backbone network, and ResNet uses cross-layer connections.
Further, the middle n × n convolution blocks of the new network structure are changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel.
Further, the modified loss function in Step 3 is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
further, the specific steps of fusing the 3D point cloud image of the laser radar and the RGB image of the high-precision vision camera are as follows: firstly, defining a uniform coordinate system, establishing registration relation between feature points of a 3D point cloud picture and RGB images, and enabling a point p on a space coordinate system on a radar point cloud picture to be in contact with the feature points i The (x, y, z) is mapped into a plane coordinate system in a two-dimensional space, and is input into a subsequent semantic segmentation model as a network input.
Further, when the improved Mask-RCNN semantic segmentation model is trained and tested with the training set and the test set, the data set is processed as follows:
1) picture scaling: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) data enhancement: the pictures in the data set are mean-subtracted, and horizontal flipping is used in training.
Beneficial effects:
1. The invention acquires data through multi-source information fusion; using information from multiple dimensions enables accurate identification and extraction of the distribution line, effectively improves the accuracy and completeness of line extraction, and thus guarantees the reliability of the safety early-warning system.
2. The invention builds on a classical improvement of the ResNet network, namely decomposing the large-kernel convolutions: each large-kernel convolution is replaced by multiple layers of small convolutions, which deepens the network.
3. The network structure proposed by the invention changes the middle n × n convolution blocks into n pairs of 1 × n and n × 1 convolution blocks with each pair connected in parallel, which speeds up network computation and reduces the probability of overfitting.
4. The invention appends an L2 loss term at the end of the loss function to strengthen the constraint on the distribution line shape.
Drawings
FIG. 1 is a schematic diagram of the fusion of a 3D point cloud image and RGB image data of a laser radar;
FIG. 2 is a structural diagram of the backbone network ResNet50;
FIG. 3 is a diagram of a ResNet improvement architecture;
FIG. 4 is a diagram illustrating a distribution line segmentation result according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples only illustrate the technical solutions of the invention more clearly and do not limit its scope of protection.
The invention discloses a distribution line accurate semantic segmentation method based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system. After acquiring the lidar and high-precision vision camera information, the system realizes semantic segmentation through the following steps:
setp 1: and acquiring a 3D point cloud picture of the laser radar and an RGB image of the high-precision vision camera, and registering and fusing the two images.
Setp 2: the Mask-RCNN network is improved, a downsampling structure of ResNet is modified, large-kernel convolution is disassembled from the ResNet network, the large-kernel convolution is replaced by multiple layers of small convolution, a new network structure is provided, and an improved Mask-RCNN semantic segmentation model is constructed to increase the detection speed of distribution lines.
In feature extraction of the original Mask-RCNN, firstly, coordinate information of 2000 candidate regions (region disposals) is obtained from an input picture by using a selective search algorithm (selective search). In the invention, because the distribution line has very obvious geometric characteristics of straight lines or arcs, a Hough line (arc) detection algorithm is used for extracting candidate regions for a picture, the regions without straight lines (arcs) are directly abandoned, and the original 2000 candidate regions are reduced into 100 candidate regions. By the operation, the training and detection speed of the network can be greatly increased.
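As an illustration, the following is a minimal OpenCV sketch of such Hough-based proposal filtering. The function name, the Canny/Hough thresholds, and the line-length scoring heuristic are illustrative assumptions rather than values from the patent, and arc detection is omitted for brevity.

```python
import cv2
import numpy as np

def filter_candidate_regions(image, proposals, keep=100):
    """Keep only region proposals that contain straight-line evidence.

    image: BGR picture; proposals: list of (x, y, w, h) boxes,
    e.g. from selective search. Returns at most `keep` boxes.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)            # edge map for line detection
    scored = []
    for (x, y, w, h) in proposals:
        roi = edges[y:y + h, x:x + w]
        # Probabilistic Hough transform: line segments inside the ROI.
        lines = cv2.HoughLinesP(roi, rho=1, theta=np.pi / 180, threshold=30,
                                minLineLength=max(10, w // 4), maxLineGap=5)
        if lines is None:
            continue                            # no line evidence: discard
        # Score a region by the total length of its detected segments.
        length = sum(np.hypot(x2 - x1, y2 - y1)
                     for x1, y1, x2, y2 in lines[:, 0])
        scored.append((length, (x, y, w, h)))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [box for _, box in scored[:keep]]    # e.g. 2000 -> 100 regions
```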
Then the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained at the beginning is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are extracted after the RoI layer and fed into the fully connected layer.
Step 3: improve the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape.
Step 4: collect relevant distribution line pictures at the live-line work site to build a data set, and divide the data set into a training set and a test set.
Step 5: preprocess the data set, and train and test the improved Mask-RCNN semantic segmentation model with the training set and the test set.
Step 6: feed the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
Multi-source information fusion input:
Because the erection environment of distribution lines is relatively complex and the facilities are densely arranged, line information collected by a single sensor is easily affected by the surrounding complex environment, the acquired data become inaccurate, and the reliability of the safety warning drops. Acquiring data through multi-source information fusion enables accurate identification and extraction of the distribution line using information from multiple dimensions, effectively improves the accuracy and completeness of line extraction, and guarantees the reliability of the safety early-warning system.
Because the live-line work environment is complex, the invention adopts a lidar fused with a high-precision vision camera as the multi-source information input. The 3D point cloud of the lidar accurately captures the position of a target, while the RGB vision camera captures the surrounding visual information well; fusing the two acquires the surrounding environment of the live-line work more accurately, improves the anti-interference ability of the sensors, and ensures complete and accurate identification and extraction of the distribution line.
Since the radar point cloud is 3D data, to satisfy the input requirement of the Mask-RCNN semantic segmentation model, the fusion of the radar 3D point cloud and the RGB image must yield 4-channel RGB-D data. The fusion algorithm of the radar 3D point cloud and the RGB image proceeds as follows. First, a unified coordinate system is defined and the registration relation between the points of the 3D point cloud and the RGB image is established. A point p_i = (x, y, z) in the spatial coordinate system of the radar point cloud is mapped into the planar coordinate system of the two-dimensional space by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left[1 - \arctan(y, x)\,\pi^{-1}\right] w \\[4pt] \left[1 - \left(\arcsin\left(z\, r^{-1}\right) + f_d\right) f^{-1}\right] h \end{pmatrix}$$

where (u, v) are the mapped image coordinates, and h and w are the height and width of the desired range-image representation. f = f_u + f_d is the vertical field of view of the lidar, f_u being the elevation angle above the horizontal and f_d the depression angle below it. r = ||p_i||_2 is the range of the point in the spherical coordinate system. In this way the points of the 3D point cloud are mapped onto coordinates of the RGB image. The data fusion is thereby realized, and the fused result is input into the subsequent semantic segmentation model as network input.
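A brief NumPy sketch of this projection, assuming the function name and the clamp-and-round pixel step (the formula itself follows the definitions above):

```python
import numpy as np

def project_points(points, h, w, f_u, f_d):
    """Spherical projection of lidar points (x, y, z) to (u, v) pixels.

    points: (N, 3) array; h, w: range-image height and width;
    f_u, f_d: up/down vertical field-of-view angles in radians.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f = f_u + f_d                        # total vertical field of view
    r = np.linalg.norm(points, axis=1)   # range ||p_i||_2 of each point
    u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * w
    v = (1.0 - (np.arcsin(z / r) + f_d) / f) * h
    # Clamp to valid pixel indices and round down.
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)
    return u, v
```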
Improved Mask-RCNN semantic segmentation model
First, improving the network structure
Mask-RCNN is a very flexible framework that can complete a variety of image processing tasks such as target detection and semantic segmentation. To ensure the accuracy of the network's distribution line segmentation, the invention improves the Mask-RCNN network, modifying the downsampling structure in ResNet according to the characteristics of distribution lines.
The invention uses ResNet50 as the backbone network. ResNet uses cross-layer connections, which make training easier. The network structure of ResNet50 is shown in FIG. 2.
Following a classical improvement of the ResNet network, the large-kernel convolutions are decomposed, i.e. each large-kernel convolution is replaced by multiple layers of small convolutions (structure shown in FIG. 3), which deepens the network. This idea comes from the Inception v2 network.
On top of this improvement, the invention proposes a new network structure: the middle n × n convolution block is changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel. This speeds up network computation and reduces the probability of overfitting. Referring to FIG. 3, the embodiment of the invention takes a 5 × 5 convolution block as an example, changing it into 5 pairs of 1 × 5 and 5 × 1 convolution blocks and connecting each pair in parallel.
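A minimal PyTorch sketch of one such pair follows; since FIG. 3 is not reproduced here, the merge-by-summation of the two branches, the batch normalization, and the unchanged channel count are assumptions about the exact wiring.

```python
import torch.nn as nn

class FactorizedPairBlock(nn.Module):
    """One pair of 1 x n and n x 1 convolutions connected in parallel,
    replacing part of an n x n convolution block (here n = 5)."""

    def __init__(self, channels, n=5):
        super().__init__()
        pad = n // 2  # "same" padding so the spatial size is preserved
        self.conv_1xn = nn.Conv2d(channels, channels, kernel_size=(1, n),
                                  padding=(0, pad))
        self.conv_nx1 = nn.Conv2d(channels, channels, kernel_size=(n, 1),
                                  padding=(pad, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Both branches see the same input (parallel connection);
        # their responses are merged by summation.
        out = self.conv_1xn(x) + self.conv_nx1(x)
        return self.relu(self.bn(out))
```

Compared with a dense n × n kernel (n² weights per channel pair), a parallel 1 × n / n × 1 pair uses 2n weights, which is the source of the claimed speed-up.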
Second, improving the loss function of the model
Because the shape of the distribution line is fixed, the invention optimizes the loss function of the Mask-RCNN semantic segmentation model by appending an L2 loss term at its end to strengthen the shape constraint. The improved loss function is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
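The combined loss of Eq. (1) can be sketched in PyTorch as follows; the tensor layouts, the use of the framework's built-in losses for the standard terms, and the example weights α = β = 1 are assumptions (the patent does not state their values).

```python
import torch
import torch.nn.functional as F

def combined_loss(cls_logits, cls_targets, box_pred, box_targets, pos_mask,
                  mask_pred, mask_targets, reg_pred, reg_targets,
                  alpha=1.0, beta=1.0):
    """L = L_cls + L_box + alpha * L_mask + beta * L_re, per Eq. (1)."""
    # (4): binary classification loss over anchors.
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    # (5): smooth-L1 box regression, counted on positive anchors only.
    per_anchor = F.smooth_l1_loss(box_pred, box_targets,
                                  reduction="none").sum(dim=-1)
    l_box = (per_anchor * pos_mask).sum() / pos_mask.sum().clamp(min=1)
    # (2): per-pixel binary cross-entropy mask loss.
    l_mask = F.binary_cross_entropy_with_logits(mask_pred, mask_targets)
    # (3): L2 registration loss on the fused 3D point cloud data.
    l_re = F.mse_loss(reg_pred, reg_targets)
    return l_cls + l_box + alpha * l_mask + beta * l_re
```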
experiments and analyses
The experimental environment adopted by the invention is listed in Table 1, and the parameters used in model training in Table 2.

TABLE 1 Experimental environment (reproduced as an image in the original publication)

TABLE 2 Training parameters (reproduced as an image in the original publication)
The data set of Step 4 was prepared as follows. The invention uses the lidar fused with the high-precision vision camera to collect relevant distribution line pictures at the live-line work site, yielding a data set of 1800 pictures. The data set is first preprocessed and the image size set to 1920 × 1080. The data are then annotated manually with a labeling tool, generating label pictures and a yaml file storing the label names. 1700 pictures are selected for training and 100 for testing.
In addition, the following operations are performed on the data set during model training.
Picture scaling: during training and testing of the model, to increase the training speed, the pictures in the data set are scaled to 960 × 540.
Data enhancement: to make the input pictures meet the requirements of the network architecture, data enhancement such as mean removal and horizontal flipping is also applied during training.
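A minimal torchvision sketch of this preprocessing; the ImageNet mean/std used for the mean removal are an assumption, as the patent does not state which statistics are subtracted:

```python
import torchvision.transforms as T

# Resize to 960 x 540, randomly flip horizontally, and remove the mean.
train_transform = T.Compose([
    T.Resize((540, 960)),              # (height, width)
    T.RandomHorizontalFlip(p=0.5),     # horizontal-flip augmentation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # assumed ImageNet stats
                std=[0.229, 0.224, 0.225]),
])
```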
Semantic segmentation of a 10 kV distribution line was performed with the improved Mask-RCNN model. The visual segmentation results are shown in FIG. 4, where the first column is the original picture, the second column the label picture, and the third column the segmentation result.
As FIG. 4 shows, the method provided by the invention achieves accurate segmentation of the distribution line against the complex background of live-line work.
Meanwhile, several classical semantic segmentation models were selected for comparison on the data set created by the invention. The SegNet semantic segmentation model refers to: Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015: 1. The U-Net semantic segmentation model refers to: Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation [J]. 2015. The DeepLabv3+ semantic segmentation model refers to: Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision (ECCV), 2018: 801-818. The Mask-RCNN semantic segmentation model refers to: He K, Gkioxari G, Dollár P, et al. Mask R-CNN [C]// 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017: 2980-2988. Model performance is evaluated with the mean intersection over union (MIoU). The comparison results are shown in Table 3; the method provided by the invention performs better than the other methods.
TABLE 3 Comparison with other models (reproduced as an image in the original publication)
The above embodiments merely illustrate the technical concept and features of the invention; their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly, not to limit the scope of protection of the invention. All equivalent changes and modifications made according to the spirit of the invention shall be covered by the scope of protection of the invention.

Claims (7)

1. A distribution line accurate semantic segmentation method based on multi-source information fusion, characterized in that a lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system, and the distribution line accurate semantic segmentation system, after acquiring the lidar and high-precision vision camera information, realizes semantic segmentation through the following steps:
Step 1: acquiring the 3D point cloud of the lidar and the RGB image of the high-precision vision camera, and registering and fusing the two;
Step 2: improving the Mask-RCNN network: modifying the downsampling structure of ResNet, decomposing the large-kernel convolutions of the ResNet network and replacing them with multiple layers of small convolutions, proposing a new network structure, and constructing an improved Mask-RCNN semantic segmentation model;
Step 3: improving the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape;
Step 4: collecting relevant distribution line pictures at the live-line work site to build a data set, and dividing the data set into a training set and a test set;
Step 5: preprocessing the data set, and training and testing the improved Mask-RCNN semantic segmentation model with the training set and the test set;
Step 6: feeding the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
2. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the improved Mask-RCNN semantic segmentation model modifies the candidate region selection of Mask-RCNN as follows:
first, candidate regions are extracted from the picture obtained in Step 1 using a Hough line detection algorithm, regions in which no straight line is detected are discarded directly, and the original 2000 candidate regions are reduced to 100;
then, the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained earlier is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are then extracted through the RoI layer and fed into the fully connected layer.
3. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the modified new network structure uses ResNet50 as the backbone network, and ResNet uses cross-layer connections.
4. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 3, characterized in that the middle n × n convolution blocks of the new network structure are changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel.
5. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the modified loss function in Step 3 is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
6. the distribution line accurate semantic segmentation method based on multi-source information fusion of claim 1, wherein the specific steps of fusing the 3D point cloud image of the laser radar and the RGB image of the high-precision vision camera are as follows: firstly, defining a uniform coordinate system, establishing a registration relation between a 3D point cloud picture and RGB image characteristic points, and aligning a point p on a space coordinate system on a radar point cloud picture i The (x, y, z) is mapped into a plane coordinate system in a two-dimensional space, and is input into a subsequent semantic segmentation model as a network input.
7. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that when the improved Mask-RCNN semantic segmentation model is trained and tested with the training set and the test set, the data set is processed as follows:
1) picture scaling: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) data enhancement: the pictures in the data set are mean-subtracted, and horizontal flipping is used in training.
CN202110355431.XA 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion Active CN113205526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110355431.XA CN113205526B (en) 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion

Publications (2)

Publication Number Publication Date
CN113205526A CN113205526A (en) 2021-08-03
CN113205526B (en) 2022-07-26

Family

ID=77026115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110355431.XA Active CN113205526B (en) 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion

Country Status (1)

Country Link
CN (1) CN113205526B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210443A (en) * 2020-01-03 2020-05-29 吉林大学 Deformable convolution mixing task cascading semantic segmentation method based on embedding balance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant