CN113205526B - Distribution line accurate semantic segmentation method based on multi-source information fusion

Info

Publication number
CN113205526B
Authority
CN
China
Prior art keywords
semantic segmentation
mask
rcnn
distribution line
network
Prior art date
Legal status
Active
Application number
CN202110355431.XA
Other languages
Chinese (zh)
Other versions
CN113205526A (en)
Inventor
张冬
高明
刘灵光
卢健
盛晓翔
顾礼峰
Current Assignee
Huaianhongneng Group Co ltd
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Yijiahe Technology Co Ltd
Original Assignee
Huaianhongneng Group Co ltd
HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd
Yijiahe Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Huaianhongneng Group Co ltd, HuaiAn Power Supply Co of State Grid Jiangsu Electric Power Co Ltd, and Yijiahe Technology Co Ltd
Priority to CN202110355431.XA
Publication of CN113205526A
Application granted
Publication of CN113205526B
Status: Active

Classifications

    • G06T7/30 Determination of transform parameters for the alignment of images, i.e. image registration
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Combinations of networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/08 Learning methods
    • G06T7/11 Region-based segmentation
    • G06V10/28 Quantising the image, e.g. histogram thresholding for discrimination between background and foreground patterns
    • G06T2207/10004 Still image; Photographic image
    • G06T2207/10024 Color image
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G06T2207/20076 Probabilistic image processing
    • G06T2207/20081 Training; Learning
    • G06T2207/20084 Artificial neural networks [ANN]

Abstract

The invention relates to the technical field of computer vision and image processing, and discloses a distribution line accurate semantic segmentation method based on multi-source information fusion. The method acquires the 3D point cloud of a lidar and the RGB image of a high-precision vision camera and fuses the two; improves the Mask-RCNN network and constructs an improved Mask-RCNN semantic segmentation model; improves the loss function; collects distribution line pictures on site to build a data set, divided into a training set and a test set; preprocesses the data set and trains and tests the improved Mask-RCNN semantic segmentation model with the training and test sets; and feeds the fused data into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation. Compared with the prior art, the method achieves accurate, high-speed semantic segmentation of distribution lines based on the improved Mask-RCNN semantic segmentation model.

Description

Distribution line accurate semantic segmentation method based on multi-source information fusion
Technical Field
The invention relates to the technical field of computer vision and image processing, in particular to a distribution line accurate semantic segmentation method based on multi-source information fusion.
Background
With the vigorous development of China's economy, electricity has become indispensable to social production and people's daily life, which places higher demands on power supply departments: they must guarantee not only a sufficient supply of power but also high supply reliability. Live-line work on distribution lines was developed to carry out maintenance, inspection, testing, and related operations on power supply equipment and lines while power delivery continues.
However, operators face high risks when performing live-line work on distribution lines, so an accurate safety early-warning system is essential. Accurate semantic segmentation of the distribution line is one of the core technologies of live-line work safety early warning, and the segmentation accuracy directly determines the reliability of the warning. Distribution lines are erected in complex environments with densely arranged facilities, so line information collected by a single sensor is easily disturbed by the surrounding environment, the acquired data become inaccurate, and the reliability of the safety warning drops. Moreover, most existing distribution line semantic segmentation methods suffer from low accuracy and low warning reliability.
Image semantic segmentation partitions an image at the pixel level into regions expressing different semantic categories, and is one of the core technologies of image processing. With the arrival of the artificial intelligence era, image semantic segmentation has gradually become a research hotspot in cutting-edge fields such as autonomous driving and indoor navigation.
In the field of image semantic segmentation, machine learning methods represented by deep learning keep achieving better results and are gradually replacing traditional segmentation methods. Compared with traditional methods, deep learning based segmentation builds a deep network that autonomously learns and extracts image features, enabling end-to-end classification learning and effectively improving both the speed and the accuracy of semantic segmentation.
In 2015 the Fully Convolutional Network (FCN) was proposed, applying deep learning to semantic segmentation for the first time: it converts the fully connected layers used for image classification in a convolutional neural network into convolutional layers and introduces deconvolution layers and skip connections, ensuring the stability and robustness of the network. With the advent of the FCN, deep learning formally entered the field of image semantic segmentation.
U-Net, the most commonly used model in medical image segmentation, is known for its typical U-shaped symmetric structure, whose two sides perform downsampling and upsampling respectively. Downsampling captures the contextual information of the image, while upsampling enables accurate localization of segmentation boundaries, so the model segments well even when trained on little data. In the same year the SegNet semantic segmentation model appeared; it adopts an encoder-decoder structure for semantic segmentation and upsamples using the max-pooling indices, saving memory in the network model.
The DeepLab series of semantic segmentation models from the Google team has also kept advancing the field. DeepLabv1 combines a deep convolutional neural network (DCNN) with a fully connected conditional random field (CRF), effectively alleviating the inaccurate localization of deep convolutional networks. DeepLabv2 innovates on DeepLabv1 by integrating an atrous spatial pyramid pooling (ASPP) module into the model structure, which effectively improves the network's segmentation ability. The improved DeepLabv3 appeared in the same year; its core idea is to refine the ASPP structure and introduce batch normalization layers, improving segmentation accuracy. The latest DeepLabv3+ adds an encoder-decoder structure and an Xception backbone on top of DeepLabv3, improving both the speed and the accuracy of network semantic segmentation.
In addition, the PSPNet semantic segmentation model proposed by Zhao et al. introduces a pyramid pooling module, improving the segmentation network's ability to capture the global context of the image. The Mask-RCNN semantic segmentation model proposed by He et al. mainly extends Faster-RCNN: it adds a network branch for the segmentation task, replaces the RoIPooling of Faster-RCNN with RoIAlign, and combines a residual network with a feature pyramid network (FPN) for image feature extraction, so the network achieves high-quality segmentation while detecting targets.
Numerous experiments show that deep learning based image semantic segmentation algorithms perform well on image semantic segmentation tasks. However, the live-line work environment is complex and demands high segmentation accuracy, which conventional semantic segmentation models cannot meet.
Disclosure of Invention
The purpose of the invention is as follows: aiming at the problems in the prior art, the invention provides a distribution line accurate semantic segmentation method based on multi-source information fusion.
The technical scheme is as follows: the invention provides a distribution line accurate semantic segmentation method based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system. After acquiring the lidar and high-precision vision camera information, the system realizes semantic segmentation through the following steps:
Step 1: acquire the 3D point cloud of the lidar and the RGB image of the high-precision vision camera, and register and fuse the two;
Step 2: improve the Mask-RCNN network: modify the downsampling structure of ResNet, decompose the large-kernel convolutions of the ResNet network and replace them with multiple layers of small convolutions, propose a new network structure, and construct an improved Mask-RCNN semantic segmentation model;
Step 3: improve the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape;
Step 4: collect relevant distribution line pictures at the live-line work site to build a data set, and divide the data set into a training set and a test set;
Step 5: preprocess the data set, and train and test the improved Mask-RCNN semantic segmentation model with the training set and the test set;
Step 6: feed the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
Further, the improved Mask-RCNN semantic segmentation model modifies the candidate region selection of Mask-RCNN as follows:
first, candidate regions are extracted from the picture obtained in Step 1 using a Hough line (arc) detection algorithm, regions containing no line (arc) are discarded directly, and the original 2000 candidate regions are reduced to 100;
then, the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained earlier is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are then extracted through the RoI layer and fed into the fully connected layer.
Further, the modified new network structure uses ResNet50 as the backbone network, and ResNet uses cross-layer connections.
Further, the middle n × n convolution blocks of the new network structure are changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel.
Further, the modified loss function in Step 3 is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
further, the specific steps of fusing the 3D point cloud image of the laser radar and the RGB image of the high-precision vision camera are as follows: firstly, defining a uniform coordinate system, establishing registration relation between feature points of a 3D point cloud picture and RGB images, and enabling a point p on a space coordinate system on a radar point cloud picture to be in contact with the feature points i The (x, y, z) is mapped into a plane coordinate system in a two-dimensional space, and is input into a subsequent semantic segmentation model as a network input.
Further, when the improved Mask-RCNN semantic segmentation model is trained and tested with the training set and the test set, the data set is processed as follows:
1) picture scaling: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) data enhancement: the pictures in the data set are mean-subtracted, and horizontal flipping is used in training.
Beneficial effects:
1. The invention acquires data through multi-source information fusion; using information from multiple dimensions enables accurate identification and extraction of the distribution line, effectively improves the accuracy and completeness of line extraction, and thus guarantees the reliability of the safety early-warning system.
2. The invention builds on a classical improvement of the ResNet network, namely decomposing the large-kernel convolutions: each large-kernel convolution is replaced by multiple layers of small convolutions, which deepens the network.
3. The network structure proposed by the invention changes the middle n × n convolution blocks into n pairs of 1 × n and n × 1 convolution blocks with each pair connected in parallel, which speeds up network computation and reduces the probability of overfitting.
4. The invention appends an L2 loss term at the end of the loss function to strengthen the constraint on the distribution line shape.
Drawings
FIG. 1 is a schematic diagram of the fusion of a 3D point cloud image and RGB image data of a laser radar;
FIG. 2 is a structural diagram of the backbone network ResNet50;
FIG. 3 is a diagram of a ResNet improvement architecture;
FIG. 4 is a diagram illustrating a distribution line segmentation result according to an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings. The following examples only illustrate the technical solutions of the invention more clearly and do not limit its scope of protection.
The invention discloses a distribution line accurate semantic segmentation method based on multi-source information fusion. A lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system. After acquiring the lidar and high-precision vision camera information, the system realizes semantic segmentation through the following steps:
setp 1: and acquiring a 3D point cloud picture of the laser radar and an RGB image of the high-precision vision camera, and registering and fusing the two images.
Setp 2: the Mask-RCNN network is improved, a downsampling structure of ResNet is modified, large-kernel convolution is disassembled from the ResNet network, the large-kernel convolution is replaced by multiple layers of small convolution, a new network structure is provided, and an improved Mask-RCNN semantic segmentation model is constructed to increase the detection speed of distribution lines.
In feature extraction of the original Mask-RCNN, firstly, coordinate information of 2000 candidate regions (region disposals) is obtained from an input picture by using a selective search algorithm (selective search). In the invention, because the distribution line has very obvious geometric characteristics of straight lines or arcs, a Hough line (arc) detection algorithm is used for extracting candidate regions for a picture, the regions without straight lines (arcs) are directly abandoned, and the original 2000 candidate regions are reduced into 100 candidate regions. By the operation, the training and detection speed of the network can be greatly increased.
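As an illustration, the following is a minimal OpenCV sketch of such Hough-based proposal filtering. The function name, the Canny/Hough thresholds, and the line-length scoring heuristic are illustrative assumptions rather than values from the patent, and arc detection is omitted for brevity.

```python
import cv2
import numpy as np

def filter_candidate_regions(image, proposals, keep=100):
    """Keep only region proposals that contain straight-line evidence.

    image: BGR picture; proposals: list of (x, y, w, h) boxes,
    e.g. from selective search. Returns at most `keep` boxes.
    """
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)            # edge map for line detection
    scored = []
    for (x, y, w, h) in proposals:
        roi = edges[y:y + h, x:x + w]
        # Probabilistic Hough transform: line segments inside the ROI.
        lines = cv2.HoughLinesP(roi, rho=1, theta=np.pi / 180, threshold=30,
                                minLineLength=max(10, w // 4), maxLineGap=5)
        if lines is None:
            continue                            # no line evidence: discard
        # Score a region by the total length of its detected segments.
        length = sum(np.hypot(x2 - x1, y2 - y1)
                     for x1, y1, x2, y2 in lines[:, 0])
        scored.append((length, (x, y, w, h)))
    scored.sort(key=lambda s: s[0], reverse=True)
    return [box for _, box in scored[:keep]]    # e.g. 2000 -> 100 regions
```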
Then the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained at the beginning is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are extracted after the RoI layer and fed into the fully connected layer.
Step 3: improve the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape.
Step 4: collect relevant distribution line pictures at the live-line work site to build a data set, and divide the data set into a training set and a test set.
Step 5: preprocess the data set, and train and test the improved Mask-RCNN semantic segmentation model with the training set and the test set.
Step 6: feed the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
Multi-source information fusion input:
Because the erection environment of distribution lines is relatively complex and the facilities are densely arranged, line information collected by a single sensor is easily affected by the surrounding complex environment, the acquired data become inaccurate, and the reliability of the safety warning drops. Acquiring data through multi-source information fusion enables accurate identification and extraction of the distribution line using information from multiple dimensions, effectively improves the accuracy and completeness of line extraction, and guarantees the reliability of the safety early-warning system.
Because the live-line work environment is complex, the invention adopts a lidar fused with a high-precision vision camera as the multi-source information input. The 3D point cloud of the lidar accurately captures the position of a target, while the RGB vision camera captures the surrounding visual information well; fusing the two acquires the surrounding environment of the live-line work more accurately, improves the anti-interference ability of the sensors, and ensures complete and accurate identification and extraction of the distribution line.
Since the radar point cloud is 3D data, to satisfy the input requirement of the Mask-RCNN semantic segmentation model, the fusion of the radar 3D point cloud and the RGB image must yield 4-channel RGB-D data. The fusion algorithm of the radar 3D point cloud and the RGB image proceeds as follows. First, a unified coordinate system is defined and the registration relation between the points of the 3D point cloud and the RGB image is established. A point p_i = (x, y, z) in the spatial coordinate system of the radar point cloud is mapped into the planar coordinate system of the two-dimensional space by

$$\begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} \frac{1}{2}\left[1 - \arctan(y, x)\,\pi^{-1}\right] w \\[4pt] \left[1 - \left(\arcsin\left(z\, r^{-1}\right) + f_d\right) f^{-1}\right] h \end{pmatrix}$$

where (u, v) are the mapped image coordinates, and h and w are the height and width of the desired range-image representation. f = f_u + f_d is the vertical field of view of the lidar, f_u being the elevation angle above the horizontal and f_d the depression angle below it. r = ||p_i||_2 is the range of the point in the spherical coordinate system. In this way the points of the 3D point cloud are mapped onto coordinates of the RGB image. The data fusion is thereby realized, and the fused result is input into the subsequent semantic segmentation model as network input.
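A brief NumPy sketch of this projection, assuming the function name and the clamp-and-round pixel step (the formula itself follows the definitions above):

```python
import numpy as np

def project_points(points, h, w, f_u, f_d):
    """Spherical projection of lidar points (x, y, z) to (u, v) pixels.

    points: (N, 3) array; h, w: range-image height and width;
    f_u, f_d: up/down vertical field-of-view angles in radians.
    """
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    f = f_u + f_d                        # total vertical field of view
    r = np.linalg.norm(points, axis=1)   # range ||p_i||_2 of each point
    u = 0.5 * (1.0 - np.arctan2(y, x) / np.pi) * w
    v = (1.0 - (np.arcsin(z / r) + f_d) / f) * h
    # Clamp to valid pixel indices and round down.
    u = np.clip(np.floor(u), 0, w - 1).astype(np.int32)
    v = np.clip(np.floor(v), 0, h - 1).astype(np.int32)
    return u, v
```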
Improved Mask-RCNN semantic segmentation model
First, improving the network structure
Mask-RCNN is a very flexible framework that can complete a variety of image processing tasks such as target detection and semantic segmentation. To ensure the accuracy of the network's distribution line segmentation, the invention improves the Mask-RCNN network, modifying the downsampling structure in ResNet according to the characteristics of distribution lines.
The invention uses ResNet50 as the backbone network. ResNet uses cross-layer connections, which make training easier. The network structure of ResNet50 is shown in FIG. 2.
Following a classical improvement of the ResNet network, the large-kernel convolutions are decomposed, i.e. each large-kernel convolution is replaced by multiple layers of small convolutions (structure shown in FIG. 3), which deepens the network. This idea comes from the Inception v2 network.
On top of this improvement, the invention proposes a new network structure: the middle n × n convolution block is changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel. This speeds up network computation and reduces the probability of overfitting. Referring to FIG. 3, the embodiment of the invention takes a 5 × 5 convolution block as an example, changing it into 5 pairs of 1 × 5 and 5 × 1 convolution blocks and connecting each pair in parallel.
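A minimal PyTorch sketch of one such pair follows; since FIG. 3 is not reproduced here, the merge-by-summation of the two branches, the batch normalization, and the unchanged channel count are assumptions about the exact wiring.

```python
import torch.nn as nn

class FactorizedPairBlock(nn.Module):
    """One pair of 1 x n and n x 1 convolutions connected in parallel,
    replacing part of an n x n convolution block (here n = 5)."""

    def __init__(self, channels, n=5):
        super().__init__()
        pad = n // 2  # "same" padding so the spatial size is preserved
        self.conv_1xn = nn.Conv2d(channels, channels, kernel_size=(1, n),
                                  padding=(0, pad))
        self.conv_nx1 = nn.Conv2d(channels, channels, kernel_size=(n, 1),
                                  padding=(pad, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # Both branches see the same input (parallel connection);
        # their responses are merged by summation.
        out = self.conv_1xn(x) + self.conv_nx1(x)
        return self.relu(self.bn(out))
```

Compared with a dense n × n kernel (n² weights per channel pair), a parallel 1 × n / n × 1 pair uses 2n weights, which is the source of the claimed speed-up.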
Second, improving the loss function of the model
Because the shape of the distribution line is fixed, the invention optimizes the loss function of the Mask-RCNN semantic segmentation model by appending an L2 loss term at its end to strengthen the shape constraint. The improved loss function is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
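The combined loss of Eq. (1) can be sketched in PyTorch as follows; the tensor layouts, the use of the framework's built-in losses for the standard terms, and the example weights α = β = 1 are assumptions (the patent does not state their values).

```python
import torch
import torch.nn.functional as F

def combined_loss(cls_logits, cls_targets, box_pred, box_targets, pos_mask,
                  mask_pred, mask_targets, reg_pred, reg_targets,
                  alpha=1.0, beta=1.0):
    """L = L_cls + L_box + alpha * L_mask + beta * L_re, per Eq. (1)."""
    # (4): binary classification loss over anchors.
    l_cls = F.binary_cross_entropy_with_logits(cls_logits, cls_targets)
    # (5): smooth-L1 box regression, counted on positive anchors only.
    per_anchor = F.smooth_l1_loss(box_pred, box_targets,
                                  reduction="none").sum(dim=-1)
    l_box = (per_anchor * pos_mask).sum() / pos_mask.sum().clamp(min=1)
    # (2): per-pixel binary cross-entropy mask loss.
    l_mask = F.binary_cross_entropy_with_logits(mask_pred, mask_targets)
    # (3): L2 registration loss on the fused 3D point cloud data.
    l_re = F.mse_loss(reg_pred, reg_targets)
    return l_cls + l_box + alpha * l_mask + beta * l_re
```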
experiments and analyses
The experimental environment adopted by the invention is listed in Table 1, and the parameters used in model training in Table 2.

TABLE 1 Experimental environment (reproduced as an image in the original publication)

TABLE 2 Training parameters (reproduced as an image in the original publication)
The data set of Step 4 was prepared as follows. The invention uses the lidar fused with the high-precision vision camera to collect relevant distribution line pictures at the live-line work site, yielding a data set of 1800 pictures. The data set is first preprocessed and the image size set to 1920 × 1080. The data are then annotated manually with a labeling tool, generating label pictures and a yaml file storing the label names. 1700 pictures are selected for training and 100 for testing.
In addition, the following operations are performed on the data set during model training.
Picture scaling: during training and testing of the model, to increase the training speed, the pictures in the data set are scaled to 960 × 540.
Data enhancement: to make the input pictures meet the requirements of the network architecture, data enhancement such as mean removal and horizontal flipping is also applied during training.
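A minimal torchvision sketch of this preprocessing; the ImageNet mean/std used for the mean removal are an assumption, as the patent does not state which statistics are subtracted:

```python
import torchvision.transforms as T

# Resize to 960 x 540, randomly flip horizontally, and remove the mean.
train_transform = T.Compose([
    T.Resize((540, 960)),              # (height, width)
    T.RandomHorizontalFlip(p=0.5),     # horizontal-flip augmentation
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # assumed ImageNet stats
                std=[0.229, 0.224, 0.225]),
])
```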
Semantic segmentation of a 10 kV distribution line was performed with the improved Mask-RCNN model. The visual segmentation results are shown in FIG. 4, where the first column is the original picture, the second column the label picture, and the third column the segmentation result.
As FIG. 4 shows, the method provided by the invention achieves accurate segmentation of the distribution line against the complex background of live-line work.
Meanwhile, several classical semantic segmentation models were selected for comparison on the data set created by the invention. The SegNet semantic segmentation model refers to: Badrinarayanan V, Kendall A, Cipolla R. SegNet: A Deep Convolutional Encoder-Decoder Architecture for Image Segmentation [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2015: 1. The U-Net semantic segmentation model refers to: Ronneberger O, Fischer P, Brox T. U-Net: Convolutional Networks for Biomedical Image Segmentation [J]. 2015. The DeepLabv3+ semantic segmentation model refers to: Chen L C, Zhu Y, Papandreou G, et al. Encoder-decoder with atrous separable convolution for semantic image segmentation [C]// Proceedings of the European Conference on Computer Vision (ECCV), 2018: 801-818. The Mask-RCNN semantic segmentation model refers to: He K, Gkioxari G, Dollár P, et al. Mask R-CNN [C]// 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, October 22-29, 2017: 2980-2988. Model performance is evaluated with the mean intersection over union (MIoU). The comparison results are shown in Table 3; the method provided by the invention performs better than the other methods.
TABLE 3 Comparison with other models (reproduced as an image in the original publication)
The above embodiments merely illustrate the technical concept and features of the invention; their purpose is to enable those skilled in the art to understand the content of the invention and implement it accordingly, not to limit the scope of protection of the invention. All equivalent changes and modifications made according to the spirit of the invention shall be covered by the scope of protection of the invention.

Claims (7)

1. A distribution line accurate semantic segmentation method based on multi-source information fusion, characterized in that a lidar and a high-precision vision camera are installed on one side of the distribution line, both electrically connected to a distribution line accurate semantic segmentation system, and the distribution line accurate semantic segmentation system, after acquiring the lidar and high-precision vision camera information, realizes semantic segmentation through the following steps:
Step 1: acquiring the 3D point cloud of the lidar and the RGB image of the high-precision vision camera, and registering and fusing the two;
Step 2: improving the Mask-RCNN network: modifying the downsampling structure of ResNet, decomposing the large-kernel convolutions of the ResNet network and replacing them with multiple layers of small convolutions, proposing a new network structure, and constructing an improved Mask-RCNN semantic segmentation model;
Step 3: improving the Mask-RCNN semantic segmentation model loss function by appending an L2-norm loss term at the end of the original Mask-RCNN loss function to strengthen the constraint on the distribution line shape;
Step 4: collecting relevant distribution line pictures at the live-line work site to build a data set, and dividing the data set into a training set and a test set;
Step 5: preprocessing the data set, and training and testing the improved Mask-RCNN semantic segmentation model with the training set and the test set;
Step 6: feeding the data fused in Step 1 into the improved Mask-RCNN semantic segmentation model as network input for semantic segmentation.
2. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the improved Mask-RCNN semantic segmentation model modifies the candidate region selection of Mask-RCNN as follows:
first, candidate regions are extracted from the picture obtained in Step 1 using a Hough line detection algorithm, regions in which no straight line is detected are discarded directly, and the original 2000 candidate regions are reduced to 100;
then, the picture is normalized directly to the format required by the convolutional network and the whole picture is fed into it; the fifth ordinary pooling layer is replaced by an RoI pooling layer; after 5 layers of convolution the picture yields a feature map; the coordinate information obtained earlier is converted into the corresponding coordinates on the feature map through a mapping relation and the corresponding candidate regions are cropped out; fixed-length feature vectors are then extracted through the RoI layer and fed into the fully connected layer.
3. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the modified new network structure uses ResNet50 as the backbone network, and ResNet uses cross-layer connections.
4. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 3, characterized in that the middle n × n convolution blocks of the new network structure are changed into n pairs of 1 × n and n × 1 convolution blocks, with each pair connected in parallel.
5. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that the modified loss function in Step 3 is defined as:
$$L = L_{cls} + L_{box} + \alpha L_{mask} + \beta L_{re} \qquad (1)$$

where L_cls, L_box, and L_mask are respectively the classification loss, detection-box loss, and mask loss of the Mask-RCNN semantic segmentation model loss function, L_re is the registration loss of the 3D point cloud data, and α and β are the weight coefficients of the mask loss and the registration loss. L_mask, L_re, L_cls, and L_box are defined as:

$$L_{mask} = -\frac{1}{n}\sum_{i=1}^{n}\left[y^{(i)}\log y'^{(i)} + \left(1 - y^{(i)}\right)\log\left(1 - y'^{(i)}\right)\right] \qquad (2)$$

$$L_{re} = \frac{1}{n}\sum_{i=1}^{n}\left(y^{(i)} - y'^{(i)}\right)^{2} \qquad (3)$$

$$L_{cls}\left(p_i, p_i^*\right) = -\log\left[p_i p_i^* + (1 - p_i)(1 - p_i^*)\right] \qquad (4)$$

$$L_{box} = \frac{1}{N_{reg}}\sum_{i} p_i^*\, R\left(t_i - t_i^*\right) \qquad (5)$$

where y^(i) and y'^(i) are the true value and the predicted value respectively; p_i is the predicted classification probability of an anchor, with p_i* = 1 when the anchor is a positive sample and p_i* = 0 when it is negative; t_i is the predicted offset of the anchor and t_i* is the offset of the anchor relative to the ground truth; and R is the smooth-L1 function,

$$R(x) = \begin{cases} 0.5x^{2}, & |x| < 1 \\ |x| - 0.5, & \text{otherwise.} \end{cases}$$
6. the distribution line accurate semantic segmentation method based on multi-source information fusion of claim 1, wherein the specific steps of fusing the 3D point cloud image of the laser radar and the RGB image of the high-precision vision camera are as follows: firstly, defining a uniform coordinate system, establishing a registration relation between a 3D point cloud picture and RGB image characteristic points, and aligning a point p on a space coordinate system on a radar point cloud picture i The (x, y, z) is mapped into a plane coordinate system in a two-dimensional space, and is input into a subsequent semantic segmentation model as a network input.
7. The distribution line accurate semantic segmentation method based on multi-source information fusion according to claim 1, characterized in that when the improved Mask-RCNN semantic segmentation model is trained and tested with the training set and the test set, the data set is processed as follows:
1) picture scaling: during training and testing of the improved Mask-RCNN semantic segmentation model, the pictures in the data set are scaled to 960 × 540;
2) data enhancement: the pictures in the data set are mean-subtracted, and horizontal flipping is used in training.
CN202110355431.XA 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion Active CN113205526B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110355431.XA CN113205526B (en) 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion

Publications (2)

Publication Number Publication Date
CN113205526A CN113205526A (en) 2021-08-03
CN113205526B (en) 2022-07-26

Family

ID=77026115

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110355431.XA Active CN113205526B (en) 2021-04-01 2021-04-01 Distribution line accurate semantic segmentation method based on multi-source information fusion

Country Status (1)

Country Link
CN (1) CN113205526B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210443A (en) * 2020-01-03 2020-05-29 吉林大学 Deformable convolution mixing task cascading semantic segmentation method based on embedding balance


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant