CN111553204A - Transmission tower detection method based on remote sensing image - Google Patents


Info

Publication number
CN111553204A
Authority
CN
China
Prior art keywords
drbox
prior
detection
layer
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010279995.5A
Other languages
Chinese (zh)
Other versions
CN111553204B (en)
Inventor
田桂申
宋猛
白雪娇
刘丽娟
邹睿翀
莫明飞
杨知
费香泽
李闯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
East Inner Mongolia Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
East Inner Mongolia Electric Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, East Inner Mongolia Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010279995.5A priority Critical patent/CN111553204B/en
Publication of CN111553204A publication Critical patent/CN111553204A/en
Application granted granted Critical
Publication of CN111553204B publication Critical patent/CN111553204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06V 20/176: Urban or other man-made structures (G Physics > G06 Computing; calculating or counting > G06V Image or video recognition or understanding > G06V 20/00 Scenes; scene-specific elements > G06V 20/10 Terrestrial scenes)
    • G06F 18/214: Generating training patterns; bootstrap methods, e.g. bagging or boosting (G06F Electric digital data processing > G06F 18/00 Pattern recognition > G06F 18/20 Analysing > G06F 18/21 Design or setup of recognition systems or techniques; extraction of features in feature space; blind source separation)
    • G06F 18/25: Fusion techniques (G06F 18/00 Pattern recognition > G06F 18/20 Analysing)
    • G06N 3/045: Combinations of networks (G06N Computing arrangements based on specific computational models > G06N 3/00 Computing arrangements based on biological models > G06N 3/02 Neural networks > G06N 3/04 Architecture, e.g. interconnection topology)
    • G06N 3/084: Backpropagation, e.g. using gradient descent (G06N 3/02 Neural networks > G06N 3/08 Learning methods)
    • G06V 2201/07: Target detection (G06V 2201/00 Indexing scheme relating to image or video recognition or understanding)


Abstract

The invention provides a transmission tower detection method based on remote sensing images. The method uses multi-source high-resolution satellite remote sensing data to automatically detect and extract transmission tower targets. Addressing the difficulties that computer-vision object detection methods face on satellite remote sensing imagery, such as the small size of the tower body under an overhead viewing angle and the high variability of tower shapes, it realizes automatic detection of transmission towers in high-resolution optical remote sensing images. Statistical analysis of the number of detected transmission towers can then guide the normalized monitoring of the construction progress of large-scale power grid projects.

Description

Transmission tower detection method based on remote sensing image
Technical Field
The invention relates to the field of power grid construction engineering, in particular to a transmission tower detection method based on optical satellite remote sensing images.
Background
Progress monitoring of power grid construction has traditionally relied on technical means such as manual inspection. For large-area networking projects in highly variable environments with complex and dangerous terrain, this traditional approach suffers from a small monitoring range, high risk, strong environmental constraints, and poor reliability; it falls short in coverage, timeliness, reliability, and safety, which challenges the normalized, efficient, and refined monitoring and auditing of such projects.
Compared with traditional observation means, satellite observation has overwhelming advantages, and is particularly suited to power grids with wide-area layouts, long-distance transmission, and changing environments. First, satellite remote sensing offers wide coverage and can rapidly acquire data over large areas; its information acquisition is fast, its update cycle short, and its observations dynamic and timely, in a way that manual field measurement and aerial photogrammetry cannot match. Second, the information obtained by satellite remote sensing is less constrained by local conditions: data can be acquired in time even over deserts, high mountains, and high-altitude regions whose natural conditions are extremely harsh and which people can hardly reach. In addition, satellite remote sensing yields a large amount of information; by employing different bands such as visible light, ultraviolet, infrared, and microwave, and different remote sensing instruments according to the task, it can acquire multi-dimensional mass information all day and in all weather. Multi-source high-resolution satellite remote sensing data can therefore be fully exploited to realize normalized progress monitoring of large-area power grid projects.
Automatic tower detection in high-resolution optical remote sensing images builds on target detection, an important research direction in computer vision. Compared with general computer-vision target detection, however, detection on remote sensing images faces additional difficulties: (1) The target size is small. Owing to the resolution of remote sensing images, the length and width of the target to be detected are generally only dozens of pixels. (2) Prior information about the target size is available. Objects in natural images appear at widely varying scales across images, whereas a remote sensing image has a known resolution, so the size of the target to be detected is known in advance. (3) The target angle is random. Objects in natural images usually appear at a relatively uniform angle, such as upright or horizontal, whereas a remote sensing image is an overhead view in which the attitude angle of the target is arbitrary.
Compared with transmission line body state monitoring based on aerial means (helicopters or unmanned aerial vehicles), research on satellite-remote-sensing-based monitoring of the transmission line body has long been limited, because the spatial resolution of satellite remote sensing could not meet the requirements of body-state monitoring. Transmission towers could therefore only be distinguished as point targets, and identification of the tower type was impossible. In recent years, satellite remote sensing has developed rapidly as an important means of earth observation. The WorldView series of optical satellites reaches a spatial resolution of 0.3 m, acquires 8 spectral bands from visible light to near infrared, and has a revisit period of about 2 days. Meanwhile, Synthetic Aperture Radar (SAR) satellites such as RADARSAT-2 reach a spatial resolution of 1 m and can acquire microwave-band ground target information, meeting the requirement of monitoring large structural states of the transmission line body, and related research is steadily growing. In 2003, Liao et al. clearly distinguished transmission tower targets and line trends in SAR images of the Huaihe river flood area. In 2005, Zhnjinger compared SAR and QuickBird images near the green bridge on Beijing's North Fifth Ring Road and clearly identified the tower beside the overpass in both. The SAR data used in these two studies had a spatial resolution of 1.25 m; transmission towers appear as obvious triangular bright spots on SAR images and can be clearly identified.
Building on this, in 2007 Yang et al. established an automatic transmission tower identification model from high-resolution polarimetric SAR images, accurately extracted transmission lines in farmland, and improved the automation level of tower identification. Research so far, however, remains at the stage of simple body identification, and fine-grained identification of transmission towers lacks related work. Besides the SAR-based studies above, transmission line detection research based on optical satellite remote sensing data is also developing. In 2015, Chen et al. constructed peak features of the transmission conductor in a Cluster Random (CR) frequency-domain space and, for the first time at home and abroad, extracted transmission conductors from QuickBird optical satellite images, a milestone result. However, the CR frequency-domain peak feature in that study actually corresponds to the edges of the conductor and its shadow, and the shadow must be removed by prior conditions to reduce the false alarm rate. This indicates that the CR frequency-domain peak feature alone cannot extract transmission lines robustly and with high precision; the algorithm must be improved to further mine transmission line body features so as to extract the line body accurately.
Disclosure of Invention
Facing the requirement of normalized monitoring of the construction progress of large-scale power grid projects, and addressing the shortcomings of traditional technical means such as manual inspection for progress monitoring and auditing during power grid construction, the invention provides a transmission tower detection method based on remote sensing images, comprising the following steps:
step 1, constructing a transmission tower sample space by using a remote sensing image as a data source to form a transmission tower detection data set;
step 2, labeling the transmission tower detection data set; marking the remote sensing image containing the tower so as to improve the adaptability of tower detection to remote sensing data with different resolutions, different wave bands and different imaging angles;
step 3, splitting the transmission tower detection data set into a body data set and a shadow data set;
step 4, respectively constructing respective multi-angle rotatable candidate frame DRBox deep learning detection models and target detection networks thereof aiming at the body data set and the shadow data set; the multi-angle rotatable candidate frame DRBox is a rotatable candidate frame with angle information, and can search for the target to be detected at different positions of the input image;
step 5, dividing a transmission tower detection task into a tower body detection part and a shadow detection part, and respectively carrying out target detection network training on the basis of respective multi-angle rotatable candidate frame DRBox deep learning detection models;
step 6, cutting the remote sensing image into a plurality of small images with preset sizes;
step 7, for each of the plurality of small images, respectively carrying out body detection and shadow detection of the transmission tower based on the target detection networks of the tower-body and shadow multi-angle rotatable candidate frame DRBox deep learning detection models;
and 8, restoring longitude and latitude information from the body detection result and the shadow detection result, and fusing the longitude and latitude information to the original remote sensing image to obtain a final detection result of the transmission tower.
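The eight steps above can be sketched as a minimal pipeline. This is an illustrative outline only: the helper names, the use of pixel offsets as a stand-in for longitude/latitude restoration, and the simple union-based fusion are all assumptions, not the patent's actual implementation.

```python
def cut_into_tiles(width, height, tile):
    """Step 6 sketch: split a width x height scene into tile x tile crops,
    returning the (x0, y0) pixel offset of each crop."""
    return [(x, y) for y in range(0, height, tile) for x in range(0, width, tile)]

def to_scene_coords(detections, offset):
    """Step 8 sketch: map per-tile detections (cx, cy, w, h, angle) back to
    whole-scene coordinates, a stand-in for restoring longitude/latitude."""
    ox, oy = offset
    return [(cx + ox, cy + oy, w, h, a) for (cx, cy, w, h, a) in detections]

def fuse(body_dets, shadow_dets):
    """Step 8 sketch: naive fusion as the union of body and shadow results;
    the patent fuses both on the original image but does not spell out the rule."""
    return body_dets + shadow_dets
```

A 900 × 600 scene with 300-pixel tiles, for example, yields six crops whose detections are shifted back by their tile offsets before fusion.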
By this method of detecting transmission towers from optical satellite remote sensing images, large-scale extraction of transmission tower targets in complex environments is realized using high-resolution remote sensing imagery. First, to handle the possibly incomplete tower-body shape information caused by changes of the illumination angle in satellite remote sensing, a technical scheme is proposed that combines tower body and shadow information for tower target detection and type identification, so that the detection framework suits both an oblique viewing angle (body structural features obvious) and a nadir viewing angle (body structural features not obvious). Second, within the tower target detection scheme, to address the large length and width of the tower target and the strong influence of target orientation, a multi-angle rotatable candidate frame DRBox deep learning detection model is proposed to overcome the interference that viewing-angle differences introduce into the detection results, effectively accommodating the arbitrary orientation of transmission towers in remote sensing images. Compared against manual labeling results, the method reaches a transmission tower detection recall of 80% and a precision of 88.9%, greatly reducing labor input in engineering operation, improving the monitoring efficiency of wide-area transmission line progress in complex environments, and providing a decision basis for auxiliary auditing during engineering.
Drawings
FIG. 1 is a schematic diagram of a WorldView-1 satellite tower (1: 2000);
FIG. 2 is a schematic diagram of a WorldView-2 satellite tower (1: 2000);
FIG. 3 is a schematic diagram of tower frame labeling;
FIG. 4 is a schematic diagram of a DRBox network structure;
FIG. 5 is a flow chart of tower inspection work;
FIG. 6 is a diagram of the detection results of the Monte-Yicheng transmission line towers;
fig. 7 is a flowchart of a transmission tower detection method in the embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings show illustrations in accordance with exemplary embodiments. These exemplary embodiments, which are also referred to herein as "examples," are described in sufficient detail to enable those skilled in the art to practice the present subject matter. The embodiments may be combined, other embodiments may be utilized, or structural and logical changes may be made without departing from the scope of the claims. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
As shown in fig. 7, the invention provides a transmission tower detection method based on an optical satellite remote sensing image, which includes:
step 1, constructing a transmission tower sample space by using a remote sensing image as a data source to form a transmission tower detection data set, wherein the remote sensing image in the embodiment is a high-resolution multi-source satellite remote sensing image;
Unlike natural images, remote sensing imaging is affected by numerous factors, including inherent satellite parameters such as orbit height and satellite zenith angle, as well as meteorological conditions such as cloud, rain, and fog. Different satellites imaging the same scene therefore produce remote sensing data that differ in brightness, tone, and so on, and even the same satellite passing over at different times yields different imaging quality. To maximize the performance of the tower detection model, mainstream domestic and foreign high-resolution satellite data should be widely used to construct a sample space with large data volume and strong representativeness, as shown in figs. 1 and 2.
And 2, performing labeling processing on the transmission tower detection data set, and labeling the multi-source remote sensing data containing the tower so as to improve the adaptability of tower detection to remote sensing data with different resolutions, different wave bands and different imaging angles.
In order to guarantee the precision of the tower detection model to the maximum extent, the multi-source remote sensing data containing the tower needs to be labeled. For the position of the tower, the minimum circumscribed rectangle of each tower is marked by using a frame-pulling marking method, so that the specific position of the tower on the image is obtained. The labeling of the tower frame is schematically shown in fig. 3.
As can be seen from the figure, the tower appears on the remote sensing image mainly as white pixels; its main structure is visible in the high-resolution image, along with the tower's shadow and the shadow of the high-voltage line on the ground. To improve the performance of the tower detection model, the tower position must be annotated with a minimum-circumscribed-rectangle bounding box, and each annotation box must completely cover the tower without omission. At the same time, the box must not be too large: an oversized box includes more background (such as farmland, water channels, trees, roads, and buildings), increases sample noise, and degrades the recognition performance of the tower detection model. Excessive noise directly and substantially changes the data distribution of the tower samples, hindering the training of a detection model with high recall and few false positives.
Step 3, splitting the transmission tower detection data set into a body data set and a shadow data set;
step 4, respectively constructing respective multi-angle rotatable candidate frame DRBox deep learning detection models and target detection networks thereof aiming at the body data set and the shadow data set; the multi-angle rotatable candidate frame DRBox is a rotatable candidate frame with angle information, and can search for the target to be detected at different positions of the input image;
aiming at the task characteristics of target detection of a high-resolution remote sensing data tower and considering the task of identifying the type of a follow-up tower, the invention provides a multi-angle rotatable candidate frame DRBox for solving the problem of object detection on a remote sensing image.
The rotation detection frame effectively adapts to the arbitrary orientation of targets in remote sensing images. Compared with conventional candidate boxes, it has the following advantages:
1) the size and aspect ratio of the frame may reflect the shape of the target object;
2) the RBox contains fewer background pixels, so that the detector can distinguish foreground points from background points;
3) the RBox can effectively avoid overlapping between adjacent target candidate frames, thereby being more beneficial to detecting dense targets.
The multi-angle rotatable candidate box DRBox is a crucial link in the detection. The convolutional network structure lets the rotatable candidate box search for the target at different positions of the input image, and at each position the candidate box is rotated through a series of angles to produce multi-angle predictions; this is the biggest difference between the DRBox detection method of the invention and other BBox-based detection methods. To reduce the total number of prior boxes, the aspect ratio used in detection is kept consistent with the target type. Through this multi-angle rotatable candidate box strategy, the training network converts the detection task into a series of subtasks, each focusing on a narrow angle range, thereby reducing the influence of target rotation on detection.
Aiming at the problem of random change of the target angle, the invention provides a rotatable box DRBox carrying angle information. Each DRBox contains 7 parameters: the length and width of the target, the abscissa and ordinate of the target center point, the target angle, and the probabilities that the target is judged as foreground and as background.
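As a minimal illustration, the 7-parameter DRBox just described could be represented as below. The field names and defaults are assumptions for illustration, not the patent's notation.

```python
from dataclasses import dataclass

@dataclass
class DRBox:
    """One rotatable candidate box: centre coordinates, length/width,
    angle, and foreground/background probabilities (7 parameters)."""
    cx: float            # abscissa of the target centre point
    cy: float            # ordinate of the target centre point
    w: float             # width (length) of the target
    h: float             # height (width) of the target
    angle: float         # orientation angle, degrees
    p_fg: float = 0.0    # probability the box is judged foreground
    p_bg: float = 1.0    # probability the box is judged background
```

This differs from a conventional BBox only by the `angle` field, which is what lets the box rotate to match the target's attitude.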
The constructed target detection network consists of a data input layer, a convolutional layer, a priori DRBox layer, a position prediction layer, a confidence prediction layer and a loss layer, as shown in FIG. 4.
a) Data Input Layer
The training data is read in by a data input layer, and the target frame information of the invention comprises a class label of the target, the coordinates of the central point of the target, the angle of the target and the length and the width of the target. Compared with a general detection algorithm, the method requires training data to provide a target frame with angle information.
b) Convolutional Layers
Because the VGGNet network is deeper and has better feature extraction capability, the convolutional layers of the invention adopt this network as the pre-training model.
c) Prior DRBox Layer
The prior DRBox layer is connected after the convolutional network and generates a series of prior DRBoxes.
The size of the prior DRBox is preset as an input to the algorithm. When the size of the target to be detected is fixed, the target size is set directly; when it varies within a small range, the prior DRBox size is chosen as the mean of that range; when the size differences are large, multiple groups of prior DRBoxes of different sizes are used, led out from different convolutional layers. For multi-class detection, the prior DRBoxes are divided into groups by target type and prior size, and each group is bound to a selected convolutional layer according to a certain strategy.
The locations of the prior DRBoxes derive from the downsampling relationship between the feature map and the input image; each prior DRBox is generated from one location on the feature map. For example, for an input image of size 300 × 300, the DRBoxes generated on an 8 × 8 feature map cover regions of the input image in steps of 300/8 = 37.5 pixels. When the distance between targets is less than this step, targets will be missed, so an appropriate feature layer must be chosen according to the target size to avoid missed detections. Intuitively, smaller prior DRBoxes should be generated from shallower layers, while larger ones may be generated from deeper layers. Generating prior DRBoxes on different convolutional layers gives the network the ability to detect targets of different sizes.
To give the network the ability to detect the target angle, angle information must be introduced into the prior DRBox, and the angles should cover all target angles that may occur. When the head and tail of the target must be distinguished, the angle of the prior DRBox ranges over 0-360 degrees, discretized with a preset step L1; when the head and tail need not be distinguished, the range is 0-180 degrees, discretized with a preset step L2. In summary, given a specific target type and size, R prior DRBoxes with different angles are generated at each location of a selected convolutional feature map, and prior DRBoxes for targets of different sizes are generated on different convolutional feature maps.
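A hedged sketch of prior DRBox generation, combining the feature-map stride rule (e.g. 300/8 = 37.5 pixels) with the angle discretization just described. The centre placement at cell midpoints and the default angle step are assumptions; the patent only fixes the stride relationship and the 0-180 / 0-360 degree angle ranges.

```python
def prior_drboxes(input_size, feat_size, box_w, box_h,
                  angle_step=30.0, full_circle=False):
    """Generate prior DRBoxes (cx, cy, w, h, angle) for one feature map.
    One set of R = range/angle_step rotated priors is placed at every
    feature-map cell; the cell stride is input_size / feat_size."""
    stride = input_size / feat_size        # e.g. 300 / 8 = 37.5 pixels
    span = 360.0 if full_circle else 180.0  # distinguish head/tail or not
    angles = [k * angle_step for k in range(int(span / angle_step))]
    priors = []
    for i in range(feat_size):
        for j in range(feat_size):
            cx = (j + 0.5) * stride        # assumed: centre of the cell
            cy = (i + 0.5) * stride
            for a in angles:
                priors.append((cx, cy, box_w, box_h, a))
    return priors
```

With an 8 × 8 feature map and a 30-degree step over 0-180 degrees, this yields 8 × 8 × 6 = 384 priors for one target size; smaller boxes would be generated from shallower feature maps and larger ones from deeper maps.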
d) Position Prediction Layer
The position prediction layer generates position correction information for each prior DRBox to obtain the position information of the predicted DRBox; this information is obtained by a 3 × 3 sliding window on the feature map from which the prior DRBox is led out.
e) Confidence Prediction Layer
The confidence prediction layer generates confidence information for each prior DRBox to obtain the per-class confidence of the predicted DRBox. Since the invention binds the prior DRBox to the object type, the confidence information is a 2-dimensional vector representing the probabilities that the DRBox belongs to the object and to the background, obtained by a 3 × 3 sliding window on the feature map from which the DRBox is led out.
f) Loss Layer (MultiDRBox Loss Layer)
The loss layer computes the loss function and generates the error back-propagation quantities of the network. It has four input parts: the label part of the data input layer, and the outputs of the position prediction layer, the confidence prediction layer, and the prior DRBox layer. The loss function is a weighted superposition of the position loss and the confidence loss, measuring the accuracy of position prediction and confidence prediction respectively, and can be written as:

L(x, c, l, g) = (1/N) [ L_conf(x, c) + α · L_loc(x, l, g) ]

where L_conf(x, c) is the confidence loss, x is a label variable indicating the matching between real target bounding boxes and predicted DRBoxes, and c is the confidence prediction vector; L_loc(x, l, g) is the position loss, l is the position prediction vector, i.e. the offset of the predicted DRBox parameters from the prior DRBox parameters, g is the offset of the real target DRBox parameters from the prior DRBox parameters, and α is the position loss weight adjustment factor; when α = 0, the position loss L_loc(x, l, g) is not used. N is the number of matched DRBoxes. When N = 0, the loss layer outputs 0.
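The weighted combination described for the loss layer reduces to a simple expression once the two loss terms are computed; a minimal sketch, assuming the scalar confidence and position losses are given:

```python
def multi_drbox_loss(conf_loss, loc_loss, n_matched, alpha=1.0):
    """Weighted superposition L = (1/N)(L_conf + alpha * L_loc):
    outputs 0 when no prior DRBox matched (N = 0), and alpha = 0
    disables the position term entirely."""
    if n_matched == 0:
        return 0.0
    total = conf_loss + (alpha * loc_loss if alpha != 0 else 0.0)
    return total / n_matched
```

For example, with L_conf = 2.0, L_loc = 4.0, and N = 2 matched priors, the combined loss at alpha = 1 is (2 + 4)/2 = 3.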
On the basis of the single-stage detection network DRBox, the task is divided into tower body detection and shadow detection, and the models of the two parts are trained separately, so that the detection framework suits both an oblique viewing angle (body structural features obvious) and a nadir viewing angle (body structural features not obvious).
The working process of tower detection is shown in fig. 5, and specifically includes the following steps:
step 5, dividing a transmission tower detection task into a tower body detection part and a shadow detection part, and respectively carrying out target detection network training based on respective multi-angle rotatable candidate frame DRBox deep learning detection models, wherein the method specifically comprises the following steps:
a) DRBox encoding;
b) DRBox matching: during training, compare the prior RBoxes with the real RBoxes one by one to determine positive and negative samples;
c) positive and negative sample balancing;
d) loss calculation, i.e. the difference between the predicted DRBox and the real DRBox, comprising position loss and confidence loss;
e) error back-propagation.
For the adopted network model, network training comprises five parts: DRBox encoding and decoding, DRBox matching, positive and negative sample balancing, loss calculation, and error back-propagation.
Wherein, a) the specific process of the DRBox encoding and decoding is as follows:
in the forward propagation process, the encoding process obtains the offset between the ith prior DRBox and the jth real DRBox from their position information, as follows:

$$\hat{g}_{ij}^{cx} = (g_j^{cx} - p_i^{cx}) / p_i^{w}$$
$$\hat{g}_{ij}^{cy} = (g_j^{cy} - p_i^{cy}) / p_i^{h}$$
$$\hat{g}_{ij}^{w} = \log(g_j^{w} / p_i^{w})$$
$$\hat{g}_{ij}^{h} = \log(g_j^{h} / p_i^{h})$$
$$\hat{g}_{ij}^{a} = \tan(g_j^{a} - p_i^{a})$$

in the formula, $p_i^m$, $m \in \{cx, cy, w, h, a\}$, is the position information of the ith prior DRBox, where $p_i^{cx}, p_i^{cy}, p_i^{w}, p_i^{h}, p_i^{a}$ respectively represent its abscissa, ordinate, width, height and angle; $g_j^m$ is the position information of the jth real target DRBox, where $g_j^{cx}, g_j^{cy}, g_j^{w}, g_j^{h}, g_j^{a}$ respectively represent its abscissa, ordinate, width, height and angle; $\hat{g}_{ij}^m$ is the offset information of the ith prior DRBox and the jth real target DRBox, its components respectively representing the abscissa, ordinate, width, height and angle of the offset; $\hat{g}_j = \{\hat{g}_{ij}^m\}$ represents the vector of offset information of all real targets.
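As an illustrative sketch (not code disclosed by the patent), the encoding and its inverse decoding can be written out directly; the tangent/arctangent handling of the angle offset is an assumption borrowed from the DRBox detector literature:

```python
import math

def encode(prior, gt):
    """Offset of a real (ground-truth) DRBox relative to a prior DRBox.

    Both boxes are (cx, cy, w, h, a) tuples with the angle a in degrees.
    """
    pcx, pcy, pw, ph, pa = prior
    gcx, gcy, gw, gh, ga = gt
    return ((gcx - pcx) / pw,                 # abscissa offset, scaled by prior width
            (gcy - pcy) / ph,                 # ordinate offset, scaled by prior height
            math.log(gw / pw),                # log width ratio
            math.log(gh / ph),                # log height ratio
            math.tan(math.radians(ga - pa)))  # angle offset (assumed tangent form)

def decode(prior, offset):
    """Inverse of encode: recover a predicted DRBox from a prior and an offset."""
    pcx, pcy, pw, ph, pa = prior
    ocx, ocy, ow, oh, oa = offset
    return (pw * ocx + pcx,
            ph * ocy + pcy,
            pw * math.exp(ow),
            ph * math.exp(oh),
            pa + math.degrees(math.atan(oa)))
```

Decoding inverts encoding exactly as long as the angle difference between prior and target stays within (-90°, 90°).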
b) DRBox matching
In the training process, the prior DRBox are compared with the real DRBox one by one to determine positive and negative samples. The proximity between two boxes is described by IoU. IoU is short for Intersection-over-Union and is defined as the ratio of the intersection area to the union area of the two boxes. To give the angle a higher weight in the calculation, this ratio is multiplied by the absolute value of the cosine of the difference between the two DRBox angles; the newly defined index is denoted RIoU. If two DRBox have no intersection, RIoU is 0. Since each prior DRBox in the algorithm contains class information, RIoU is also 0 if the prior DRBox and the real DRBox do not belong to the same class.
The algorithm matches the prior DRBox and the real DRBox according to the following strategy:
for each real DRBox, if a prior DRBox with RIoU > 0 exists, match the prior DRBox with the maximum RIoU to that real DRBox and set the marking variable to 1, otherwise to 0;
for each prior DRBox, if a real DRBox with RIoU > 0.5 exists, match the two DRBox and set the marking variable to 1, otherwise to 0;
record the set of prior DRBox serial numbers matched with a real DRBox as Pos.
The marking variable is x = {x_ij}, x_ij ∈ {0, 1}, where x_ij indicates whether the ith prior DRBox is matched with the jth real DRBox: 1 means matched, 0 unmatched.
The matched prior DRBox participate in the calculation of the loss function as positive samples; all other prior DRBox are candidate negative samples.
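The RIoU described above can be sketched in plain Python. The rotated-rectangle intersection here uses Sutherland–Hodgman polygon clipping, which is one standard way to compute it; the patent does not prescribe a particular implementation:

```python
import math

def rect_corners(cx, cy, w, h, a):
    """Corners of a DRBox (angle a in degrees), in counter-clockwise order."""
    ca, sa = math.cos(math.radians(a)), math.sin(math.radians(a))
    pts = [(-w/2, -h/2), (w/2, -h/2), (w/2, h/2), (-w/2, h/2)]
    return [(cx + x*ca - y*sa, cy + x*sa + y*ca) for x, y in pts]

def poly_area(poly):
    """Polygon area by the shoelace formula."""
    n = len(poly)
    return abs(sum(poly[i][0]*poly[(i+1) % n][1] - poly[(i+1) % n][0]*poly[i][1]
                   for i in range(n))) / 2

def clip(subject, cp1, cp2):
    """Sutherland-Hodgman: clip polygon against the half-plane left of edge cp1->cp2."""
    def inside(p):
        return (cp2[0]-cp1[0])*(p[1]-cp1[1]) - (cp2[1]-cp1[1])*(p[0]-cp1[0]) >= 0
    def intersect(s, e):
        dcx, dcy = cp1[0]-cp2[0], cp1[1]-cp2[1]
        dpx, dpy = s[0]-e[0], s[1]-e[1]
        n1 = cp1[0]*cp2[1] - cp1[1]*cp2[0]
        n2 = s[0]*e[1] - s[1]*e[0]
        n3 = dcx*dpy - dcy*dpx
        return ((n1*dpx - n2*dcx) / n3, (n1*dpy - n2*dcy) / n3)
    out, s = [], subject[-1]
    for e in subject:
        if inside(e):
            if not inside(s):
                out.append(intersect(s, e))
            out.append(e)
        elif inside(s):
            out.append(intersect(s, e))
        s = e
    return out

def riou(box1, box2):
    """RIoU = IoU * |cos(angle difference)| for two (cx, cy, w, h, a) DRBox."""
    p1, p2 = rect_corners(*box1), rect_corners(*box2)
    inter, v_prev = p1, p2[-1]
    for v in p2:               # clip box1 against each edge of box2
        if not inter:
            break
        inter, v_prev = clip(inter, v_prev, v), v
    if not inter:
        return 0.0             # no intersection: RIoU is 0
    ai = poly_area(inter)
    union = poly_area(p1) + poly_area(p2) - ai
    return (ai / union) * abs(math.cos(math.radians(box1[4] - box2[4])))
```

Note that two identical squares rotated 90° apart get RIoU 0 even though their IoU is 1 — the cosine factor is exactly what lets the matcher penalize angle disagreement.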
c) Positive and negative sample equalization
In general, the matching strategy above yields far more negative samples than positive ones, so the negatives must be reduced to balance the two. During model training, we care most about the negative samples that are most easily confused with positives. Convolutional detection networks typically use hard negative mining to reduce the negatives: first, the confidence of each negative sample being background is computed by the confidence prediction layer; second, the background confidences of all negative samples are sorted; finally, negative samples are taken in order from low to high confidence until the numbers of positive and negative samples meet a given ratio, yielding the set Neg of negative-sample prior DRBox serial numbers with balanced positives and negatives.
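A minimal sketch of the hard negative mining step, assuming the background confidences have already been produced by the confidence prediction layer; the 3:1 negative-to-positive ratio is an assumed default, not a value stated in the patent:

```python
def hard_negative_mining(bg_conf, pos_idx, neg_ratio=3):
    """Select hard negatives: lowest background confidence first.

    bg_conf:   sequence mapping prior index -> predicted background confidence
    pos_idx:   set of prior indices matched as positives (the set Pos)
    neg_ratio: desired number of negatives per positive (assumed default)
    Returns the set Neg of selected negative-sample prior indices.
    """
    candidates = [i for i in range(len(bg_conf)) if i not in pos_idx]
    # A low background confidence means the network mistakes this negative
    # for foreground, i.e. it is a "hard" negative worth training on.
    candidates.sort(key=lambda i: bg_conf[i])
    return set(candidates[:neg_ratio * len(pos_idx)])
```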
d) Loss calculation
The cost loss function comprises two parts, a position loss and a confidence loss. The cost function measures the difference between the predicted DRBox and the real DRBox; each prior DRBox corresponds to one predicted DRBox, whose relevant parameters are output by the network.
First, the position loss $L_{loc}$ is calculated by the following formula:

$$L_{loc}(x, \hat{l}, \hat{g}) = \sum_{i \in Pos} \sum_{j \in GT} \sum_{m \in \{cx,cy,w,h,a\}} x_{ij}\, \mathrm{smooth}_{L1}\!\left(\hat{l}_i^m - \hat{g}_{ij}^m\right)$$

wherein

$$\mathrm{smooth}_{L1}(z) = \begin{cases} 0.5\,z^2, & |z| < 1 \\ |z| - 0.5, & \text{otherwise} \end{cases}$$

Second, the confidence vector of each DRBox, $c_i = (c_i^0, c_i^1)$, is a binary array representing the probability that it belongs to the foreground (1) and the background (0) respectively. The confidence loss $L_{conf}$ is calculated with the Softmax function:

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij} \log \hat{c}_i^1 - \sum_{i \in Neg} \log \hat{c}_i^0$$

wherein

$$\hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_{p'} \exp(c_i^{p'})}$$

Finally, the cost loss is calculated by the loss function of the loss layer in the following formula:

$$L(x, c, \hat{l}, \hat{g}) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha\, L_{loc}(x, \hat{l}, \hat{g})\right)$$

where N is the number of matched DRBox; when N is 0, the loss layer outputs 0; $\alpha \in [0, 1]$ is the position loss weight adjustment factor; $x_{ij}$ is the marking variable of the matching condition of the ith prior DRBox and the jth real DRBox; the confidence prediction layer of the detection network outputs $c_i$, the confidence prediction information of the ith prior DRBox, a binary array representing the probabilities of belonging to the foreground (1) and the background (0); the position prediction layer outputs $\hat{l}_i$, the encoded position prediction information of the ith prior DRBox; Pos is the set of prior DRBox serial numbers matched with real DRBox; GT is the set of all real DRBox serial numbers; Neg is the set of negative-sample prior DRBox serial numbers with balanced positives and negatives; $\mathrm{smooth}_{L1}$ is the smoothed L1 norm function.
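Under the definitions above, the cost loss can be sketched numerically. The dictionary-based bookkeeping of matches and of the sets Pos/Neg is illustrative, not the patent's data layout:

```python
import math

def smooth_l1(z):
    """Smoothed L1 norm function."""
    return 0.5 * z * z if abs(z) < 1 else abs(z) - 0.5

def softmax2(c0, c1):
    """Numerically stable 2-way softmax over (background, foreground) logits."""
    m = max(c0, c1)
    e0, e1 = math.exp(c0 - m), math.exp(c1 - m)
    return e0 / (e0 + e1), e1 / (e0 + e1)

def total_loss(match, loc_pred, loc_target, conf_pred, neg, alpha=1.0):
    """Cost loss L = (1/N) * (L_conf + alpha * L_loc).

    match:      dict {(i, j): x_ij} for (prior i, ground truth j) pairs
    loc_pred:   dict {i: 5-tuple} encoded position predictions
    loc_target: dict {(i, j): 5-tuple} encoded offsets of real targets
    conf_pred:  dict {i: (c0, c1)} raw confidence logits (background, foreground)
    neg:        set Neg of negative-sample prior indices
    """
    n = sum(match.values())
    if n == 0:
        return 0.0                      # no matches: the loss layer outputs 0
    l_loc = sum(smooth_l1(lp - lt)
                for (i, j), x in match.items() if x
                for lp, lt in zip(loc_pred[i], loc_target[(i, j)]))
    l_conf = -sum(math.log(softmax2(*conf_pred[i])[1])   # positives as foreground
                  for (i, j), x in match.items() if x)
    l_conf -= sum(math.log(softmax2(*conf_pred[i])[0])   # negatives as background
                  for i in neg)
    return (l_conf + alpha * l_loc) / n
```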
e) Error back-propagation
In the detection network, the error of the loss layer is propagated back to the position prediction layer, the confidence prediction layer and each convolutional layer, and the parameters of each layer are corrected, i.e. the parameters of the network model are updated using the gradient of the loss function. The prior DRBox layer does not need to accept error back-propagation. Thus, only the gradient of the loss function needs to be calculated in the loss layer:

$$\frac{\partial L}{\partial c_i} = \frac{1}{N}\,\frac{\partial L_{conf}}{\partial c_i}, \qquad \frac{\partial L}{\partial \hat{l}_i^m} = \frac{\alpha}{N}\,\frac{\partial L_{loc}}{\partial \hat{l}_i^m}$$
Step 6, cutting the remote sensing image into a plurality of small images of preset size.
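Step 6 can be sketched as computing tile origins so that tiles of a preset size cover the whole scene. Clamping the last tile to the image edge is an assumed convention; the patent only states that the image is cut into small pictures of preset size:

```python
def tile_origins_1d(length, tile, stride):
    """Origins along one axis; the last tile is clamped to end exactly at the edge."""
    xs = list(range(0, max(length - tile, 0) + 1, stride))
    if length > tile and xs[-1] != length - tile:
        xs.append(length - tile)
    return xs

def tile_origins(width, height, tile=512, stride=512):
    """Top-left corners of all tiles covering a width x height remote sensing image."""
    return [(x, y)
            for y in tile_origins_1d(height, tile, stride)
            for x in tile_origins_1d(width, tile, stride)]
```

A stride smaller than the tile size gives overlapping tiles, which avoids towers being split across tile borders at the cost of some duplicate detections (later removed by NMS).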
Step 7, respectively carrying out body detection and shadow detection on the transmission tower based on a target detection network of a tower body and shadow multi-angle rotatable candidate frame DRBox deep learning detection model aiming at the plurality of image small images;
the detection network after training obtains the detection result through the following steps:
a) inputting the cut small image into a trained deep learning detection model, and outputting a coding position prediction vector and a confidence coefficient prediction vector through a target detection network in the deep learning detection model;
b) obtaining the position information of the predicted DRBox through a decoding process according to the coding position prediction vector and the prior DRBox, and associating a confidence degree prediction result and the type information of the prior DRBox to each predicted DRBox;
c) the final detection result is obtained by non-maximum suppression (NMS).
Wherein the decoding process in step b) is as follows:

$$l_i^{cx} = p_i^{w}\, \hat{l}_i^{cx} + p_i^{cx}$$
$$l_i^{cy} = p_i^{h}\, \hat{l}_i^{cy} + p_i^{cy}$$
$$l_i^{w} = p_i^{w} \exp(\hat{l}_i^{w})$$
$$l_i^{h} = p_i^{h} \exp(\hat{l}_i^{h})$$
$$l_i^{a} = p_i^{a} + \arctan(\hat{l}_i^{a})$$

in the formula, $l_i^m$, $m \in \{cx, cy, w, h, a\}$, is the position information of the ith predicted DRBox, where $l_i^{cx}, l_i^{cy}, l_i^{w}, l_i^{h}, l_i^{a}$ respectively represent its abscissa, ordinate, width, height and angle; $\hat{l}_i^m$ is the encoded position prediction information of the ith prior DRBox; $p_i^m$ is the position information of the ith prior DRBox.
For each class of targets to be detected, the NMS first sorts the output results whose foreground confidence exceeds a given value in descending order of confidence and takes out a given number of output DRBox in turn. These DRBox are added to the output queue one by one, ensuring each time that the RIoU between the newly output DRBox and every already-output DRBox does not exceed a given threshold. The NMS thus ensures that no target is selected by several DRBox at once; the DRBox finally output by the algorithm is the predicted DRBox with the highest confidence for that target.
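The NMS procedure just described, sketched generically: the overlap measure is passed in as a callable so that an RIoU function can be plugged in, and the thresholds are illustrative values, not ones fixed by the patent:

```python
def nms(detections, overlap, conf_thresh=0.5, max_out=100, riou_thresh=0.3):
    """Greedy non-maximum suppression over predicted DRBox of one target class.

    detections: list of (confidence, box) pairs
    overlap:    callable (box, box) -> RIoU in [0, 1]
    Keeps the highest-confidence box of each group of mutually overlapping boxes.
    """
    # Keep only sufficiently confident detections, best first, up to a given number.
    cands = sorted((d for d in detections if d[0] > conf_thresh),
                   key=lambda d: d[0], reverse=True)[:max_out]
    kept = []
    for conf, box in cands:
        # Output the box only if it does not overlap any already-output box too much.
        if all(overlap(box, kbox) <= riou_thresh for _, kbox in kept):
            kept.append((conf, box))
    return kept
```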
Step 8, restoring longitude and latitude information from the body detection result and the shadow detection result, and fusing them onto the original remote sensing image to obtain the detection result of the transmission tower.
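Restoring longitude and latitude for step 8 amounts to mapping tile-local pixel coordinates back through the tile origin and the image's georeferencing. The six-element GDAL-style geotransform used here is an assumption about how the remote sensing image is georeferenced:

```python
def pixel_to_geo(geotransform, px, py, tile_x=0, tile_y=0):
    """Map a pixel (px, py) inside a cut tile back to geographic coordinates.

    geotransform:    GDAL-style (x0, dx, rx, y0, ry, dy) affine coefficients
    (tile_x, tile_y): the tile's top-left corner in the original image
    """
    x = px + tile_x  # column in the original remote sensing image
    y = py + tile_y  # row in the original remote sensing image
    lon = geotransform[0] + x * geotransform[1] + y * geotransform[2]
    lat = geotransform[3] + x * geotransform[4] + y * geotransform[5]
    return lon, lat
```

Applying this to the center point of each detected DRBox yields the georeferenced tower positions to be fused onto the original image.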
Test conditions: the Mengdong phase-1, phase-2 and phase-3 work areas. The work areas comprise 6 built power transmission lines in total, namely the Jingneng five-room power plant lines and the Mengyeng cylinder power plant lines. Fig. 6 shows the tower detection results of the Mengdong phase 1, phase 2 and phase 3 respectively.
By the above method for detecting transmission towers from optical satellite remote sensing images, large-scale extraction of transmission tower targets in complex environments is realized using high-resolution remote sensing imagery. First, to address the fact that the tower body shape may be incompletely captured in satellite remote sensing because of changes in the illumination angle, the technical scheme combines tower body and shadow information for comprehensive tower target detection and type identification, so that the detection framework is applicable both to oblique viewing angles (where the structural features of the body are obvious) and to near-nadir viewing angles (where they are not). Second, to address the large length-to-width ratio of tower targets and the strong influence of target orientation, a multi-angle rotatable candidate frame DRBox deep learning detection model is proposed, which resolves the interference that viewing-angle differences may cause in the detection results and effectively accommodates the arbitrary orientation of transmission towers in remote sensing images. Compared with manual labeling results, the transmission tower detection recall of the method reaches 80% and the precision reaches 88.9%, which greatly reduces the manual effort in engineering operations, improves the efficiency of monitoring the progress of large-scale transmission lines in complex environments, and provides a decision basis for auxiliary audits during engineering.
While embodiments have been described with reference to specific exemplary embodiments thereof, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the inventive subject matter. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (14)

1. A transmission tower detection method based on remote sensing images is characterized by comprising the following steps:
step 1, constructing a transmission tower sample space by using a remote sensing image as a data source to form a transmission tower detection data set;
step 2, labeling the transmission tower detection data set; marking the remote sensing image containing the tower so as to improve the adaptability of tower detection to remote sensing data with different resolutions, different wave bands and different imaging angles;
step 3, splitting the transmission tower detection data set into a body data set and a shadow data set;
step 4, respectively constructing respective multi-angle rotatable candidate frame DRBox deep learning detection models and target detection networks thereof aiming at the body data set and the shadow data set; the multi-angle rotatable candidate frame DRBox is a rotatable candidate frame with angle information, and can search for the target to be detected at different positions of the input image;
step 5, dividing a transmission tower detection task into a tower body detection part and a shadow detection part, and respectively carrying out target detection network training on the basis of respective multi-angle rotatable candidate frame DRBox deep learning detection models;
step 6, cutting the remote sensing image into a plurality of small images with preset sizes;
step 7, respectively carrying out body detection and shadow detection on the transmission tower based on a target detection network of a tower body and shadow multi-angle rotatable candidate frame DRBox deep learning detection model aiming at the plurality of image small images;
step 8, restoring longitude and latitude information from the body detection result and the shadow detection result, and fusing them onto the original remote sensing image to obtain a final detection result of the transmission tower.
2. The method of claim 1, wherein the remote sensing imagery is high resolution multi-source satellite remote sensing imagery.
3. The method according to claim 1, wherein the step 2 of labeling the remote sensing image containing the tower specifically comprises: and marking the minimum external rectangle of each tower by using a frame-pulling marking method, thereby acquiring the specific position of the tower on the image.
4. The method of claim 1, wherein, at step 4, the object detection network comprises a data input layer, a convolutional layer, an a priori DRBox layer, a location prediction layer, a confidence prediction layer, and a loss layer;
the data input layer is used for reading in training data, and a target frame of the training data comprises the following parameters: the length and width of the target, the abscissa and ordinate of the central point of the target, and the target angle;
the convolutional layer adopts a VGGNet network as a pre-training model;
the prior DRBox layer is connected behind the convolutional network and generates a series of prior DRBox;
the position prediction layer is used for generating position correction information of each priori DRBox to obtain the position information of the predicted DRBox;
the confidence prediction layer is used for generating confidence information for each priori DRBox to obtain the confidence of each type of the predicted DRBox;
the loss layer is used for calculating the loss function and generating the error back-propagation quantity of the network; it has four inputs, namely the label part of the data input layer and the outputs of the position prediction layer, the confidence prediction layer and the prior DRBox layer.
5. The method of claim 4, wherein the prior DRBox layer is connected after the convolutional network and generates a series of prior DRBox,
wherein the size of the prior DRBox is preset as an input of the algorithm: when the size of the target to be detected is fixed, the target size is set directly; when the size of the target to be detected varies within a small range, the size of the prior DRBox is selected as the average of the variation range; when the sizes of the targets to be detected differ greatly, several groups of DRBox with different sizes are used, led out from different convolutional layers;
for the detection of multiple types of targets, the prior DRBox are divided into groups according to target type and prior size, and each group of DRBox is bound to a selected convolutional layer according to a preset strategy;
the locations of the prior DRBox are derived from the down-sampling relationship between the feature map of the selected convolutional layer and the input image, each prior DRBox being generated from a location on the feature map; prior DRBox of smaller size are generated by shallower layers and prior DRBox of larger size by deeper layers, so that by generating prior DRBox on different convolutional layers the network gains the capability of detecting targets of different sizes;
the angle information in the prior DRBox covers the target angles that may appear: when the head and tail of the target need to be distinguished, the angle of the prior DRBox ranges from 0 to 360 degrees, taken discretely with a preset step length L1; when the head and tail of the target do not need to be distinguished, the angle of the prior DRBox ranges from 0 to 180 degrees, taken discretely with a preset step length L2.
6. The method of claim 4, wherein the confidence information of the confidence prediction layer is a 2-dimensional vector representing the probability that the DRBox belongs to the target and the background, respectively.
7. The method according to claim 1, wherein the step 5 of dividing the transmission tower detection task into two parts, namely tower body detection and shadow detection, and performing target detection network training based on respective multi-angle rotatable candidate frame DRBox deep learning detection models respectively specifically comprises:
step 5-1, carrying out DRBox encoding;
step 5-2, performing DRBox matching: in the training process, comparing the prior DRBox with the real DRBox one by one to determine positive and negative samples;
step 5-3, carrying out positive and negative sample equalization;
step 5-4, calculating the loss, namely the difference between the predicted DRBox and the real DRBox, comprising a position loss and a confidence loss;
step 5-5, back-propagating the error.
8. The method of claim 7, wherein the step 5-1 of carrying out the DRBox encoding specifically comprises:
in the forward propagation process, the encoding process obtains the offset between the ith prior DRBox and the jth real DRBox from their position information, as follows:

$$\hat{g}_{ij}^{cx} = (g_j^{cx} - p_i^{cx}) / p_i^{w}$$
$$\hat{g}_{ij}^{cy} = (g_j^{cy} - p_i^{cy}) / p_i^{h}$$
$$\hat{g}_{ij}^{w} = \log(g_j^{w} / p_i^{w})$$
$$\hat{g}_{ij}^{h} = \log(g_j^{h} / p_i^{h})$$
$$\hat{g}_{ij}^{a} = \tan(g_j^{a} - p_i^{a})$$

in the formula, $p_i^m$ is the position information of the ith prior DRBox; $g_j^m$ is the position information of the jth real target DRBox; $\hat{g}_{ij}^m$ is the offset information of the ith prior DRBox and the jth real target DRBox; $m \in \{cx, cy, w, h, a\}$, where cx, cy, w, h, a respectively represent the abscissa, ordinate, width, height and angle of the DRBox; $\hat{g}_j$ represents the vector of offset information of all real targets.
9. The method of claim 8, wherein, in the step 5-2, the DRBox matching specifically comprises:
step 5-2-1, if the two DRBox have no intersection, the RIoU is 0; if the prior DRBox and the real DRBox do not belong to the same category, the RIoU is also 0;
step 5-2-2, performing bidirectional matching on the prior DRBox and the real DRBox, and specifically comprising the following steps:
for each real DRBox, if a prior DRBox with RIoU >0 exists, matching the prior DRBox with the maximum RIoU with the real DRBox, and recording a mark variable as 1, otherwise recording as 0;
for each prior DRBox, if a real DRBox with RIoU >0.5 exists, matching each real DRBox with the prior DRBox, and recording a mark variable as 1, otherwise recording as 0;
recording a set of prior DRBox serial numbers which can be matched with the real DRBox as Pos;
taking the prior DRBox in the set Pos as a positive sample to participate in the calculation of the loss function, and taking other prior DRBox as candidate negative samples;
the marking variable is x = {x_ij}, x_ij ∈ {0, 1}, where x_ij indicates whether the ith prior DRBox is matched with the jth real DRBox, 1 meaning matched and 0 unmatched;
the RIoU represents IoU multiplied by the absolute value of the cosine of the difference between the two DRBox angles, and IoU represents the ratio of the intersection area to the union area of the two boxes.
10. The method of claim 8, wherein the step 5-3 of performing positive and negative sample equalization comprises:
step 5-3-1, calculating the confidence coefficient of the negative samples as the background through a confidence coefficient prediction layer,
step 5-3-2, the background confidence degrees of all negative samples are sorted,
and 5-3-3, sampling the negative samples in the order from low confidence to high confidence, so that the number of the positive samples and the number of the negative samples meet a given proportion, and obtaining a set Neg of the prior DRBox serial numbers of the negative samples with balanced positive and negative samples.
11. The method of claim 8, wherein the step 5-4, loss calculation, specifically comprises:
the cost loss is calculated by the loss function of the loss layer in the following formula:

$$L(x, c, \hat{l}, \hat{g}) = \frac{1}{N}\left(L_{conf}(x, c) + \alpha\, L_{loc}(x, \hat{l}, \hat{g})\right)$$

wherein the position loss is calculated by the following formula:

$$L_{loc}(x, \hat{l}, \hat{g}) = \sum_{i \in Pos} \sum_{j \in GT} \sum_{m \in \{cx,cy,w,h,a\}} x_{ij}\, \mathrm{smooth}_{L1}\!\left(\hat{l}_i^m - \hat{g}_{ij}^m\right)$$

wherein the confidence loss is calculated by the following Softmax function:

$$L_{conf}(x, c) = -\sum_{i \in Pos} x_{ij} \log \hat{c}_i^1 - \sum_{i \in Neg} \log \hat{c}_i^0, \qquad \hat{c}_i^p = \frac{\exp(c_i^p)}{\sum_{p'} \exp(c_i^{p'})}$$

where N is the number of matched DRBox; when N is 0, the loss layer outputs 0; $\alpha \in [0, 1]$ is the position loss weight adjustment factor; $x_{ij}$ is the marking variable of the matching condition of the ith prior DRBox and the jth real DRBox; the confidence prediction layer of the detection network outputs $c_i$, the confidence prediction information of the ith prior DRBox, a binary array representing the probabilities of belonging to the foreground (1) and the background (0); the position prediction layer outputs $\hat{l}_i$, the encoded position prediction information of the ith prior DRBox; Pos is the set of prior DRBox serial numbers matched with real DRBox; GT is the set of all real DRBox serial numbers; Neg is the set of negative-sample prior DRBox serial numbers with balanced positives and negatives; $\mathrm{smooth}_{L1}$ is the smoothed L1 norm function.
12. The method of claim 11, wherein the step 5-5 of error back-propagation specifically comprises:
in the detection network, the error of the loss layer is propagated back to the position prediction layer, the confidence prediction layer and each convolutional layer, and the parameters of each layer are corrected, i.e. the parameters of the network model are updated using the gradient of the loss function, which is calculated in the loss layer as:

$$\frac{\partial L}{\partial c_i} = \frac{1}{N}\,\frac{\partial L_{conf}}{\partial c_i}, \qquad \frac{\partial L}{\partial \hat{l}_i^m} = \frac{\alpha}{N}\,\frac{\partial L_{loc}}{\partial \hat{l}_i^m}$$
13. The method according to claim 1, wherein step 7, performing, for the plurality of image thumbnails, body detection and shadow detection of the transmission tower based on the target detection networks of the tower body and shadow multi-angle rotatable candidate frame DRBox deep learning detection models, specifically comprises:
step 7-1, inputting the cut small image into a trained deep learning detection model, and outputting a coding position prediction vector and a confidence coefficient prediction vector through a target detection network in the deep learning detection model;
step 7-2, obtaining the position information of the predicted DRBox through a decoding process according to the coding position prediction vector and the prior DRBox, and associating a confidence prediction result and the category information of the prior DRBox to each predicted DRBox;
step 7-3, obtaining the final detection result through non-maximum suppression (NMS).
14. The method as claimed in claim 13, wherein the obtaining of the position information of the predicted DRBox through a decoding process according to the encoded position prediction vector and the prior DRBox in step 7-2 specifically comprises:

$$l_i^{cx} = p_i^{w}\, \hat{l}_i^{cx} + p_i^{cx}$$
$$l_i^{cy} = p_i^{h}\, \hat{l}_i^{cy} + p_i^{cy}$$
$$l_i^{w} = p_i^{w} \exp(\hat{l}_i^{w})$$
$$l_i^{h} = p_i^{h} \exp(\hat{l}_i^{h})$$
$$l_i^{a} = p_i^{a} + \arctan(\hat{l}_i^{a})$$

in the formula, $l_i^m$, $m \in \{cx, cy, w, h, a\}$, is the position information of the ith predicted DRBox, where $l_i^{cx}, l_i^{cy}, l_i^{w}, l_i^{h}, l_i^{a}$ respectively represent its abscissa, ordinate, width, height and angle; $\hat{l}_i^m$ is the encoded position prediction information of the ith prior DRBox; $p_i^m$ is the position information of the ith prior DRBox.
CN202010279995.5A 2020-04-10 2020-04-10 Transmission tower detection method based on remote sensing image Active CN111553204B (en)

Publications (2)

Publication Number Publication Date
CN111553204A true CN111553204A (en) 2020-08-18
CN111553204B CN111553204B (en) 2024-05-28


Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883840A (en) * 2021-02-02 2021-06-01 中国人民公安大学 Power transmission line extraction method based on key point detection
CN113033446A (en) * 2021-04-01 2021-06-25 辽宁工程技术大学 Transmission tower identification and positioning method based on high-resolution remote sensing image
CN113537142A (en) * 2021-08-03 2021-10-22 广东电网有限责任公司 Monitoring method, device and system for construction progress of capital construction project and storage medium
CN113804161A (en) * 2021-08-23 2021-12-17 国网辽宁省电力有限公司大连供电公司 Method for detecting inclination state of transmission tower

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110751077A (en) * 2019-10-15 2020-02-04 武汉大学 Optical remote sensing picture ship detection method based on component matching and distance constraint
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
US20200074665A1 (en) * 2018-09-03 2020-03-05 Baidu Online Network Technology (Beijing) Co., Ltd. Object detection method, device, apparatus and computer-readable storage medium
CN110910440A (en) * 2019-09-30 2020-03-24 中国电力科学研究院有限公司 Power transmission line length determination method and system based on power image data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHOU Hui, YAN Fenglong, CHU Na, CHEN Peng: "Ship target detection in high-resolution remote sensing images based on a feature pyramid model", Journal of Dalian Maritime University, no. 04, 15 November 2019 (2019-11-15) *
CHEN Qiang, WANG Jian, XIONG Xiaofu, FENG Changyou, MA Chao: "A monitoring and early warning method for transmission towers under rainfall-induced landslide disasters", Power System Protection and Control, no. 03, 1 February 2020 (2020-02-01) *




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant