CN111553204B - Transmission tower detection method based on remote sensing image - Google Patents

Transmission tower detection method based on remote sensing image

Info

Publication number
CN111553204B
CN111553204B (application CN202010279995.5A)
Authority
CN
China
Prior art keywords
drbox
detection
priori
layer
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010279995.5A
Other languages
Chinese (zh)
Other versions
CN111553204A (en)
Inventor
田桂申
宋猛
白雪娇
刘丽娟
邹睿翀
莫明飞
杨知
费香泽
李闯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Eastern Inner Mongolia Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
China Electric Power Research Institute Co Ltd CEPRI
State Grid Eastern Inner Mongolia Power Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by State Grid Corp of China SGCC, China Electric Power Research Institute Co Ltd CEPRI, State Grid Eastern Inner Mongolia Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202010279995.5A priority Critical patent/CN111553204B/en
Publication of CN111553204A publication Critical patent/CN111553204A/en
Application granted granted Critical
Publication of CN111553204B publication Critical patent/CN111553204B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes
    • G06V20/176Urban or other man-made structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The invention provides a transmission tower detection method based on remote sensing images. It uses multi-source high-resolution satellite remote sensing data to automatically detect and extract transmission tower targets, addresses the difficulties of applying computer vision object detection to satellite remote sensing imagery, overcomes problems such as the small size of the tower body and its highly variable appearance in an overhead view, and realizes automatic detection of transmission towers in high-resolution optical remote sensing images. Statistical analysis of the number of detected transmission towers can then guide routine monitoring of the construction progress of large-scale power grid projects.

Description

Transmission tower detection method based on remote sensing image
Technical Field
The invention relates to the field of power grid construction engineering, in particular to a transmission tower detection method based on optical satellite remote sensing images.
Background
Progress monitoring of power grid construction has traditionally relied mainly on manual inspection. For large-area grid-connection projects in highly variable environments with complex and hazardous terrain, this traditional approach suffers from a narrow monitoring range, high risk, strong environmental constraints and poor reliability; it falls short in coverage, timeliness, reliability and safety, which makes routine, efficient and fine-grained monitoring and auditing of such projects difficult.
Compared with traditional observation means, satellite observation has major advantages, and it is particularly well suited to power grids deployed over large areas, transmitting power over long distances and operating in complex, changing environments. Satellite remote sensing covers a wide area and can acquire large-scale data quickly; it acquires information rapidly, has a short update cycle and is dynamic and timely, which manual field measurement and aerial photogrammetry cannot match. Second, satellite remote sensing is less constrained by ground conditions and can acquire data in time over areas that are difficult for people to reach, such as deserts, steep mountains and high-altitude regions with extremely harsh natural conditions. In addition, satellite remote sensing yields a large amount of information: different bands such as visible light, ultraviolet, infrared and microwave, and different remote sensing instruments, can be chosen for different tasks, so multi-dimensional mass information can be obtained around the clock and in all weather. Making full use of multi-source high-resolution satellite remote sensing data therefore enables routine progress monitoring of large-area power grid projects.
Regarding automatic detection of towers in high-resolution optical remote sensing images, target detection is an important research direction in computer vision, but compared with general computer vision target detection, detection on remote sensing images faces additional difficulties: (1) the targets are small: limited by image resolution, the length and width of a target worth detecting are usually only tens of pixels; (2) prior information on target size is available: objects in natural images appear larger when near and smaller when far, so the same class of object varies widely in size across images, whereas a remote sensing image has a definite resolution, so the size of the object to be detected is known in advance; (3) the target angle is random: objects in natural images usually appear at relatively uniform angles, such as vertical or horizontal, but a remote sensing image is an overhead view of the target, so the target's orientation appears at random.
Compared with research on transmission line condition monitoring based on aerial means (helicopters or unmanned aerial vehicles), there has been relatively little research on monitoring the transmission line body with satellite remote sensing, because for a long time the spatial resolution of satellite imagery could not meet the requirements of body condition monitoring: transmission towers could only be distinguished as point targets, and the tower type could not be identified. In recent years, satellite remote sensing has developed rapidly as an important means of Earth observation. The spatial resolution of the WorldView series of optical satellites reaches 0.3 m, eight spectral bands from visible light to near infrared are available, and the revisit period is about 2 days. Meanwhile, the spatial resolution of synthetic aperture radar (SAR) satellites such as RADARSAT-2 reaches 1 m, providing ground-object information in the microwave band, which meets the requirement of monitoring large structures of the transmission line body, and related research is growing. In 2003, Liao et al. clearly resolved transmission towers and the line's orientation from SAR images of a river flood inundation area. In 2005, Zhu Junjie compared SAR and QuickBird images near an overpass on the North Fifth Ring Road in Beijing, and the towers beside the overpass could be clearly distinguished in both. The SAR data used in these two studies had a spatial resolution of 1.25 m, and the transmission towers appeared as obvious triangular bright spots in the SAR images and could be clearly identified. Building on this, in 2007 Yang et al. established an automatic transmission tower recognition model from high-resolution polarimetric SAR images, accurately extracting the transmission lines in farmland and improving the level of automation of tower recognition. However, research to date remains at the stage of simple identification of the line body, and fine-grained recognition of transmission towers lacks dedicated work. Besides SAR-based studies, research on detecting transmission lines from optical satellite remote sensing data has also progressed. In 2015, Chen et al. constructed a peak feature of the transmission conductor in the Cluster Radon (CR) frequency-domain space and, for the first time at home and abroad, extracted transmission conductors from QuickBird optical satellite images, which is of milestone significance. However, the CR frequency-domain peak feature in that study actually corresponds to the edges of the conductor and its shadow, and shadow information obtained from prior conditions is needed to reduce the false alarm rate. This means the CR frequency-domain peak feature alone cannot extract transmission lines robustly and with high accuracy; the features of the transmission line body must be further mined with an improved algorithm to achieve accurate extraction of the line body.
Disclosure of Invention
In view of the shortcomings of traditional means such as manual inspection for monitoring and auditing the progress of power grid construction, and to meet the demand for routine monitoring of large-scale power grid construction progress, the invention provides a transmission tower detection method based on remote sensing images, which comprises the following steps:
step 1, constructing a transmission tower sample space using remote sensing images as the data source to form a transmission tower detection data set;
step 2, labeling the transmission tower detection data set: the remote sensing images containing towers are annotated so as to improve the adaptability of tower detection to remote sensing data of different resolutions, different bands and different imaging angles;
step 3, splitting the transmission tower detection data set into a body data set and a shadow data set;
step 4, constructing, for the body data set and the shadow data set respectively, a multi-angle rotatable candidate box (DRBox) deep learning detection model and its target detection network; the multi-angle rotatable candidate box DRBox is a rotatable candidate box carrying angle information that can search for the target to be detected at different positions of the input image;
step 5, dividing the transmission tower detection task into two parts, tower body detection and shadow detection, and training a target detection network for each part based on its own multi-angle rotatable candidate box DRBox deep learning detection model;
step 6, cutting the remote sensing image into multiple image tiles of a preset size;
step 7, for the multiple image tiles, performing body detection and shadow detection of the transmission tower, respectively, with the target detection networks of the DRBox deep learning detection models for the tower body and the shadow;
step 8, restoring longitude and latitude information from the body detection results and the shadow detection results, and fusing them onto the original remote sensing image to obtain the final transmission tower detection result.
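As an illustration of how steps 6 to 8 fit together, the following minimal Python sketch tiles an image, runs body and shadow detectors on each tile and shifts the detections back to full-image pixel coordinates; the function name, the detector callable interface and the dictionary keys are illustrative assumptions, not the patented implementation.

```python
import numpy as np

def detect_towers(image, detectors, tile_size=300, stride=None):
    """Tile a large image (step 6) and run each detector on every tile (step 7).

    detectors: callables mapping a tile (H, W, C) array to a list of dicts
    {"cx", "cy", "w", "h", "angle", "score"} in tile pixel coordinates.
    """
    stride = stride or tile_size
    h, w = image.shape[:2]
    detections = []
    for row in range(0, h, stride):
        for col in range(0, w, stride):
            tile = image[row:row + tile_size, col:col + tile_size]
            for detect in detectors:
                for box in detect(tile):
                    box = dict(box)
                    box["cx"] += col   # step 8: shift back to full-image pixels
                    box["cy"] += row
                    detections.append(box)
    return detections

# usage with a dummy body detector and a dummy shadow detector
dummy = lambda tile: [{"cx": 10.0, "cy": 12.0, "w": 20.0, "h": 8.0,
                       "angle": 30.0, "score": 0.9}]
print(len(detect_towers(np.zeros((900, 900, 3)), [dummy, dummy])))  # 18
```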
According to this optical satellite remote sensing image transmission tower detection method, high-resolution remote sensing images are used to extract transmission line tower targets over a large area in complex environments. First, to deal with the fact that, because of changes in illumination and viewing geometry, satellite remote sensing may capture the shape information of the tower body incompletely, a technical scheme is proposed that combines tower body and shadow information for comprehensive detection and type recognition of tower targets, so that the detection framework is applicable both to oblique viewing angles (where the structural features of the body are distinct) and to near-nadir viewing angles (where they are not). Second, within the tower target detection scheme, considering the large aspect ratio of tower targets and the strong influence of target orientation, the multi-angle rotatable candidate box DRBox deep learning detection model is proposed to cope with the viewing-angle differences that may exist in remote sensing acquisitions, effectively adapting to the randomness of transmission tower orientation in remote sensing images. Compared against manual labeling, the transmission tower detection recall reaches 80% and the precision reaches 88.9%, which greatly reduces the manpower required in engineering operations, improves the efficiency of large-area transmission line progress monitoring in complex environments, and provides a decision basis for auxiliary auditing during the project.
Drawings
FIG. 1 is a schematic diagram of a tower in WorldView-1 satellite imagery (1:2000);
FIG. 2 is a schematic diagram of a tower in WorldView-2 satellite imagery (1:2000);
FIG. 3 is a schematic illustration of tower bounding-box labels;
FIG. 4 is a schematic diagram of the DRBox network architecture;
FIG. 5 is a tower detection workflow diagram;
FIG. 6 shows the tower detection results for the Mengdong Phase I transmission line;
Fig. 7 is a flowchart of a transmission tower detection method according to an embodiment of the present invention.
Detailed Description
In the following description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the invention. It will be apparent, however, to one skilled in the art that the present invention may be practiced without these specific details. The following detailed description includes references to the accompanying drawings, which form a part of the detailed description. The drawings illustrate diagrams in accordance with exemplary embodiments. These exemplary embodiments (which are also referred to herein as "examples") are described in sufficient detail to enable those skilled in the art to practice the present subject matter. The embodiments may be combined, other embodiments may be utilized, or structural and logical changes may be made without departing from the scope of the claims. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope is defined by the appended claims and their equivalents.
As shown in fig. 7, the invention provides a transmission tower detection method based on an optical satellite remote sensing image, which comprises the following steps:
step 1, remote sensing images are used as the data source to construct a transmission tower sample space and form a transmission tower detection data set; in this example the remote sensing images are high-resolution multi-source satellite remote sensing images;
Unlike natural images, remote sensing imaging is affected by numerous factors, including satellite-specific parameters such as orbit height and satellite zenith angle, as well as weather conditions such as cloud, rain and fog. As a result, different satellites imaging the same scene produce data that differ in brightness, hue and other aspects, and even the same satellite produces different imaging quality on different passes. Therefore, to maximize the performance of the tower detection model, mainstream high-resolution satellite data from home and abroad should be used extensively to construct a sample space with a large data volume and strong representativeness, as shown in fig. 1 and fig. 2.
Step 2, the transmission tower detection data set is labeled, i.e. the multi-source remote sensing data containing transmission towers are annotated, so as to improve the adaptability of transmission tower detection to remote sensing data of different resolutions, different bands and different imaging angles.
To ensure the accuracy of the tower detection model as far as possible, the multi-source remote sensing data containing towers must be annotated. For the position of each tower, a bounding-box labeling method is used to mark the minimum enclosing rectangle of the tower, thereby obtaining the tower's specific position on the image. The tower labeling scheme is shown in fig. 3.
As can be seen from the figure, a tower appears on the remote sensing image mainly as white pixels; the main structure of the tower is visible in the high-resolution image, and the shadow of the tower and the shadow of the high-voltage line on the ground can also be seen. To improve the performance of the tower detection model, the position of each tower must be labeled with its minimum enclosing rectangle, and each labeling box must completely cover the tower so that no part is omitted. At the same time, the box must not be too large: an oversized box contains more background (such as farmland, channels, trees, roads and buildings), which increases the noise in the sample and degrades the recognition performance of the tower detection model. Once the noise increases, the data distribution of the tower samples changes markedly, which is unfavorable for training a detection model with high recall and a low false positive rate.
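For concreteness, one possible form of a single labeled sample is sketched below; the field names and values are assumptions made for illustration (the patent does not prescribe a file format), with the angle field anticipating the rotatable DRBox training data described later.

```python
# Illustrative annotation record for one tower on one image tile.
tower_annotation = {
    "image": "worldview2_tile_0412.tif",  # hypothetical file name
    "class": "transmission_tower",
    "cx": 215.5,    # box centre, pixels
    "cy": 143.0,
    "w": 42.0,      # box width, pixels: tight enough to exclude background
    "h": 38.0,      # box height, pixels: large enough to cover the whole tower
    "angle": 25.0,  # degrees; orientation of the minimum enclosing rectangle
}
print(tower_annotation["class"], tower_annotation["cx"], tower_annotation["cy"])
```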
Step 3, splitting the transmission tower detection data set into an ontology data set and a shadow data set;
Step 4, constructing respective multi-angle rotatable candidate frames DRBox deep learning detection models and target detection networks thereof according to the body data set and the shadow data set; the multi-angle rotatable candidate frame DRBox is a rotatable candidate frame with angle information, and can search the target to be detected at different positions of the input image;
In view of the characteristics of tower target detection in high-resolution remote sensing data, and considering the subsequent tower type recognition task, the invention proposes the multi-angle rotatable candidate box DRBox to solve the object detection problem on remote sensing images.
The rotatable detection box effectively adapts to the arbitrary orientation of targets in remote sensing images. Compared with a conventional candidate box, the rotatable detection box has the following advantages:
1) the size and aspect ratio of the box reflect the shape of the target object;
2) an RBox contains fewer background pixels, which helps the detector distinguish foreground from background;
3) RBoxes effectively avoid overlap between adjacent target candidate boxes, which is more favorable for detecting dense targets.
The multi-angle rotatable candidate box DRBox is a key element of the detection. The convolutional network structure ensures that the rotatable candidate box can search for the target to be detected at different positions of the input image, and at each position the rotatable candidate box generates multi-angle predictions by rotating through a series of angles; this is the biggest difference between the DRBox detection method of the invention and other BBox-based detection methods. To reduce the total number of candidate boxes, the aspect ratio used in detection is kept consistent with the target type. Through the multi-angle rotatable candidate box strategy, the training network converts the detection task into a series of subtasks, each focusing on detection within a narrower angle range, thereby reducing the influence of target rotation on detection.
To address the random variation of the target angle, the invention uses the rotatable box DRBox carrying angle information. Each DRBox contains 7 parameters: the length and width of the target, the abscissa and ordinate of the target center point, the target angle, and the probabilities that the target is judged to be foreground and background.
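A possible in-memory representation of the 7-parameter DRBox described above is sketched below; the class and field names are illustrative assumptions rather than the patent's own data structure.

```python
from dataclasses import dataclass

@dataclass
class DRBox:
    """Rotatable candidate box with the 7 parameters listed above."""
    cx: float          # abscissa of the target centre point
    cy: float          # ordinate of the target centre point
    w: float           # target width
    h: float           # target length
    angle: float       # target angle, degrees
    p_fg: float = 0.0  # probability the box is judged foreground (a target)
    p_bg: float = 1.0  # probability the box is judged background

box = DRBox(cx=150.0, cy=96.0, w=12.0, h=40.0, angle=73.0)
print(box)
```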
The constructed target detection network consists of a data input layer, convolution layers, a prior DRBox layer, a position prediction layer, a confidence prediction layer and a loss layer, as shown in fig. 4.
A) Data Input Layer (Data Input Layer)
The data input layer reads in the training data; the target box information comprises the target's category label, the coordinates of the target center point, the target angle, and the target's length and width. Unlike general detection algorithms, the training data must provide target boxes with angle information.
B) Convolution layer (Convolution Layers)
Because the VGGNet network is deep and has strong feature extraction capability, the convolution layers of the invention use this network as the pre-trained model.
C) Priori DRBox layers (Prior DRBox Layer)
The prior DRBox layer is connected after the convolutional network and generates a series of prior DRBoxes.
The size of the prior DRBox is preset as an input to the algorithm. When the size of the target to be detected is fixed, it is set directly to the target size; when the target size varies within a small range, the prior DRBox size is chosen as the mean of that range; when the targets to be detected differ greatly in size, several groups of DRBoxes with different sizes are used, and DRBoxes of different sizes are drawn from different convolution layers. For multi-class target detection, the prior DRBoxes are divided into groups according to target class and prior size, and each group of DRBoxes is bound to a selected convolution layer according to a preset strategy.
The positions of the prior DRBoxes are derived from the downsampling relationship between the feature map and the input image, and each prior DRBox is generated from one position on the feature map. For example, for an input size of 300×300, DRBoxes generated on an 8×8 feature map cover the input image with a step of 300/8 = 37.5 pixels. When the distance between targets is smaller than this step, targets will be missed, so a suitable feature layer must be chosen according to the target size to ensure that no targets are missed. Intuitively, smaller prior DRBoxes should be generated by shallower layers, while larger prior DRBoxes may be generated by deeper layers. Generating prior DRBoxes on different convolution layers gives the network the ability to detect targets of different sizes.
For the network to be able to detect target angles, angle information must be introduced into the prior DRBoxes, and the angles should cover the target angles that may occur. When targets need to be distinguished head from tail, the angle of the prior DRBox ranges from 0 to 360 degrees and is discretized with a preset step L1; when targets do not need head-tail distinction, the angle ranges from 0 to 180 degrees and is discretized with a preset step L2. In summary, given a specific target type and size, R prior DRBoxes with different angles are generated at each location of the selected convolution layer feature map, and prior DRBoxes for targets of different sizes are generated on different convolution layer feature maps.
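The prior-generation rule described above can be summarised by the following sketch, which places R = angle_range / angle_step prior DRBoxes at every cell of one feature map; the prior width, height and angle step are assumed example values, not parameters fixed by the patent.

```python
import numpy as np

def generate_prior_drboxes(input_size=300, feature_size=8,
                           prior_w=40.0, prior_h=12.0,
                           angle_range=180.0, angle_step=30.0):
    """Priors (cx, cy, w, h, angle) on one feature map of a given size."""
    stride = input_size / feature_size          # e.g. 300 / 8 = 37.5 pixels
    angles = np.arange(0.0, angle_range, angle_step)
    priors = []
    for i in range(feature_size):
        for j in range(feature_size):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for a in angles:                    # R priors per location
                priors.append((cx, cy, prior_w, prior_h, a))
    return np.array(priors)

print(generate_prior_drboxes().shape)  # (8 * 8 * 6, 5) = (384, 5)
```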
D) Position predicting layer (Location Prediction Layer)
The position prediction layer generates position correction information for each prior DRBox to obtain the position information of the predicted DRBox; it is obtained from a 3×3 sliding window on the feature map from which the DRBox is drawn.
E) Confidence prediction layer (Confidence Prediction Layer)
The confidence prediction layer generates confidence information for each prior DRBox, giving the confidence that the predicted DRBox belongs to each class. Because each prior DRBox is bound to a target class in the invention, the confidence information is a 2-dimensional vector representing the probabilities that the DRBox belongs to the target and to the background, respectively, and it is obtained from a 3×3 sliding window on the feature map from which the DRBox is drawn.
F) Loss layer (MultiDRBox Loss Layer)
The loss layer computes the loss function and generates the error back-propagation quantities of the network. Its input comprises four parts: the label part of the data input layer and the outputs of the position prediction layer, the confidence prediction layer and the prior DRBox layer. The loss function of the loss layer is a weighted superposition of two parts, the position loss and the confidence loss, which measure the accuracy of the position prediction and the confidence prediction respectively, and can be written in the following form:
L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))
where L_conf(x, c) is the confidence loss, x is the indicator variable describing the matching between real target boxes and predicted DRBoxes, and c is the confidence prediction vector; L_loc(x, l, g) is the position loss, l is the position prediction vector, i.e. the offset between the predicted DRBox parameters and the prior DRBox parameters, and g is the offset between the real target DRBox parameters and the prior DRBox parameters; α is the position loss weighting factor, and when α = 0 the position loss L_loc(x, l, g) is not used; N is the number of matched DRBoxes. When N = 0, the loss layer outputs 0.
On the basis of the single-stage detection network DRBox, the task is divided into two parts, tower body detection and shadow detection, and the two models are trained separately, so that the detection framework is applicable both to oblique viewing angles (where the structural features of the body are distinct) and to near-nadir viewing angles (where they are not).
The working flow of the tower detection is shown in fig. 5, and specifically comprises the following steps:
Step 5, dividing the transmission tower detection task into two parts, tower body detection and shadow detection, and training a target detection network for each part based on its own multi-angle rotatable candidate box DRBox deep learning detection model. The specific steps comprise:
a) DRBox encoding;
b) DRBox matching: during training, the prior DRBoxes are compared one by one with the real DRBoxes to determine positive and negative samples;
c) positive and negative sample equalization;
d) loss calculation: the difference between the predicted DRBox and the real DRBox, comprising a position loss and a confidence loss;
e) error back-propagation.
For the adopted network model, the network training thus comprises DRBox encoding and decoding, DRBox matching, positive and negative sample equalization, loss calculation and error back-propagation.
a) The DRBox encoding process is as follows:
During forward propagation, obtaining the offset between the i-th prior DRBox and the j-th real DRBox from their position information is the encoding process, with the following notation:
d_i^m denotes the position information of the i-th prior DRBox and g_j^m the position information of the j-th real target DRBox; ĝ_ij^m denotes the offset information of the i-th prior DRBox with respect to the j-th real target DRBox, and ĝ^m denotes the vector formed by the offset information of all real targets, with m ∈ {cx, cy, w, h, a};
d_i^cx, d_i^cy, d_i^w, d_i^h and d_i^a denote the abscissa, ordinate, width, height and angle of the i-th prior DRBox; g_j^cx, g_j^cy, g_j^w, g_j^h and g_j^a denote the abscissa, ordinate, width, height and angle of the j-th real target DRBox; ĝ_ij^cx, ĝ_ij^cy, ĝ_ij^w, ĝ_ij^h and ĝ_ij^a denote the corresponding offsets of the i-th prior DRBox from the j-th real target DRBox.
B) DRBox match
During training, the prior DRBoxes must be matched one by one with the real DRBoxes to determine positive and negative samples. The closeness between two boxes is described by IoU (Intersection over Union), defined as the ratio of the intersection area to the union area of the two boxes. To give the angle a higher weight in the calculation, the ratio is multiplied by the absolute value of the cosine of the difference between the two DRBox angles, and this newly defined metric is denoted RIoU. If two DRBoxes have no intersection, RIoU is 0. Since each prior DRBox in the algorithm carries class information, RIoU is also 0 if the prior DRBox and the real DRBox do not belong to the same class.
The algorithm matches prior DRBoxes with real DRBoxes according to the following strategy:
for each real DRBox, if there exists a prior DRBox with RIoU > 0, the prior DRBox with the largest RIoU is matched to that real DRBox and the indicator variable is set to 1, otherwise it is 0;
for each prior DRBox, if there exists a real DRBox whose RIoU with it is greater than 0.5, the two DRBoxes are matched and the indicator variable is set to 1, otherwise it is 0;
the set of sequence numbers of all prior DRBoxes matched with a real DRBox is denoted Pos;
the prior DRBoxes in the set Pos participate in the loss function calculation as positive samples, and the other prior DRBoxes serve as candidate negative samples;
the indicator variable is x = {x_ij}, x_ij ∈ {1, 0}, where x_ij indicates whether the i-th prior DRBox is matched with the j-th real DRBox, 1 meaning matched and 0 not matched;
the RIoU is the IoU multiplied by the absolute value of the cosine of the angle difference between the two DRBoxes; the IoU is the ratio of the intersection area to the union area of the two boxes, i.e. IoU describes the closeness between two boxes.
The prior DRBoxes matched in this way participate in the loss function calculation as positive samples, while the other prior DRBoxes serve as candidate negative samples.
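The RIoU measure used in this matching step can be computed as sketched below, assuming the shapely library is available for the polygon intersection; any other rotated-rectangle intersection routine would serve equally well.

```python
import numpy as np
from shapely.geometry import Polygon  # assumed available for polygon clipping

def drbox_polygon(box):
    """Corner polygon of a rotated box (cx, cy, w, h, angle in degrees)."""
    cx, cy, w, h, a = box
    rad = np.deg2rad(a)
    rot = np.array([[np.cos(rad), -np.sin(rad)], [np.sin(rad), np.cos(rad)]])
    corners = np.array([[-w, -h], [w, -h], [w, h], [-w, h]]) / 2.0
    return Polygon(corners @ rot.T + np.array([cx, cy]))

def riou(box_a, box_b):
    """RIoU: plain IoU weighted by |cos| of the angle difference."""
    pa, pb = drbox_polygon(box_a), drbox_polygon(box_b)
    inter = pa.intersection(pb).area
    if inter == 0.0:
        return 0.0
    iou = inter / (pa.area + pb.area - inter)
    return iou * abs(np.cos(np.deg2rad(box_a[4] - box_b[4])))

print(riou((50, 50, 40, 12, 0), (52, 50, 40, 12, 20)))
```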
C) Positive and negative sample equalization
Typically, the number of negative samples obtained by the matching strategy is far larger than the number of positive samples, so the negative samples must be reduced to balance the two. During model training we care more about negative samples that are easily confused with positive samples. Detection algorithms based on convolutional networks therefore usually adopt a hard negative mining method to reduce the negative samples: first, the confidence of each negative sample being background is computed by the confidence prediction layer; second, the background confidences of all negative samples are sorted; finally, negative samples are sampled in order from low confidence to high confidence, so that the numbers of positive and negative samples satisfy a given ratio, yielding the set Neg of prior DRBox sequence numbers of the balanced negative samples.
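A minimal sketch of this hard negative mining step follows; the 3:1 negative-to-positive ratio is an assumed value commonly used by single-stage detectors, not a figure stated in the patent.

```python
import numpy as np

def hard_negative_mining(bg_confidence, pos_indices, neg_pos_ratio=3):
    """Select the hardest negatives (lowest background confidence) to form Neg."""
    bg_confidence = np.asarray(bg_confidence, dtype=float)
    candidates = np.setdiff1d(np.arange(len(bg_confidence)), pos_indices)
    # sort candidate negatives from low to high background confidence,
    # i.e. the ones most easily confused with foreground come first
    order = candidates[np.argsort(bg_confidence[candidates])]
    n_keep = min(len(order), neg_pos_ratio * max(len(pos_indices), 1))
    return order[:n_keep]

print(hard_negative_mining([0.9, 0.2, 0.95, 0.4, 0.1], pos_indices=[4]))  # [1 3 0]
```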
D) Loss calculation
The cost loss function comprises two parts, the position loss and the confidence loss; the cost function computes the difference between the predicted DRBox and the real DRBox. Each prior DRBox corresponds to one predicted DRBox, whose parameters are output by the network.
First, the position loss L_loc(x, l, g) is calculated by
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h,a}} x_ij · smooth_L1(l_i^m − ĝ_ij^m)
where smooth_L1 is the smooth L1 norm function, smooth_L1(z) = 0.5·z² for |z| < 1 and |z| − 0.5 otherwise.
Second, the confidence vector ĉ_i of each DRBox is a two-element array representing the probabilities that it belongs to the foreground (1) and the background (0), respectively. The confidence loss L_conf(x, c) is calculated with the Softmax function:
L_conf(x, c) = − Σ_{i∈Pos} x_ij · log(ĉ_i^1) − Σ_{i∈Neg} log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_{p'} exp(c_i^{p'}).
Finally, the cost loss is calculated by the loss function of the loss layer:
L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g))
where N is the number of matched DRBoxes; when N = 0, the loss layer outputs 0. α is the position loss weighting factor, α ∈ [0, 1]; x_ij is the indicator variable for the matching of the i-th prior DRBox with the j-th real DRBox; the output of the confidence prediction layer of the detection network is c_i, the confidence prediction information of the i-th prior DRBox, which after the Softmax gives the two-element array of foreground (1) and background (0) probabilities; the output of the position prediction layer is l_i^m, the encoded position prediction information of the i-th prior DRBox parameters; Pos is the set of sequence numbers of the prior DRBoxes matched with real DRBoxes; GT is the set of sequence numbers of all real DRBoxes; Neg is the set of negative-sample prior DRBox sequence numbers after positive and negative sample equalization; smooth_L1 is the smooth L1 norm function.
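Under the loss form given above, the combined loss can be computed as in the following NumPy sketch; the array shapes and the exact softmax cross-entropy form are assumptions consistent with the surrounding text rather than a verbatim transcription of the patented formula.

```python
import numpy as np

def smooth_l1(z):
    z = np.abs(z)
    return np.where(z < 1.0, 0.5 * z ** 2, z - 0.5)

def multi_drbox_loss(loc_pred, loc_target, conf_logits, pos, neg, alpha=1.0):
    """loc_pred/loc_target: (P, 5) offsets for the matched (positive) priors;
    conf_logits: (M, 2) scores for [background, foreground] of every prior;
    pos/neg: index arrays of positive priors and retained negative priors."""
    n = len(pos)
    if n == 0:
        return 0.0
    l_loc = smooth_l1(loc_pred - loc_target).sum()             # position loss
    exp = np.exp(conf_logits - conf_logits.max(axis=1, keepdims=True))
    prob = exp / exp.sum(axis=1, keepdims=True)                # softmax
    l_conf = -np.log(prob[pos, 1]).sum() - np.log(prob[neg, 0]).sum()
    return (l_conf + alpha * l_loc) / n

loc_p, loc_t = np.zeros((2, 5)), np.full((2, 5), 0.1)
conf = np.array([[0.2, 2.0], [0.1, 1.5], [3.0, 0.0], [2.5, 0.2]])
print(multi_drbox_loss(loc_p, loc_t, conf, pos=[0, 1], neg=[2, 3]))
```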
E) Error counter-transmission
In the detection network, the error from the loss layer is propagated back to the position prediction layer, the confidence prediction layer and every convolution layer, and the parameters of each layer are corrected, i.e. the parameters of the network model are updated using the gradients of the loss function. The prior DRBox layer does not need to accept back-propagated errors. Therefore, only the gradients of the loss function with respect to the outputs of the position prediction layer and the confidence prediction layer need to be computed at the loss layer.
Step 6, the remote sensing image is cut into image tiles of a preset size.
Step 7, for the multiple image tiles, body detection and shadow detection of the transmission tower are performed, respectively, by the target detection networks of the DRBox deep learning detection models for the tower body and the shadow.
After training, the detection network obtains detection results through the following steps:
a) the cropped image tile is input into the trained deep learning detection model, and the target detection network in the model outputs an encoded position prediction vector and a confidence prediction vector;
b) the position information of the predicted DRBoxes is obtained from the encoded position prediction vector and the prior DRBoxes through a decoding process, and the confidence prediction result and the class information of the prior DRBox are associated with each predicted DRBox;
c) the final detection result is obtained by non-maximum suppression (NMS).
The decoding process in step b) is the inverse of the encoding used in training: the position information l_i^m of the i-th predicted DRBox, m ∈ {cx, cy, w, h, a}, is recovered from the encoded position prediction information output by the network for the i-th prior DRBox and the position information d_i^m of that prior DRBox; l_i^cx, l_i^cy, l_i^w, l_i^h and l_i^a denote the abscissa, ordinate, width, height and angle of the i-th predicted DRBox, respectively.
For each class of target to be detected, the NMS first sorts the output results whose foreground confidence exceeds a given value by confidence and takes a given number of output DRBoxes in turn. These DRBoxes are added to the output queue one by one, ensuring each time that the RIoU between a newly output DRBox and any DRBox already output does not exceed a given threshold. The NMS ensures that no target is selected by multiple DRBoxes at the same time and that the DRBox finally output by the algorithm is the predicted DRBox with the highest confidence for that target.
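The NMS step can be sketched as below, reusing the riou() helper from the earlier matching sketch; the score and overlap thresholds are assumed example values rather than the patent's settings.

```python
import numpy as np

def rotated_nms(boxes, scores, riou_threshold=0.3, score_threshold=0.5,
                max_outputs=100):
    """Greedy non-maximum suppression over rotated boxes using RIoU overlap."""
    order = [i for i in np.argsort(scores)[::-1] if scores[i] >= score_threshold]
    keep = []
    for i in order:                       # highest-confidence boxes first
        if len(keep) >= max_outputs:
            break
        # keep box i only if it does not overlap an already selected box too much
        if all(riou(boxes[i], boxes[j]) <= riou_threshold for j in keep):
            keep.append(i)
    return keep

boxes = [(50, 50, 40, 12, 0), (52, 50, 40, 12, 5), (200, 200, 40, 12, 90)]
print(rotated_nms(boxes, np.array([0.9, 0.8, 0.7])))  # e.g. [0, 2]
```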
Step 8, longitude and latitude information is restored from the body detection results and the shadow detection results, and the two are fused onto the original remote sensing image to obtain the transmission tower detection result.
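The restoration of longitude and latitude in step 8 amounts to mapping a detection's pixel position in the original image through the image's georeferencing; the sketch below uses the standard GDAL-style 6-element geotransform convention as an assumed representation of that georeferencing.

```python
def pixel_to_lonlat(col, row, geotransform):
    """Map full-image pixel coordinates to geographic coordinates."""
    x0, px_w, rot_x, y0, rot_y, px_h = geotransform
    lon = x0 + col * px_w + row * rot_x
    lat = y0 + col * rot_y + row * px_h
    return lon, lat

# usage: north-up image whose upper-left corner is at (116.0 E, 45.0 N)
gt = (116.0, 0.5e-5, 0.0, 45.0, 0.0, -0.5e-5)
print(pixel_to_lonlat(1200, 800, gt))  # (116.006, 44.996)
```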
Tests were carried out on the Mengdong Phase I, Phase II and Phase III work areas. The work areas cover six transmission lines under construction, including the line from the Beijing Energy (Jingneng) Wujianfang power plant to the China Resources (Huarun) Wujianfang power plant and a line from a power plant to the Shengli substation. Fig. 6 shows the tower detection results for Mengdong Phases I, II and III.
According to this optical satellite remote sensing image transmission tower detection method, high-resolution remote sensing images are used to extract transmission line tower targets over a large area in complex environments. First, to deal with the fact that, because of changes in illumination and viewing geometry, satellite remote sensing may capture the shape information of the tower body incompletely, a technical scheme is proposed that combines tower body and shadow information for comprehensive detection and type recognition of tower targets, so that the detection framework is applicable both to oblique viewing angles (where the structural features of the body are distinct) and to near-nadir viewing angles (where they are not). Second, within the tower target detection scheme, considering the large aspect ratio of tower targets and the strong influence of target orientation, the multi-angle rotatable candidate box DRBox deep learning detection model is proposed to cope with the viewing-angle differences that may exist in remote sensing acquisitions, effectively adapting to the randomness of transmission tower orientation in remote sensing images. Compared against manual labeling, the transmission tower detection recall reaches 80% and the precision reaches 88.9%, which greatly reduces the manpower required in engineering operations, improves the efficiency of large-area transmission line progress monitoring in complex environments, and provides a decision basis for auxiliary auditing during the project.
While the present invention has been described with reference to specific exemplary embodiments, it will be evident that various modifications and changes may be made to these embodiments without departing from the broader spirit and scope of the inventive subject matter. The specification and drawings are, accordingly, to be regarded in an illustrative rather than a restrictive sense.

Claims (14)

1. A transmission tower detection method based on remote sensing images, characterized by comprising the following steps:
step 1, constructing a transmission tower sample space using remote sensing images as the data source to form a transmission tower detection data set;
step 2, labeling the transmission tower detection data set: the remote sensing images containing towers are annotated so as to improve the adaptability of tower detection to remote sensing data of different resolutions, different bands and different imaging angles;
step 3, splitting the transmission tower detection data set into a body data set and a shadow data set;
step 4, constructing, for the body data set and the shadow data set respectively, a multi-angle rotatable candidate box (DRBox) deep learning detection model and its target detection network; the multi-angle rotatable candidate box DRBox is a rotatable candidate box carrying angle information that can search for the target to be detected at different positions of the input image;
step 5, dividing the transmission tower detection task into two parts, tower body detection and shadow detection, and training a target detection network for each part based on its own multi-angle rotatable candidate box DRBox deep learning detection model;
step 6, cutting the remote sensing image into multiple image tiles of a preset size;
step 7, for the multiple image tiles, performing body detection and shadow detection of the transmission tower, respectively, with the target detection networks of the DRBox deep learning detection models for the tower body and the shadow;
step 8, restoring longitude and latitude information from the body detection results and the shadow detection results, and fusing them onto the original remote sensing image to obtain the final transmission tower detection result.
2. The method of claim 1, wherein the remote sensing image is a high resolution multi-source satellite remote sensing image.
3. The method of claim 1, wherein labeling the remote sensing images containing towers in step 2 specifically comprises: using a bounding-box labeling method to mark the minimum enclosing rectangle of each tower, thereby obtaining the specific position of the tower on the image.
4. The method of claim 1, wherein in step 4 the target detection network comprises a data input layer, convolution layers, a prior DRBox layer, a position prediction layer, a confidence prediction layer and a loss layer;
the data input layer is used for reading in training data, and a target box of the training data comprises the parameters: the length and width of the target, the abscissa and ordinate of the target center point, and the target angle;
the convolution layers use the VGGNet network as the pre-trained model;
the prior DRBox layer is connected after the convolutional network and generates a series of prior DRBoxes;
the position prediction layer is configured to generate position correction information for each prior DRBox, so as to obtain the position information of the predicted DRBox;
the confidence prediction layer is used for generating confidence information for each prior DRBox to obtain the confidence that the predicted DRBox belongs to each class;
the loss layer is used for calculating the loss function and generating the error back-propagation quantities of the network; its input comprises four parts, namely the label part of the data input layer and the outputs of the position prediction layer, the confidence prediction layer and the prior DRBox layer.
5. The method of claim 4, wherein the prior DRBox layer is connected after the convolutional network to generate a series of prior DRBoxes,
wherein the size of the prior DRBox is preset as an input to the algorithm; when the size of the target to be detected is fixed, it is set directly to the target size; when the target size varies within a small range, the prior DRBox size is chosen as the mean of that range; when the targets to be detected differ greatly in size, several groups of DRBoxes with different sizes are used, and DRBoxes of different sizes are drawn from different convolution layers;
for multi-class target detection, the prior DRBoxes are divided into groups according to target class and prior size, and each group of DRBoxes is bound to a selected convolution layer according to a preset strategy;
the positions of the prior DRBoxes are derived from the downsampling relationship between the selected convolution layer feature map and the input image, and each prior DRBox is generated from one position on the feature map; smaller prior DRBoxes are generated by shallower layers, larger prior DRBoxes are generated by deeper layers, and generating prior DRBoxes on different convolution layers gives the network the ability to detect targets of different sizes;
the angle information in the prior DRBox covers the possible target angles: when targets need to be distinguished head from tail, the angle of the prior DRBox ranges from 0 to 360 degrees and is discretized with a preset step L1; when targets do not need head-tail distinction, the angle ranges from 0 to 180 degrees and is discretized with a preset step L2.
6. The method of claim 4, wherein in the confidence prediction layer the confidence information is a 2-dimensional vector representing the probabilities that the DRBox belongs to the target and to the background, respectively.
7. The method of claim 1, wherein step 5, dividing the transmission tower detection task into tower body detection and shadow detection and training a target detection network for each part based on its own multi-angle rotatable candidate box DRBox deep learning detection model, specifically comprises:
step 5-1, DRBox encoding;
step 5-2, DRBox matching: during training, the prior DRBoxes are compared one by one with the real DRBoxes to determine positive and negative samples;
step 5-3, positive and negative sample equalization;
step 5-4, loss calculation: the difference between the predicted DRBox and the real DRBox, comprising a position loss and a confidence loss;
step 5-5, error back-propagation.
8. The method of claim 7, wherein the DRBox encoding of step 5-1 specifically comprises:
during forward propagation, obtaining the offset between the i-th prior DRBox and the j-th real DRBox from their position information is the encoding process, with the following notation:
d_i^m denotes the position information of the i-th prior DRBox, g_j^m denotes the position information of the j-th real target DRBox, and ĝ_ij^m denotes the offset information of the i-th prior DRBox with respect to the j-th real target DRBox, where m ∈ {cx, cy, w, h, a} and cx, cy, w, h, a denote the abscissa, ordinate, width, height and angle of a DRBox, respectively; ĝ^m denotes the vector formed by the offset information of all real targets.
9. The method of claim 8, wherein the DRBox matching of step 5-2 specifically comprises:
step 5-2-1, if two DRBoxes have no intersection, the RIoU is 0; if a prior DRBox and a real DRBox do not belong to the same class, the RIoU is also 0;
step 5-2-2, performing bidirectional matching of the prior DRBoxes and the real DRBoxes, the specific steps comprising:
for each real DRBox, if there exists a prior DRBox with RIoU > 0, the prior DRBox with the largest RIoU is matched to that real DRBox and the indicator variable is set to 1, otherwise it is 0;
for each prior DRBox, if there exists a real DRBox with RIoU > 0.5, the two DRBoxes are matched and the indicator variable is set to 1, otherwise it is 0;
the set of sequence numbers of all prior DRBoxes matched with a real DRBox is denoted Pos;
the prior DRBoxes in the set Pos participate in the loss function calculation as positive samples, and the other prior DRBoxes serve as candidate negative samples;
the indicator variable is x = {x_ij}, x_ij ∈ {1, 0}, where x_ij indicates whether the i-th prior DRBox is matched with the j-th real DRBox, 1 meaning matched and 0 not matched;
the RIoU is the IoU multiplied by the absolute value of the cosine of the angle difference between the two DRBoxes, and the IoU is the ratio of the intersection area to the union area of the two boxes.
10. The method of claim 8, wherein the positive and negative sample equalization of step 5-3 specifically comprises:
step 5-3-1, calculating, through the confidence prediction layer, the confidence of each negative sample being background;
step 5-3-2, sorting the background confidences of all negative samples;
step 5-3-3, sampling negative samples in order from low confidence to high confidence, so that the numbers of positive and negative samples satisfy a given ratio, to obtain the set Neg of prior DRBox sequence numbers of the balanced negative samples.
11. The method of claim 8, wherein the loss calculation of step 5-4 specifically comprises:
calculating the cost loss by the loss function of the loss layer,
L(x, c, l, g) = (1/N) · (L_conf(x, c) + α · L_loc(x, l, g)),
wherein the position loss is calculated by
L_loc(x, l, g) = Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h,a}} x_ij · smooth_L1(l_i^m − ĝ_ij^m),
and the confidence loss is calculated by the Softmax function
L_conf(x, c) = − Σ_{i∈Pos} x_ij · log(ĉ_i^1) − Σ_{i∈Neg} log(ĉ_i^0), with ĉ_i^p = exp(c_i^p) / Σ_{p'} exp(c_i^{p'});
where N is the number of matched DRBoxes, and when N = 0 the loss layer outputs 0; α is the position loss weighting factor, α ∈ [0, 1]; x_ij is the indicator variable for the matching of the i-th prior DRBox with the j-th real DRBox; the output of the confidence prediction layer of the detection network is c_i, the confidence prediction information of the i-th prior DRBox, which after the Softmax gives the two-element array of foreground (1) and background (0) probabilities; the output of the position prediction layer is l_i^m, the encoded position prediction information of the i-th prior DRBox parameters; Pos is the set of sequence numbers of the prior DRBoxes matched with real DRBoxes; GT is the set of sequence numbers of all real DRBoxes; Neg is the set of negative-sample prior DRBox sequence numbers after positive and negative sample equalization; smooth_L1 is the smooth L1 norm function.
12. The method of claim 11, wherein the error back-propagation of step 5-5 specifically comprises:
in the detection network, the error from the loss layer is propagated back to the position prediction layer, the confidence prediction layer and each convolution layer, and the parameters of each layer are corrected, i.e. the parameters of the network model are updated using the gradients of the loss function; specifically, only the gradients of the loss function with respect to the outputs of the position prediction layer and the confidence prediction layer need to be calculated at the loss layer.
13. The method of claim 1, wherein step 7, performing body detection and shadow detection of the transmission tower on the multiple image tiles, respectively, with the target detection networks of the DRBox deep learning detection models for the tower body and the shadow, specifically comprises:
step 7-1, inputting the cropped image tile into the trained deep learning detection model, and outputting an encoded position prediction vector and a confidence prediction vector through the target detection network in the deep learning detection model;
step 7-2, obtaining the position information of the predicted DRBoxes from the encoded position prediction vector and the prior DRBoxes through a decoding process, and associating the confidence prediction result and the class information of the prior DRBox with each predicted DRBox;
step 7-3, obtaining the final detection result through non-maximum suppression (NMS).
14. The method of claim 13, wherein in step 7-2 obtaining the position information of the predicted DRBoxes from the position prediction vector and the prior DRBoxes through a decoding process comprises:
recovering the position information l_i^m of the i-th predicted DRBox, m ∈ {cx, cy, w, h, a}, from the encoded position prediction information of the i-th prior DRBox and the position information d_i^m of the i-th prior DRBox, wherein l_i^cx, l_i^cy, l_i^w, l_i^h and l_i^a denote the abscissa, ordinate, width, height and angle of the i-th predicted DRBox, respectively.
CN202010279995.5A 2020-04-10 2020-04-10 Transmission tower detection method based on remote sensing image Active CN111553204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010279995.5A CN111553204B (en) 2020-04-10 2020-04-10 Transmission tower detection method based on remote sensing image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010279995.5A CN111553204B (en) 2020-04-10 2020-04-10 Transmission tower detection method based on remote sensing image

Publications (2)

Publication Number Publication Date
CN111553204A CN111553204A (en) 2020-08-18
CN111553204B true CN111553204B (en) 2024-05-28

Family

ID=72005678

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010279995.5A Active CN111553204B (en) 2020-04-10 2020-04-10 Transmission tower detection method based on remote sensing image

Country Status (1)

Country Link
CN (1) CN111553204B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112883840B (en) * 2021-02-02 2023-07-07 中国人民公安大学 Power transmission line extraction method based on key point detection
CN113033446B (en) * 2021-04-01 2024-02-02 辽宁工程技术大学 Transmission tower identification and positioning method based on high-resolution remote sensing image
CN113537142A (en) * 2021-08-03 2021-10-22 广东电网有限责任公司 Monitoring method, device and system for construction progress of capital construction project and storage medium
CN113804161A (en) * 2021-08-23 2021-12-17 国网辽宁省电力有限公司大连供电公司 Method for detecting inclination state of transmission tower

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110751077A (en) * 2019-10-15 2020-02-04 武汉大学 Optical remote sensing picture ship detection method based on component matching and distance constraint
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid
CN110910440A (en) * 2019-09-30 2020-03-24 中国电力科学研究院有限公司 Power transmission line length determination method and system based on power image data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109145931B (en) * 2018-09-03 2019-11-05 百度在线网络技术(北京)有限公司 Object detecting method, device and storage medium

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019192397A1 (en) * 2018-04-04 2019-10-10 华中科技大学 End-to-end recognition method for scene text in any shape
CN110910440A (en) * 2019-09-30 2020-03-24 中国电力科学研究院有限公司 Power transmission line length determination method and system based on power image data
CN110751077A (en) * 2019-10-15 2020-02-04 武汉大学 Optical remote sensing picture ship detection method based on component matching and distance constraint
CN110796037A (en) * 2019-10-15 2020-02-14 武汉大学 Satellite-borne optical remote sensing image ship target detection method based on lightweight receptive field pyramid

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A monitoring and early-warning method for transmission towers under rainfall-induced landslide disasters; Chen Qiang; Wang Jian; Xiong Xiaofu; Feng Changyou; Ma Chao; Power System Protection and Control; 2020-02-01 (No. 03); full text *
Ship target detection in high-resolution remote sensing images based on a feature pyramid model; Zhou Hui; Yan Fenglong; Chu Na; Chen Peng; Journal of Dalian Maritime University; 2019-11-15 (No. 04); full text *

Also Published As

Publication number Publication date
CN111553204A (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN111553204B (en) Transmission tower detection method based on remote sensing image
CN109919875B (en) High-time-frequency remote sensing image feature-assisted residential area extraction and classification method
Sohn et al. Automatic powerline scene classification and reconstruction using airborne lidar data
CN112149547B (en) Remote sensing image water body identification method based on image pyramid guidance and pixel pair matching
CN114419825B (en) High-speed rail perimeter intrusion monitoring device and method based on millimeter wave radar and camera
CN111310681B (en) Mangrove forest distribution remote sensing extraction method integrated with geoscience knowledge
Zhang et al. Self-attention guidance and multi-scale feature fusion based uav image object detection
CN113920436A (en) Remote sensing image marine vessel recognition system and method based on improved YOLOv4 algorithm
CN113838064B (en) Cloud removal method based on branch GAN using multi-temporal remote sensing data
CN115641514A (en) Pseudo visible light cloud map generation method for night sea fog monitoring
Zhang et al. Nearshore vessel detection based on Scene-mask R-CNN in remote sensing image
CN109829426A (en) Railway construction temporary building monitoring method and system based on high score remote sensing image
CN115908894A (en) Optical remote sensing image ocean raft type culture area classification method based on panoramic segmentation
Guo et al. Correction of sea surface wind speed based on SAR rainfall grade classification using convolutional neural network
CN115018285A (en) Storm surge and sea wave fine early warning system and early warning method
CN116229287B (en) Remote sensing sub-pixel epidemic wood detection method based on complex woodland environment
CN116503750A (en) Large-range remote sensing image rural block type residential area extraction method and system integrating target detection and visual attention mechanisms
CN116152666A (en) Cross-domain remote sensing image self-adaptive learning method considering ground feature weather heterogeneity
Mandroux et al. Wind turbine detection on sentinel-2 images
Li et al. Recognition algorithm for deep convective clouds based on FY4A
Huang et al. Shadow Information-Based Slender Targets Detection Method in Optical Satellite Images
Liu et al. Intelligent identification of landslides in loess areas based on the improved YOLO algorithm: a case study of loess landslides in Baoji City
Tian et al. Research on Monitoring and Auxiliary Audit Strategy of Transmission Line Construction Progress Based on Satellite Remote Sensing and Deep Learning
Shi et al. LSKF-YOLO: Large selective kernel feature fusion network for power tower detection in high-resolution satellite remote sensing images
Wang et al. Spartina alterniflora classification at patch scale based on feature fusion and deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant