CN112084901B - GCAM-based high-resolution SAR image airport runway area automatic detection method and system


Info

Publication number
CN112084901B
CN112084901B (application CN202010871235.3A)
Authority
CN
China
Prior art keywords
sar image
gcam
convolution
airport
runway area
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010871235.3A
Other languages
Chinese (zh)
Other versions
CN112084901A (en)
Inventor
陈立福
谭思雨
潘舟浩
邢进
李振洪
袁志辉
邢学敏
Current Assignee
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Changsha University of Science and Technology
Priority to CN202010871235.3A
Publication of CN112084901A
Application granted
Publication of CN112084901B
Legal status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/10 Terrestrial scenes
    • G06V 20/13 Satellite images
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S 13/88 Radar or analogous systems specially adapted for specific applications
    • G01S 13/89 Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S 13/90 Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S 13/9021 SAR image post-processing techniques
    • G01S 13/9027 Pattern recognition for feature extraction
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S 13/00 Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S 13/88 Radar or analogous systems specially adapted for specific applications
    • G01S 13/89 Radar or analogous systems specially adapted for specific applications for mapping or imaging
    • G01S 13/90 Radar or analogous systems specially adapted for specific applications for mapping or imaging using synthetic aperture techniques, e.g. synthetic aperture radar [SAR] techniques
    • G01S 13/9094 Theoretical aspects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/25 Fusion techniques
    • G06F 18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods


Abstract

The invention discloses a GCAM-based method and system for automatically detecting airport runway areas in high-resolution SAR images. The method comprises the steps of: downsampling a high-resolution SAR image to generate a medium-resolution image; inputting the medium-resolution image into a geospatial context attention mechanism network (GCAM) to extract the runway area; and performing coordinate mapping on the extracted runway area to obtain the final airport runway area detection result on the high-resolution SAR image. Experiments show that, compared with the DeepLabV3+, RefineNet and MDDA networks, the proposed method achieves higher precision with less time consumption, can fully learn the geospatial information of airports in SAR images, and realizes high-precision, rapid and automatic extraction of airport runway areas from high-resolution SAR images.

Description

GCAM-based high-resolution SAR image airport runway area automatic detection method and system
Technical Field
The invention relates to automatic detection technology for airport runway areas, and in particular to a GCAM-based method and system for automatically detecting airport runway areas in high-resolution SAR images.
Background
Airports are important transportation hubs and military facilities, and detecting airport targets in synthetic aperture radar (Synthetic Aperture Radar, SAR) images has become an important application. SAR offers all-day, all-weather imaging and can penetrate cloud cover, but SAR images are less readable and more complex to interpret than optical images, so most airport detection work has been based on optical remote sensing images. As SAR image resolution increases and data volumes grow, research on extracting airports from SAR images has gradually increased in recent years, and related research continues to progress. Traditional airport extraction methods are time-consuming and labor-intensive; they work well for airport extraction from optical images but poorly from SAR images. Therefore, realizing automatic and rapid extraction of airport runway areas from high-resolution SAR images has profound and urgent practical significance. In addition, using the airport runway area as a mask for aircraft detection can greatly reduce false alarms in aircraft detection and improve aircraft detection precision.
Airport detection is widely applied in terminal-area navigation, accident search and rescue, and aircraft positioning. The runway area is one of the main components of an airport, and most existing airport detection research targets optical remote sensing images. Some prior art performs airport detection with a traditional method that extracts airport edge line segments, but line-segment extraction requires the airport to exhibit obvious linear features, which does not hold for large civil airports with many terminal buildings and weak runway linearity. Other approaches use a sparse reconstruction saliency model (SRS) and a target-aware active contour model (TAACM) to accomplish airport detection, which enhances the extraction of airport details. Still other schemes combine a visual saliency analysis model, a bi-directional complementary saliency analysis module and a saliency-oriented active contour model (SOACM) for airport contour extraction, an approach applicable to most optical remote sensing images. SAR, by contrast, has strong penetration capability, is resistant to interference, and can acquire rich ground-feature information. Some schemes combine the traditional line-segment grouping method with a saliency analysis model to detect airports in small SAR images, but the method is not suitable for airport detection on large SAR images; another scheme proposes a PolSAR airport runway detection algorithm combining optimized polarization characteristics with random forests, but it can only effectively extract the parallel runway features of an airport.
In recent years, deep learning has achieved very good results in semantic segmentation. Semantic segmentation is a deep learning approach that performs pixel-level feature learning to assign different classes within an image. Airport detection requires extracting all airport features, which is consistent with the concept of semantic segmentation, so work combining deep learning with airport detection has begun to appear. For example: some prior art proposes an airport detection method combining the deep learning YOLO model with a saliency analysis model; another combines a deep learning Goole-LF network with a support vector machine (SVM) to detect airports; another combines a deep learning Fast-CNN network with a spatial analysis method to extract airports; and another builds an end-to-end deep transferable convolutional network to detect airports. However, the above methods all apply deep learning to optical remote sensing images, and because airport sample data are scarce, the deep learning models often overfit during training. For high-resolution SAR image airport extraction, one prior art proposes the deep learning network MDDA (Multi-level and densely dual attention) for high-resolution SAR image runway areas, which achieves high-precision airport extraction but requires a large data set and long training time. Therefore, finding a deep learning method that suits small-sample data sets and can efficiently extract airports has practical significance.
Deep learning networks are developing very rapidly, and the DeepLab series has excellent performance in the field of semantic segmentation. DeepLabv1, proposed in 2014, introduced atrous convolution (Atrous Conv) for the first time, mitigating the signal downsampling and spatial invariance problems that traditional CNN algorithms suffer from in pixel-level labeling, and improved the model's ability to capture fine details through a conditional random field (CRF); DeepLabv1 took second place in the PASCAL semantic segmentation challenge. DeepLabv2, proposed in 2016, further introduced the ASPP (Atrous Spatial Pyramid Pooling) module on the basis of DeepLabv1 to capture contextual semantic information at multiple scales, and replaced the VGG-16 backbone with ResNet, alleviating the feature-resolution reduction caused by pooling in traditional CNNs. DeepLabv3, proposed in 2017, improved ASPP on the basis of DeepLabv2, yielding better network performance. In 2018, DeepLabv3+ further improved on DeepLabv3 by introducing an encoder-decoder structure: DeepLabv3 serves as the encoder, a simple and effective decoder block is designed, and depthwise separable convolution (Depthwise separable convolution) is added to the backbone network, so the model effectively reduces computation and parameter counts while maintaining performance.
Therefore, aiming at the problems existing in the airport extraction of the SAR image, how to realize the high-precision, rapid and automatic extraction of the airport runway area of the high-resolution SAR image based on the deep learning has become a key technical problem to be solved urgently.
Disclosure of Invention
The invention aims to solve the technical problems: aiming at the problems in the prior art, the invention provides the automatic detection method and the system for the airport runway area of the high-resolution SAR image based on the GCAM.
In order to solve the technical problems, the invention adopts the following technical scheme:
a GCAM-based high-resolution SAR image airport runway area automatic detection method comprises the following steps:
1) Downsampling the high-resolution SAR image to generate a medium-resolution image;
2) Inputting the medium resolution image into a geospatial context attention mechanism network (GCAM) to extract a runway region;
3) Performing coordinate mapping on the extracted runway area to obtain the final detection result on the high-resolution SAR image.
Optionally, downsampling the high-resolution SAR image in step 1) specifically refers to performing 5× downsampling on the SAR image by a pixel-value extraction method.
Optionally, the geospatial context attention mechanism network GCAM includes an encoding block and a decoding block. The encoding block includes a residual network ResNet, a multi-scale squeeze pyramid MSP, and an edge refinement module EDM. The residual network ResNet extracts features from the input data set to obtain preliminary features; the multi-scale squeeze pyramid MSP obtains global context information from the preliminary features by operating with different pooling and convolution layers at different resolutions; the edge refinement module EDM enhances the network's edge extraction capability on the preliminary features; and the outputs of the MSP and EDM are further fused to obtain multi-level features. The decoding block combines the preliminary features and the multi-level features to perform semantic segmentation of the airport runway area and extract the runway area.
Optionally, the residual network ResNet is an improved residual network obtained, on the basis of ResNet_101, by replacing ordinary two-dimensional convolutions with atrous (hole) convolutions with hole rates of 2, 4, 8 and 16.
Optionally, the multi-scale squeeze pyramid MSP includes a multi-receptive-field parallel pooling working layer and an effective attention module eSE. The multi-receptive-field parallel pooling working layer is constructed in parallel from a 1×1 convolution with hole rate 1, three 3×3 convolutions with hole rates 6, 12 and 18, a global average pooling module GAP, and a strip pooling module SP. For an input two-dimensional feature tensor of size H×W, the strip pooling module SP performs a pooling operation in the horizontal direction using a band-shaped pooling window H×1 and a pooling operation in the vertical direction using a band-shaped pooling window 1×W, averaging the element values within each pooling kernel to obtain the horizontally strip-pooled output and the vertically strip-pooled output respectively; it then expands these two outputs in the left-right and up-down directions using two one-dimensional convolutions, fuses the two expanded feature maps, and finally multiplies the original data by the fused data to obtain an H×W two-dimensional feature tensor output. For an input feature map Xi, the effective attention module eSE first learns a feature F_avg by global average pooling; the feature F_avg is processed by a fully connected layer to obtain a weight matrix W_C; the weight matrix W_C is readjusted by a Sigmoid function to extract the channel attention feature A_eSE; the channel attention feature A_eSE is applied to the input feature map Xi to obtain a refined feature map X_refine; and finally the refined feature map X_refine undergoes feature re-screening to obtain global context information.
Optionally, the edge refinement module EDM includes a global convolution module GCB and an edge refinement unit BR. The GCB strengthens the connection between the feature map and the pixel classification layer and improves the ability to process feature maps of different resolutions to obtain global information; the BR improves the edge extraction capability of the encoding block on the basis of that global information. The global convolution module GCB comprises a k×k large convolution kernel, factorized into two paths, and a feature combination module: one path consists of a k×1, c×c convolution followed by a 1×k, c×c convolution, and the other path consists of a 1×k, c×c convolution followed by a k×1, c×c convolution, where c is the number of channels; the outputs of the two paths are fed together into the feature combination module to obtain a feature Sum_{W×H×C}. The edge refinement unit BR processes the feature Sum_{W×H×C} sequentially with a small convolution kernel, an activation function and another small convolution kernel, then superimposes the result onto the original feature Sum_{W×H×C}, finally obtaining a feature map with refined runway-area edges.
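The parameter saving from factorizing the k×k kernel into two (k×1, 1×k) paths can be checked with simple arithmetic. This is an illustrative sketch only; the kernel size k = 15 and channel count c = 21 below are assumptions for demonstration, not values taken from the patent.

```python
def params_full(k, c):
    """Parameters of a dense k x k convolution mapping c channels to c channels."""
    return k * k * c * c

def params_gcb(k, c):
    """Parameters of the GCB-style factorization: two paths, each a (k x 1)
    convolution followed by a (1 x k) convolution (orders swapped), all c -> c."""
    return 2 * (k * c * c + k * c * c)

# assumed example values: k=15, c=21
full = params_full(15, 21)   # 15*15*21*21 = 99225
gcb = params_gcb(15, 21)     # 4*15*21*21  = 26460
```

The factorized form grows linearly in k instead of quadratically, which is what makes large effective kernels affordable.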
Optionally, the decoding block first applies a 1×1 convolution to reduce the dimension of the encoding block's output features and decodes their edge information with an edge refinement module EDM to obtain a feature map with refined runway-area edges, then performs bilinear 4× upsampling; next, the preliminary features output by the residual network ResNet, after 1×1 convolution dimension reduction, are concatenated with the bilinear 4× upsampling result; a 3×3 convolution is applied to the concatenated features to refine them; and finally a simple bilinear 4× upsampling yields the final segmentation result.
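The decoding flow above (1×1 reduction, 4× bilinear upsampling, concatenation with reduced preliminary features, 3×3 refinement, final 4× upsampling) can be sketched in PyTorch. This is a minimal sketch under assumptions: the channel counts (48 reduced channels, 2 output classes) are illustrative, and the EDM step is stood in for by the 1×1 reduction alone.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DecoderSketch(nn.Module):
    """Minimal sketch of the GCAM-style decoder (channel sizes are assumptions)."""
    def __init__(self, high_ch=256, low_ch=256, mid_ch=48, num_classes=2):
        super().__init__()
        self.reduce_high = nn.Conv2d(high_ch, mid_ch, 1)  # 1x1 reduction of encoder output
        self.reduce_low = nn.Conv2d(low_ch, mid_ch, 1)    # 1x1 reduction of preliminary features
        self.refine = nn.Conv2d(mid_ch * 2, num_classes, 3, padding=1)  # 3x3 refinement

    def forward(self, high, low):
        high = self.reduce_high(high)
        high = F.interpolate(high, scale_factor=4, mode='bilinear', align_corners=False)
        x = torch.cat([high, self.reduce_low(low)], dim=1)  # concatenate the two branches
        x = self.refine(x)
        # final simple bilinear 4x upsampling to the segmentation result
        return F.interpolate(x, scale_factor=4, mode='bilinear', align_corners=False)

dec = DecoderSketch()
out = dec(torch.randn(1, 256, 16, 16), torch.randn(1, 256, 64, 64))
```

With a 16×16 encoder output and 64×64 preliminary features, the two 4× upsamplings yield a 256×256 segmentation map.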
In addition, the invention also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, comprising:
a downsampling program unit for downsampling the high-resolution SAR image to generate a medium-resolution image;
a runway region extraction program unit for inputting the medium resolution image into a geospatial context attention mechanism network GCAM to extract a runway region;
and a coordinate mapping program unit for performing coordinate mapping on the extracted runway area to obtain the final detection result.
In addition, the invention also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory which are connected with each other, the microprocessor is programmed or configured to execute the steps of the GCAM-based high-resolution SAR image airport runway area automatic detection method, or the memory stores a computer program programmed or configured to execute the GCAM-based high-resolution SAR image airport runway area automatic detection method.
Furthermore, the present invention provides a computer-readable storage medium having stored therein a computer program programmed or configured to perform the GCAM-based high resolution SAR image airport runway area automatic detection method.
Compared with the prior art, the invention has the following advantages: the high-resolution SAR image is downsampled to generate a medium-resolution image; the medium-resolution image is input into a geospatial context attention mechanism network (GCAM) to extract the runway area; and the extracted runway area is coordinate-mapped to obtain the final detection result on the high-resolution SAR image. By combining deep learning with airport runway area extraction on SAR images, the geospatial information of SAR image airports can be fully learned, and high-precision, rapid and automatic extraction of airport runway areas from high-resolution SAR images can be realized.
Drawings
Fig. 1 is a schematic diagram of the basic principle of the method according to the embodiment of the invention.
Fig. 2 is a schematic diagram of an improved residual network in an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of the strip pooling module SP according to an embodiment of the present invention.
Fig. 4 is a schematic structural diagram of an effective attention module eSE in an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of the global convolution module GCB and the edge refinement module BR according to an embodiment of the present invention.
Fig. 6 shows the SAR image, label, and optical remote sensing image of a certain airport sample according to an embodiment of the present invention.
Fig. 7 is a runway extraction result for airport I in an embodiment of the invention.
Fig. 8 is a runway extraction result for airport II in an embodiment of the invention.
Fig. 9 is a runway extraction result for airport III in the embodiment of the present invention.
Detailed Description
As shown in fig. 1, the method for automatically detecting the airport runway area of the high-resolution SAR image based on the GCAM of the embodiment comprises the following steps:
1) Downsampling the high-resolution SAR image to generate a medium-resolution image;
2) Inputting the medium resolution image into a geospatial context attention mechanism network (GCAM) to extract a runway region;
3) Performing coordinate mapping on the extracted runway area to obtain the final detection result on the high-resolution SAR image.
In this embodiment, downsampling the high-resolution SAR image in step 1) specifically refers to performing 5× downsampling on the SAR image by a pixel-value extraction method. The downsampling comprises two parts: downsampling the data set sample pictures, and downsampling the three high-resolution test SAR images, yielding medium-resolution SAR images.
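Pixel-value extraction downsampling simply keeps every n-th pixel with no interpolation. A minimal NumPy sketch of the 5× case (the function name is illustrative, not from the patent):

```python
import numpy as np

def downsample_by_extraction(img, factor=5):
    """Pixel-value extraction downsampling: keep every `factor`-th pixel
    in both dimensions (no interpolation), turning a high-resolution
    image into a medium-resolution one."""
    return img[::factor, ::factor]

# a 10x10 test image downsampled 5x becomes 2x2
a = np.arange(100).reshape(10, 10)
d = downsample_by_extraction(a, 5)
```

Unlike averaging or bilinear resampling, extraction preserves original pixel values exactly, which keeps the mapping back to full-resolution coordinates trivial (multiply indices by the factor).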
To rapidly extract the airport runway area of a SAR image, this embodiment proposes a geospatial context attention mechanism network GCAM (Geospatial Contextual Attention Mechanism). As shown in fig. 2, the GCAM includes an encoding block and a decoding block. The encoding block includes a residual network ResNet, a multi-scale squeeze pyramid MSP (Multi-scale Squeeze Pyramid) and an edge refinement module EDM (Edge Detection Module). The residual network ResNet extracts features from the input data set to obtain preliminary features; the multi-scale squeeze pyramid MSP obtains global context information from the preliminary features by operating with different pooling and convolution layers at different resolutions; the edge refinement module EDM enhances the network's edge extraction capability on the preliminary features; and the outputs of the MSP and EDM are further fused to obtain multi-level features. The decoding block combines the preliminary features and the multi-level features to perform semantic segmentation of the airport runway area and extract the runway area.
Firstly, the encoding block performs preliminary feature extraction on the input data set using the residual network ResNet; the multi-scale squeeze pyramid MSP and the edge refinement module EDM then re-extract and fuse the preliminary features: the MSP acquires global context information through different pooling and convolution layer operations at different resolutions, the EDM enhances the network's edge extraction capability, and the multi-level features are further fused. The decoding block adopts edge-refined decoding: one part receives the multi-level high-level features from the encoding block, and the other part receives the preliminary features from the residual network ResNet, thereby realizing semantic segmentation of the airport runway area.
The residual network ResNet is the backbone of the geospatial context attention mechanism network GCAM. With its skip connections and residual optimization, it accelerates training and improves model accuracy, making it well suited for building a semantic segmentation network. To address the tendency of network pooling operations to lose detail features, as shown in fig. 2, the residual network ResNet adopted in this embodiment is an improved residual network obtained, on the basis of ResNet_101, by replacing ordinary two-dimensional convolutions with atrous (hole) convolutions with hole rates of 2, 4, 8 and 16. Atrous convolution alleviates the loss of detail features during pooling without adding extra parameters to the residual network, and the later convolution layers can keep a larger feature map size, which facilitates detection of target pixels and improves the overall performance of the model. With atrous convolution added, for any position j of the picture, a filter ω(k) is applied over the input feature x[j + r·k], and the output y(j) is:

y(j) = Σ_k x[j + r·k] · ω(k)
where the rate r introduces r−1 zero values between sampling points, effectively expanding the receptive field from k×k to k+(k−1)(r−1) without increasing the number of parameters or the amount of computation. The improved structural part of the residual network is shown in fig. 2. The last block of ResNet_101 is copied 4 times and arranged in parallel, but simply running the blocks in parallel does not help the network acquire deep semantic information: the features become concentrated in the small feature maps of the last few layers, and consecutive strided convolutions are unhelpful for semantic segmentation. Therefore, in this embodiment, atrous convolutions with hole rates of 2, 4, 8 and 16 replace the ordinary two-dimensional convolutions, improving the final output stride. Adding atrous convolution changes the resolution of part of the feature maps, so the final output features of ResNet_101 contain not only high-dimensional low-resolution feature maps but also some low-dimensional high-resolution features, realizing full extraction of multi-dimensional features.
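The receptive-field expansion k+(k−1)(r−1) stated above can be verified directly. A small sketch (function name is illustrative) evaluating it for a 3×3 kernel at the four hole rates used in the improved residual network:

```python
def effective_receptive_field(k, r):
    """Receptive field of a k x k convolution with atrous (hole) rate r:
    r introduces r-1 zeros between filter taps, giving k + (k-1)(r-1)."""
    return k + (k - 1) * (r - 1)

# hole rates used in the improved ResNet: 2, 4, 8, 16 (kernel size 3)
fields = [effective_receptive_field(3, r) for r in (2, 4, 8, 16)]
```

At rate 1 the formula reduces to the ordinary kernel size k, and at rate 16 a 3×3 kernel already covers a 33×33 neighborhood with the same 9 parameters.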
Referring to fig. 1, the multi-scale extrusion pyramid MSP includes a multi-receptive field parallel pooling working layer and an effective attention module eSE.
Referring to fig. 1, the multi-receptive-field parallel pooling working layer is constructed in parallel from a 1×1 convolution with hole rate 1, three 3×3 convolutions with hole rates 6, 12 and 18, a global average pooling module GAP, and a strip pooling module SP. For an input two-dimensional feature tensor of size H×W, the strip pooling module SP performs pooling in the horizontal direction using a band-shaped pooling window H×1 and pooling in the vertical direction using a band-shaped pooling window 1×W, averaging the element values within each pooling kernel to obtain the horizontally strip-pooled output and the vertically strip-pooled output; it then expands the two outputs in the left-right and up-down directions using two one-dimensional convolutions, after which the two expanded feature maps have the same size; it fuses the two expanded feature maps; and finally it multiplies the original data by the Sigmoid-processed data to obtain an H×W two-dimensional feature tensor. In this embodiment, the feature map processed by the improved residual network contains 256 channels with rich semantic information and is input to the multi-receptive-field parallel pooling working layer.
The four atrous convolutions with different hole rates effectively capture multi-scale information from different receptive fields; the added global average pooling downsamples the features to prevent network overfitting; strip pooling captures local information of the features; together, the multi-receptive-field parallel pooling working layer realizes multi-scale feature fusion.
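The parallel layer described above can be sketched in PyTorch. This is a minimal sketch under assumptions: branch channel counts are illustrative, and the strip pooling branch is omitted here for brevity, leaving the 1×1 convolution, the three atrous 3×3 convolutions (rates 6, 12, 18) and the GAP branch.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ParallelPoolingLayer(nn.Module):
    """Sketch of the multi-receptive-field parallel pooling working layer
    (SP branch omitted; channel counts are assumptions)."""
    def __init__(self, in_ch=256, branch_ch=64):
        super().__init__()
        self.conv1x1 = nn.Conv2d(in_ch, branch_ch, 1)
        # three 3x3 atrous convolutions; padding=rate keeps the spatial size
        self.atrous = nn.ModuleList([
            nn.Conv2d(in_ch, branch_ch, 3, padding=r, dilation=r)
            for r in (6, 12, 18)])
        self.gap = nn.AdaptiveAvgPool2d(1)          # global average pooling branch
        self.gap_conv = nn.Conv2d(in_ch, branch_ch, 1)
        self.fuse = nn.Conv2d(branch_ch * 5, in_ch, 1)  # fuse all five branches

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [self.conv1x1(x)] + [conv(x) for conv in self.atrous]
        g = self.gap_conv(self.gap(x))
        g = F.interpolate(g, size=(h, w), mode='bilinear', align_corners=False)
        feats.append(g)
        return self.fuse(torch.cat(feats, dim=1))

layer = ParallelPoolingLayer()
y = layer(torch.randn(1, 256, 32, 32))
```

Because each atrous branch uses padding equal to its rate, all five branches keep the input's spatial size and can be concatenated channel-wise.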
The strip pooling module SP (Strip Pooling) overcomes the tendency of general pooling to produce false alarms. As shown in fig. 3, for an input two-dimensional feature tensor x ∈ R^(H×W), the strip pooling module SP performs pooling in the horizontal and vertical directions using band-shaped pooling windows of sizes H×1 and 1×W respectively, averaging the element values within each pooling kernel and taking the average as the pooled output value. The horizontally strip-pooled output y^h ∈ R^H is:

y^h_i = (1/W) · Σ_{0≤j<W} x_{i,j}

where y^h_i is the horizontal strip-pooled output at row i, and x_{i,j} ranges over all matrix elements within the pooling kernel.
The vertically strip-pooled output y^v ∈ R^W is:

y^v_j = (1/H) · Σ_{0≤i<H} x_{i,j}

where y^v_j is the vertical strip-pooled output at column j, and x_{i,j} ranges over all matrix elements within the pooling kernel.
After the H×1 and 1×W pooling, the outputs are expanded in the left-right and up-down directions using two one-dimensional convolutions. The two expanded feature maps have the same size and are then fused; finally the original data is multiplied by the Sigmoid-processed data and the result is output. With striped pooling layers in the horizontal and vertical directions, discretely distributed pixel regions and band-like pixel regions can readily establish dependencies on each other. Because the pooling kernel is long and narrow along one dimension and short along the other, it also captures local information of the features easily. These properties make strip pooling superior to average pooling with square kernels.
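The strip-pooling gate described above can be sketched in a few lines of numpy. This is a minimal single-channel sketch under simplifying assumptions: the two one-dimensional expansion convolutions are replaced by plain broadcasting (identity expansion), and function names are our own, not the patent's.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def strip_pool(x):
    """Minimal strip-pooling gate on a single H x W feature map.

    The 1-D expansion convolutions of the SP module are replaced here by
    identity expansion via broadcasting, to keep the sketch dependency-free.
    """
    y_h = x.mean(axis=1, keepdims=True)   # H x 1: horizontal strip pooling
    y_v = x.mean(axis=0, keepdims=True)   # 1 x W: vertical strip pooling
    fused = y_h + y_v                      # broadcast both to H x W and fuse
    return x * sigmoid(fused)              # gate the original features

x = np.arange(12, dtype=float).reshape(3, 4)
out = strip_pool(x)
assert out.shape == (3, 4)                 # output keeps the H x W size
```

Each output pixel is thus modulated by the average of its entire row and its entire column, which is how band-like regions such as runways gain long-range support.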
The effective attention module eSE (effective Squeeze-and-Excitation Module) re-screens the features using channel information after receiving the multi-scale features. Referring to fig. 4, for an input feature map X_i, the module first learns a feature F_avg by global average pooling; F_avg is processed by a fully connected (FC) layer to obtain a weight matrix W_C; the weight matrix W_C is readjusted by a Sigmoid function to extract the channel attention feature A_eSE; and A_eSE is multiplied with the input feature map X_i to obtain the refined feature map X_refine. In this way each input X_i is assigned weights pixel by pixel, realizing feature re-screening. The fully connected layer (FC) and the Sigmoid function readjust the input feature map so as to extract useful channel information.
When the input feature map is X_i ∈ R^(C×W×H), the effective channel attention map A_eSE(X_i) ∈ R^(C×1×1) is computed as:

A_eSE(X_i) = σ(W_C(F_gap(X_i)))

where A_eSE(X_i) is the channel attention feature extracted for the input feature map X_i, σ is the Sigmoid function, W_C is the weight matrix, and F_gap(X_i) is the feature F_avg obtained by global average pooling of the input feature map X_i:

F_gap(X_i) = (1/(W×H)) · Σ_{i=1..W} Σ_{j=1..H} x_{i,j}

where x_{i,j} ranges over all elements in the matrix of the feature map X_i.
Applying the channel attention feature A_eSE to the input feature map X_i yields the refined feature map X_refine:

X_refine = A_eSE(X_i) ⊗ X_i

where ⊗ denotes element-wise (channel-wise broadcast) multiplication. The input feature map X_i is the multi-scale feature map output by the multi-scale extrusion pyramid MSP. Applying A_eSE(X_i) as channel attention to the multi-scale feature map makes the multi-scale features more informative. Finally, the output feature map is multiplied element by element into the refined feature map X_refine to perform feature re-screening.
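The eSE computation above (global average pool, FC, Sigmoid, channel-wise multiply) can be sketched in numpy. This is an illustrative sketch, not the authors' implementation; the FC weight matrix here is a hypothetical identity matrix just to show shapes and data flow.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ese(x, w_c):
    """effective Squeeze-and-Excitation on a C x H x W feature map.

    w_c is the C x C fully connected weight matrix (hypothetical values here).
    """
    f_gap = x.mean(axis=(1, 2))          # C: global average pooling F_gap
    a = sigmoid(w_c @ f_gap)             # C: channel attention A_eSE
    return x * a[:, None, None]          # channel-wise broadcast multiply

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))       # toy 4-channel feature map
w_c = np.eye(4)                          # identity FC weights for illustration
x_refine = ese(x, w_c)
assert x_refine.shape == x.shape
```

Because the Sigmoid output lies in (0, 1), each channel of X_refine is a damped copy of the input channel, weighted by how informative its global average is.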
Referring to fig. 1, the multi-scale extrusion pyramid MSP and the edge refinement module EDM work in parallel and simultaneously receive the output feature map from the improved residual network. As shown in fig. 1, the edge refinement module EDM includes a global convolution module GCB (Global Convolutional Block), used to strengthen the close connection between the feature maps and the pixel classification layer and the ability to process feature maps of different resolutions so as to obtain global information, and a boundary refinement module BR (Boundary Refinement), used to improve the edge extraction ability of the encoding block based on the global information. The edge refinement module EDM effectively addresses pixel classification and localization in semantic segmentation: the global convolution module GCB enlarges the convolution kernel to the spatial size of the feature map, keeping the feature map closely connected to the pixel classification layer, enhancing the ability to process different features, and obtaining global information; the boundary refinement module BR is then introduced to further enhance the network's edge extraction ability.
As shown in fig. 5, the global convolution module GCB in this embodiment includes a k×k large convolution kernel and a feature combination module. The k×k large convolution kernel comprises two paths: one consists of a k×1×c×c convolution followed by a 1×k×c×c convolution, and the other of a 1×k×c×c convolution followed by a k×1×c×c convolution, where c is the number of channels. The outputs of the two paths are fed together into the feature combination module to obtain the feature Sum of size W×H×C. The boundary refinement module BR processes the feature Sum sequentially with a small convolution kernel, an activation function and another small convolution kernel, then superimposes the result on the original feature Sum, finally obtaining the feature map with refined runway-area edges.
Referring to fig. 5, the global convolution module GCB adopts a fully convolutional construction and makes full use of the multi-channel information of the features. For pixel classification, the GCB adopts a large convolution kernel, so that the semantic information corresponding to each pixel is not changed by image transformations (translation, flipping, etc.) and the pixel-to-pixel connection is tighter. For pixel localization, the GCB uses full convolution and applies the matrix decomposition principle, replacing the large k×k kernel with 1×k followed by k×1 and k×1 followed by 1×k convolutions, which reduces the number of parameters and the amount of computation while still letting each pixel be matched to its correct class, realizing accurate pixel segmentation. Because the GCB has no BN layer (Batch Normalization) and no activation function, a boundary refinement module BR with small convolution kernels is introduced to prevent misclassification of object boundary pixels and to achieve both classification accuracy and localization accuracy.
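The parameter saving from the matrix-decomposition trick is easy to quantify: a dense k×k convolution over c input and c output channels needs k²·c² weights, while one GCB path (k×1 followed by 1×k) needs only 2k·c². A small sketch, with k = 15 and c = 256 chosen as illustrative values (the patent does not state the k it uses):

```python
def params_full(k, c):
    """Weights of a dense k x k convolution with c input and c output channels."""
    return k * k * c * c

def params_factorized(k, c):
    """Weights of one GCB path: a k x 1 followed by a 1 x k convolution."""
    return 2 * k * c * c

k, c = 15, 256  # illustrative values only
print(params_full(k, c), params_factorized(k, c))
# The factorized path needs 2k instead of k^2 weights per channel pair,
# a 7.5x reduction at k = 15.
```

This is why the GCB can afford a kernel as large as the feature map's spatial extent without an explosion in parameters.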
As shown in fig. 1, the decoding block takes the output features of the encoding block, reduces their dimension with a 1×1 convolution, decodes the edge information with the edge refinement module EDM to obtain the feature map with refined runway-area edges, and performs bilinear 4× upsampling. The preliminary features output by the residual network ResNet, after their own 1×1 convolution dimension reduction, are then concatenated with the upsampled result; a 3×3 convolution refines the combined features, and a final simple bilinear 4× upsampling yields the final segmentation result. The input of the decoding block thus consists of two parts: the output features of the encoding block and the preliminary features output by the residual network ResNet. Reducing the channel count of the encoding-block output with a 1×1 convolution while the EDM decodes the edge information ensures the edge information is fully decoded while the number of feature channels drops; and since the corresponding features from the backbone network at the same spatial resolution are low-level features that typically contain a large number of channels, a 1×1 convolution is likewise applied to them to reduce unnecessary channel computation in the network.
In this embodiment, step 3) performs coordinate mapping on the extracted runway area to obtain the final detection result. The coordinate mapping is the same as in existing methods, so the details are not repeated here. After the geospatial context attention mechanism network GCAM segments the airport runway area of the medium-resolution SAR image, the result map is processed with the coordinate mapping method to obtain the result map on the high-resolution SAR original image. Finally, the result image and the original image are visualized together, realizing runway area extraction from the high-resolution SAR image.
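Since the downsampling in step 1) is a plain pixel-extraction decimation by a factor of 5, the coordinate mapping back to the original image reduces to a scale multiplication. A minimal sketch under that assumption (the helper name is hypothetical, not from the patent):

```python
def map_to_original(row, col, factor=5):
    """Map a pixel coordinate detected on the downsampled medium-resolution
    image back to the high-resolution original, assuming pixel-extraction
    downsampling by the given factor (5x in this embodiment)."""
    return row * factor, col * factor

# A corner of the 2400 x 3000 downsampled airport I maps back to the
# 12000 x 15000 original:
print(map_to_original(2400, 3000))  # (12000, 15000)
```

The same scaling applied to every pixel of the segmentation mask produces the result map on the high-resolution original.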
The GCAM-based automatic detection method for airport runway areas in high-resolution SAR images of this embodiment was experimentally verified. The experimental environment is as follows: CPU, Intel Xeon Gold 5120; GPU (single), NVIDIA RTX 2080Ti. The data set consists of SAR images from the Gaofen-3 (GF-3) system. First, 5× downsampling was applied to 10 airport sample images using the pixel extraction method; pixel labeling was then performed with LaberImage software, dividing pixels into runway area and background. The 10 downsampled medium-resolution SAR images were arbitrarily cropped into images no smaller than 480×480 to form a small-sample data set of 466 images. The ratio of training set to validation set is 4:1. As shown in fig. 6, sub-images (a)-(c) are respectively the SAR image, the label and the optical remote sensing image of one airport sample; the connected region marked a is the runway area, which includes aircraft runways, taxiways, aprons and aircraft; the remaining connected regions are background.
The parameters were set as follows: during network training the learning rate is set to 0.00001 and the weight decay coefficient to 0.995. The batch size of input pictures is 1; network training iterates for 100 epochs, with a checkpoint saved every 5 epochs. During training, the input pictures are randomly cropped with a window of size 480×480.
In this embodiment, PA (pixel accuracy) and IOU (intersection over union) are adopted as metrics for verifying runway extraction accuracy. PA is the proportion of correctly labeled pixels among all pixels; IOU is the ratio of the intersection to the union of the segmentation result and the label; MPA (mean pixel accuracy) averages, over classes, the proportion of correctly classified pixels; MIOU (mean intersection over union) is the average of the IOU over all classes. They are defined as follows:
Assuming a total of k+1 classes (including one background class), let P_ij be the number of pixels belonging to class i but predicted as class j (false positives), P_ji the number of pixels belonging to class j but predicted as class i (false negatives), and P_ii the number of correctly predicted pixels of class i. Then:

PA = Σ_i P_ii / Σ_i Σ_j P_ij

MPA = (1/(k+1)) · Σ_i ( P_ii / Σ_j P_ij )

MIOU = (1/(k+1)) · Σ_i ( P_ii / (Σ_j P_ij + Σ_j P_ji - P_ii) )
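The metric definitions above can be computed directly from a confusion matrix. A small numpy sketch (not the authors' evaluation code) with a toy 2-class runway/background matrix:

```python
import numpy as np

def metrics(p):
    """PA, MPA and MIOU from a (k+1) x (k+1) confusion matrix p,
    where p[i, j] counts pixels of true class i predicted as class j."""
    diag = np.diag(p).astype(float)
    pa = diag.sum() / p.sum()                                   # pixel accuracy
    mpa = np.mean(diag / p.sum(axis=1))                         # mean per-class accuracy
    miou = np.mean(diag / (p.sum(axis=1) + p.sum(axis=0) - diag))  # mean IOU
    return pa, mpa, miou

# Toy confusion matrix: rows = true class, cols = predicted class
# (class 0 = runway area, class 1 = background; counts are invented).
p = np.array([[90, 10],
              [ 5, 95]])
pa, mpa, miou = metrics(p)
print(pa, mpa, miou)
```

With this toy matrix PA and MPA are both 0.925, while MIOU is lower (about 0.86) because the union term in its denominator also penalizes false positives, which is exactly why the text treats a PA/IOU gap as a false-alarm symptom.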
To verify the efficiency of this embodiment's method for extracting airport runway areas from SAR images, three groups of comparison experiments were performed against DeepLabV3+, RefineNet and MDDA. Three experimental airports were used: airport I of size 12000×15000, airport II of size 9600×9600 and airport III of size 15000×17500; none of them appears in the data set. MDDA is a previously proposed deep learning network suited to extracting airport runway areas from SAR images, while DeepLabV3+ and RefineNet are mainstream semantic segmentation networks. The data set used for the experiments is the manually annotated 466-image small-sample data set described above. Since the network outputs a downsampled medium-resolution image, the downsampled airports I, II and III are 2400×3000, 2000×2000 and 3000×3500 respectively. Finally, coordinate mapping is applied to the result map to directly obtain the result map before downsampling. Network training time, picture test time, and runway area extraction accuracy before and after sampling are analyzed.
Figs. 7 to 9 show the runway area extraction results at airports I, II and III respectively, where (a) is the high-resolution SAR original image, (b) the medium-resolution SAR image after 5× downsampling, and (c) the class label of the runway area, with red the runway area and black the non-runway background; (d) is the extraction result of RefineNet on the medium-resolution SAR image, (e) that of MDDA, (f) that of DeepLabV3+, and (g) that of this embodiment's method (GCAM); (h) is a fusion of the RefineNet result (d) with the medium-resolution SAR map (b), (i) a fusion of the MDDA result (e) with (b), (j) a fusion of the DeepLabV3+ result (f) with (b), and (k) a fusion of the GCAM result (g) with (b); (l)-(o) are fusions of results (d)-(g), after coordinate mapping, with the high-resolution original (a). The area numbered 1 is the runway area; boxes numbered 2 mark false detections, i.e. background falsely detected as runway area; boxes numbered 3 mark missed detections, i.e. parts of the runway area not detected.
1. Airport I test results and analysis.
As shown in sub-graph (a) of fig. 7, airport I mainly consists of a large, long runway area and aprons, with many aircraft and obvious aircraft target bright spots inside the airport; the background area contains a dense residential area and complicated traffic lines.
We tested the medium-resolution SAR image of airport I, of size 2400×3000. As shown in sub-graphs (d)-(g) of fig. 7, the extraction result of this embodiment's method is closest to the label; MDDA does not extract the runway-area edges completely; DeepLabV3+ misses a small part of the runway area; RefineNet's extraction is the worst. In the fusion views (h)-(k) we mark the main missed-detection boxes: this embodiment's method has no large missed area, MDDA has 2 main missed areas, DeepLabV3+ has 4 more obvious missed areas, and RefineNet has the most boxes, with more edge misses; the missed areas all lie along the edge zone of the runway area. Comparing the result of this embodiment's method with the DeepLabV3+ result, it can be seen that the addition of the edge refinement module EDM in this method strengthens the network's learning of edge features.
2. Airport II experimental results and analysis.
Airport II is simpler in character than airport I. Its runway area mainly comprises long straight runways; apart from small building groups near the airport's edge region there are no large residential areas, but there are more water areas around the runway area. Under synthetic aperture radar, water areas image as deep black, much like runways, which interferes with the network's ability to distinguish the features.
We tested the medium-resolution SAR image of airport II, of size 2000×2000. The runway area extraction for airport II is shown in fig. 8. Comparing sub-graphs (d), (e), (f) and (g) with (c) in fig. 8, the extraction result of this embodiment's method is free of false alarms and its extraction effect is the best. From sub-graphs (h)-(k) in fig. 8, this embodiment's method has only a small missed-detection box; MDDA has 1 false-detection box and 4 more obvious missed-detection boxes; DeepLabV3+ has more false alarms and the most missed-detection boxes; RefineNet has several missed-detection regions along the left edge region of airport II. Their extraction effects leave room for improvement. The edge extraction capability and false-alarm removal capability of this embodiment's method are the best, which also demonstrates the advantage of the multi-scale extrusion pyramid MSP in this method.
3. Airport III experimental results and analysis.
The runway area structure and surrounding ground objects of airport III are the most complex, with more runways, taxiways, rest stations and aprons. Airport III is a civil airport; its runway area consists mostly of short runways, without large, long straight runways. The surrounding ground objects show the most gray and bright spots in their SAR signatures, contrasting clearly with the airport runway area, which reduces the probability of network misjudgment. However, the edge features of airport III are complex and contain the most edge information, requiring the network to have strong global semantic information learning capability and to decode the edge information effectively.
We tested the medium-resolution SAR image of airport III, of size 3000×3500. Comparing sub-graphs (d)-(l) in fig. 9, the extraction effect of this embodiment's method is again the best, with only a few small missed areas; MDDA shows two obvious false alarms; DeepLabV3+ has a large number of misses, indicating weak learning of edge information; RefineNet has a large number of false alarms and the worst extraction effect. This also demonstrates the effectiveness of this embodiment's edge decoding.
To show more intuitively the efficiency of this embodiment's method for airport runway area extraction, Table 1 gives the extraction accuracy of the medium-resolution SAR images of the three airports under the different algorithms. The average extraction accuracy of this embodiment's method over the three airport runway areas reaches 0.9823 and the average IOU reaches 0.9665, both higher than MDDA, DeepLabV3+ and RefineNet. According to Table 1, for this embodiment's method the difference between the PA and IOU values of the same airport runway area is small, indicating that the runway area is extracted almost completely and without false alarms. DeepLabV3+ tends to produce false alarms, so a certain gap exists between its PA and IOU for the same airport runway area, since false alarms lower the runway-area IOU. Although MDDA's overall extraction effect is good, it has defects in learning details from the small-sample data set. RefineNet's PA and IOU values are both the lowest.
Table 1: and (5) analyzing the extraction accuracy of different networks.
Table 2 gives the training time of the different algorithms on the small-sample data set and the test times for the medium-resolution SAR images of the three airports. In terms of training time on the small-sample data set, our network trains in about 2 hours; MDDA's training time is the longest, at nearly 8 hours, its small-sample training being clearly less effective than its large-sample training; the training times of DeepLabV3+ and RefineNet are almost the same as that of this embodiment's method, but their accuracy falls far short. In terms of test time on the medium-resolution SAR images of the three airports, the smaller the picture, the shorter the test time: the average test time of this embodiment's method is only 16.95 s, that of RefineNet is 16.69 s, that of DeepLabV3+ is 15.89 s, and that of MDDA is approximately 2.5 times this embodiment's method. The addition of MSP and EDM brings a certain number of parameters to the network, which is why the training and test times of this embodiment's method are slightly longer than those of DeepLabV3+; the shorter the network training time and picture test time, the higher the efficiency in actual engineering. Overall, this embodiment's method achieves high-accuracy, fast extraction when processing small-sample SAR image data sets, and is highly efficient.
Table 2: data set training time for different networks and test time for medium resolution airport images.
Therefore, this embodiment's method realizes fast, automatic extraction of airport runway areas from high-resolution SAR images. The network design is lightweight, greatly shortening network-layer iteration time and reducing network training time and picture test time. MSP enables the network to learn global features and encode effective features at multiple scales and in all directions; the parallel operation of EDM and MSP strengthens the learning of contextual semantic information; and EDM allows edge information to be completely decoded and extracted. Meanwhile, the network is well suited to training on small-sample data sets: no public large SAR airport data set for semantic segmentation currently exists, so only manual labeling is possible, and the small-sample approach saves labor time and cost. Overall, in terms of extraction accuracy, data set training time and picture test time, the network outperforms the mainstream algorithm DeepLabV3+, and GCAM outperforms the previously proposed algorithm MDDA, realizing efficient automation.
In summary, to realize fast automatic airport extraction from high-resolution SAR images, this embodiment provides a GCAM-based automatic detection method for airport runway areas in high-resolution SAR images. The downsampling process lets a single training sample contain more scene information, which is beneficial for building a small-sample data set; MSP adds stripe pooling working together with four parallel convolutions to learn features at multiple scales, and the eSE module screens for useful features; EDM helps the network learn edge semantic information, and coordinate mapping yields the extraction result on the original high-resolution SAR image. In the tests on three airport runway areas, our network performed best compared with DeepLabV3+, RefineNet and MDDA, with MPA up to 0.98 and MIOU up to 0.96. In addition, the network's data set training time is only 2.25 h, and the average picture test time is only 16.94 s. Judged from the extraction results, GCAM has no false alarms and few misses, and can efficiently extract airport runway areas. Moreover, GCAM can improve detection efficiency in actual engineering: after the airport runway area is extracted, the search range of subsequent aircraft extraction can be narrowed, saving time.
In addition, the embodiment also provides a high-resolution SAR image airport runway area automatic detection system based on GCAM, which comprises:
a downsampling program unit for downsampling the high-resolution SAR image to generate a medium-resolution image;
a runway region extraction program unit for inputting the medium resolution image into a geospatial context attention mechanism network GCAM to extract a runway region;
and the coordinate mapping program unit is used for carrying out coordinate mapping on the extracted runway area to obtain a detection result of the final high-resolution SAR image.
In addition, the embodiment also provides a GCAM-based high-resolution SAR image airport runway area automatic detection system, which comprises a computer device, wherein the computer device comprises a microprocessor and a memory, which are connected with each other, the microprocessor is programmed or configured to execute the steps of the GCAM-based high-resolution SAR image airport runway area automatic detection method, or the memory is stored with a computer program programmed or configured to execute the GCAM-based high-resolution SAR image airport runway area automatic detection method.
Furthermore, the present embodiment also provides a computer-readable storage medium having stored therein a computer program programmed or configured to perform the foregoing GCAM-based high resolution SAR image airport runway area automatic detection method.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-readable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein. The present application is directed to methods, apparatus (systems), and computer program products in accordance with embodiments of the present application, and to apparatus for performing functions specified in a flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks. These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples, and all technical solutions belonging to the concept of the present invention belong to the protection scope of the present invention. It should be noted that modifications and adaptations to the present invention may occur to one skilled in the art without departing from the principles of the present invention and are intended to be within the scope of the present invention.

Claims (8)

1. The automatic detection method for the airport runway area of the high-resolution SAR image based on the GCAM is characterized by comprising the following steps of:
1) Downsampling the high-resolution SAR image to generate a medium-resolution image;
2) Inputting the medium resolution image into a geospatial context attention mechanism network (GCAM) to extract a runway region;
3) Performing coordinate mapping on the extracted runway area to obtain a final detection result of the high-resolution SAR image;
the geographic space context attention mechanism network GCAM comprises a coding block and a decoding block, the coding block comprises a residual network ResNet, a multi-scale extrusion pyramid MSP and an edge refinement module EDM, the residual network ResNet is used for carrying out feature extraction on an input data set to obtain primary features, the multi-scale extrusion pyramid MSP is used for obtaining global context information aiming at the primary features by operating different pooling convolution layers from different resolutions, the edge refinement module EDM is used for enhancing network edge extraction capacity aiming at the primary features, and the outputs of the multi-scale extrusion pyramid MSP and the edge refinement module EDM are further fused to obtain multi-level features; the decoding block is used for combining the primary characteristics and the multi-level characteristics to perform semantic segmentation of the runway area of the airport so as to extract the runway area;
The residual network ResNet is an improved residual network obtained by replacing common two-dimensional convolution with hole convolutions with hole rates of 2, 4, 8 and 16 on the basis of the residual network ResNet_101.
2. The automatic airport runway area detection method of high-resolution SAR image based on GCAM as set forth in claim 1, wherein the downsampling of the high-resolution SAR image in step 1) specifically means that the SAR image is downsampled 5 times by adopting a pixel value extraction method.
3. The automatic detection method for airport runway area of GCAM-based high-resolution SAR image according to claim 1, wherein said multi-scale extrusion pyramid MSP comprises a multi-receptive-field parallel pooling working layer and an effective attention module eSE; the multi-receptive-field parallel pooling working layer is formed by building in parallel a 1×1 convolution with a void ratio of 1, three 3×3 convolutions with void ratios of 6, 12 and 18 respectively, a global average pooling module GAP and a stripe pooling module SP; the stripe pooling module SP, for an input two-dimensional feature tensor of size H×W, performs pooling in the horizontal direction using a band-shaped pooling window H×1 and in the vertical direction using a band-shaped pooling window 1×W, averages the element values in each pooling kernel to obtain a horizontal striped pooled output and a vertical striped pooled output, then expands the horizontal and vertical striped pooled outputs in the left-right and up-down directions respectively using two one-dimensional convolutions, the two expanded feature maps having the same size, fuses the two expanded feature maps, and finally multiplies the original data by the Sigmoid-processed data to obtain an H×W two-dimensional feature tensor output; the effective attention module eSE, for an input feature map X_i, first learns a feature F_avg by global average pooling, the feature F_avg is processed by a fully connected layer to obtain a weight matrix W_C, the weight matrix W_C is readjusted by a Sigmoid function to extract the channel attention feature A_eSE, then the channel attention feature A_eSE is applied to the input feature map X_i to obtain a refined feature map X_refine, and finally feature rescreening is performed on the refined feature map X_refine to obtain global context information.
4. The automatic detection method of airport runway area of GCAM-based high-resolution SAR image according to claim 1, wherein said edge refinement module EDM comprises a global convolution module GCB, used to strengthen the close connection between the feature map and the pixel classification layer and the ability to process feature maps of different resolutions to obtain global information, and a boundary refinement module BR, used to improve the edge extraction ability of the encoding block based on the global information; the global convolution module GCB comprises a k×k large convolution kernel and a feature combination module, the k×k large convolution kernel comprising two paths, one composed of a k×1×c×c convolution followed by a 1×k×c×c convolution and the other of a 1×k×c×c convolution followed by a k×1×c×c convolution, where c is the number of channels; the output results of the two paths are input together into the feature combination module to obtain the feature Sum of size W×H×C; the boundary refinement module BR processes the feature Sum sequentially with a small convolution kernel, an activation function and a small convolution kernel, then superimposes the processing result on the original feature Sum, finally obtaining the feature map with refined runway-area edges.
5. The GCAM-based high-resolution SAR image airport runway area automatic detection method according to claim 1, wherein the decoding block first reduces the dimension of the feature map with refined runway area edges, obtained by the edge refinement module EDM, through a 1×1 convolution and performs 4× bilinear upsampling on it; the preliminary features output by the residual network ResNet are then, after 1×1 convolution dimension reduction, concatenated with the result of the 4× bilinear upsampling; a 3×3 convolution is applied to refine the features, and finally a further simple 4× bilinear upsampling is carried out to obtain the final segmentation result.
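The decoding block of claim 5 only closes the resolution gap if the two 4× upsamplings bridge the encoder output and the input. A shape trace, under the assumption of a 1/16-resolution encoder output and 1/4-resolution ResNet preliminary features (assumptions consistent with the two 4× steps, not stated explicitly in the claim), illustrates this:

```python
def decoder_shapes(h, w):
    """Trace the spatial sizes through the decoding block of claim 5,
    assuming the EDM output is at 1/16 and the ResNet preliminary
    features are at 1/4 of the input resolution (illustrative
    assumptions; the 1x1 convolutions only change channel counts)."""
    enc_h, enc_w = h // 16, w // 16          # EDM output after 1x1 dimension reduction
    up1 = (enc_h * 4, enc_w * 4)             # first bilinear 4x upsampling
    low = (h // 4, w // 4)                   # ResNet preliminary features
    assert up1 == low                        # same size, so concatenation is valid
    up2 = (up1[0] * 4, up1[1] * 4)           # 3x3 conv keeps size; final 4x upsampling
    return up2                               # segmentation map matches the input size
```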
6. A GCAM-based high resolution SAR image airport runway area automatic detection system, comprising:
a downsampling program unit for downsampling the high-resolution SAR image to generate a medium-resolution image;
A runway area extraction program unit for inputting the medium-resolution image into a geospatial contextual attention mechanism network GCAM to extract the runway area; the geospatial contextual attention mechanism network GCAM comprises an encoding block and a decoding block; the encoding block comprises a residual network ResNet, a multi-scale extrusion pyramid MSP and an edge refinement module EDM, wherein the residual network ResNet performs feature extraction on the input data set to obtain preliminary features, the multi-scale extrusion pyramid MSP obtains global context information from the preliminary features by operating pooling and convolution layers at different resolutions, and the edge refinement module EDM enhances the edge extraction capability of the network on the preliminary features; the outputs of the multi-scale extrusion pyramid MSP and the edge refinement module EDM are further fused to obtain multi-level features; the decoding block combines the preliminary features and the multi-level features to perform semantic segmentation of the airport runway area so as to extract the runway area; the residual network ResNet is an improved residual network obtained by replacing ordinary two-dimensional convolutions with dilated convolutions having dilation rates of 2, 4, 8 and 16 on the basis of the residual network ResNet_101;
a coordinate mapping program unit for performing coordinate mapping on the extracted runway area to obtain the detection result of the final high-resolution SAR image.
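The dilation rates 2, 4, 8 and 16 in the improved residual network of claim 6 enlarge the receptive field without adding parameters. Assuming the standard 3×3 ResNet kernels (an assumption; the claim does not fix the kernel size), the effective kernel extent follows k_eff = k + (k − 1)(d − 1):

```python
def effective_kernel(k, d):
    """Effective spatial extent of a k x k convolution with dilation
    rate d: the kernel taps span k + (k - 1) * (d - 1) pixels, while
    the number of weights stays at k * k."""
    return k + (k - 1) * (d - 1)

# Dilation rates used in the improved residual network of claim 6,
# applied to assumed 3x3 kernels.
rates = [2, 4, 8, 16]
extents = [effective_kernel(3, d) for d in rates]
```

With rate 16 a 3×3 kernel spans 33 pixels, which is how the encoder gathers wide runway context without further downsampling.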
7. A GCAM-based high-resolution SAR image airport runway area automatic detection system comprising a computer device, the computer device comprising a microprocessor and a memory connected to each other, characterized in that the microprocessor is programmed or configured to perform the steps of the GCAM-based high-resolution SAR image airport runway area automatic detection method of any of claims 1-5, or in that the memory has stored therein a computer program programmed or configured to perform the GCAM-based high-resolution SAR image airport runway area automatic detection method of any of claims 1-5.
8. A computer readable storage medium having stored therein a computer program programmed or configured to perform the GCAM-based high resolution SAR image airport runway area automatic detection method of any of claims 1-5.
CN202010871235.3A 2020-08-26 2020-08-26 GCAM-based high-resolution SAR image airport runway area automatic detection method and system Active CN112084901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010871235.3A CN112084901B (en) 2020-08-26 2020-08-26 GCAM-based high-resolution SAR image airport runway area automatic detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010871235.3A CN112084901B (en) 2020-08-26 2020-08-26 GCAM-based high-resolution SAR image airport runway area automatic detection method and system

Publications (2)

Publication Number Publication Date
CN112084901A CN112084901A (en) 2020-12-15
CN112084901B true CN112084901B (en) 2024-03-01

Family

ID=73728710

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010871235.3A Active CN112084901B (en) 2020-08-26 2020-08-26 GCAM-based high-resolution SAR image airport runway area automatic detection method and system

Country Status (1)

Country Link
CN (1) CN112084901B (en)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112528896B (en) * 2020-12-17 2024-05-31 长沙理工大学 SAR image-oriented automatic aircraft target detection method and system
CN112598003B (en) * 2020-12-18 2022-11-25 燕山大学 Real-time semantic segmentation method based on data expansion and full-supervision preprocessing
CN112950477B (en) * 2021-03-15 2023-08-22 河南大学 Dual-path processing-based high-resolution salient target detection method
CN113240040B (en) * 2021-05-27 2023-04-18 西安理工大学 Polarized SAR image classification method based on channel attention depth network
CN113567984B (en) * 2021-07-30 2023-08-22 长沙理工大学 Method and system for detecting artificial small target in SAR image
CN113673417A (en) * 2021-08-19 2021-11-19 中国商用飞机有限责任公司 Method and system for assisting airplane ground taxiing based on image comparison
CN113496221B (en) * 2021-09-08 2022-02-01 湖南大学 Point supervision remote sensing image semantic segmentation method and system based on depth bilateral filtering
CN113887373B (en) * 2021-09-27 2022-12-16 中关村科学城城市大脑股份有限公司 Attitude identification method and system based on urban intelligent sports parallel fusion network
CN113780241B (en) * 2021-09-29 2024-02-06 北京航空航天大学 Acceleration method and device for detecting remarkable object
CN114022751B (en) * 2021-11-04 2024-03-05 中国人民解放军国防科技大学 SAR target detection method, device and equipment based on feature refinement deformable network
CN114387439B (en) * 2022-01-13 2023-09-12 中国电子科技集团公司第五十四研究所 Semantic segmentation network based on optical and PolSAR feature fusion
CN114202733A (en) * 2022-02-18 2022-03-18 青岛海信网络科技股份有限公司 Video-based traffic fault detection method and device
CN114820652B (en) * 2022-04-07 2023-05-23 北京医准智能科技有限公司 Method, device and medium for segmenting partial quality abnormal region of mammary gland X-ray image
CN114842206B (en) * 2022-07-04 2022-09-30 江西师范大学 Remote sensing image semantic segmentation system and method based on double-layer global convolution
CN115131682A (en) * 2022-07-19 2022-09-30 云南电网有限责任公司电力科学研究院 Power grid distribution condition drawing method and system based on remote sensing image
CN116343113A (en) * 2023-03-09 2023-06-27 中国石油大学(华东) Method and system for detecting oil spill based on polarized SAR characteristics and coding and decoding network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105095914A (en) * 2015-08-13 2015-11-25 中国民航大学 Airport runway detection method based on combination of h/q decomposition and Bayesian iterative classification
WO2019085905A1 (en) * 2017-10-31 2019-05-09 北京市商汤科技开发有限公司 Image question answering method, device and system, and storage medium
CN110084249A (en) * 2019-04-24 2019-08-02 哈尔滨工业大学 The image significance detection method paid attention to based on pyramid feature
CN110506278A (en) * 2017-04-19 2019-11-26 西门子医疗有限公司 Target detection in latent space
CN110533045A (en) * 2019-07-31 2019-12-03 中国民航大学 A kind of luggage X-ray contraband image, semantic dividing method of combination attention mechanism
CN111222474A (en) * 2020-01-09 2020-06-02 电子科技大学 Method for detecting small target of high-resolution image with any scale

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9939526B2 (en) * 2007-09-06 2018-04-10 Rockwell Collins, Inc. Display system and method using weather radar sensing

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A New Framework for Automatic Airports Extraction from SAR Images Using Multi-Level Dual Attention Mechanism; Lifu Chen et al.; Remote Sens.; Vol. 12, No. 3; 560 *
Geospatial Contextual Attention Mechanism for Automatic and Fast Airport Detection in SAR Imagery; Siyu Tan et al.; IEEE Access; Vol. 8; 173627-173640 *
Research on Automatic Extraction Algorithms for Airport Runway Areas Based on High-Resolution SAR Images; Tan Siyu; China Master's Theses Full-text Database, Engineering Science and Technology II (No. 01); C031-968 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant