CN114332636A - Polarized SAR building region extraction method, equipment and medium - Google Patents

Polarized SAR building region extraction method, equipment and medium

Info

Publication number
CN114332636A
Authority
CN
China
Prior art keywords: level, fusion, convolution, layer, characteristic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210243785.XA
Other languages
Chinese (zh)
Other versions
CN114332636B (en)
Inventor
胡粲彬
陆圣涛
项德良
程建达
孙晓坤
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Chemical Technology
Original Assignee
Beijing University of Chemical Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Chemical Technology filed Critical Beijing University of Chemical Technology
Priority to CN202210243785.XA priority Critical patent/CN114332636B/en
Publication of CN114332636A publication Critical patent/CN114332636A/en
Application granted granted Critical
Publication of CN114332636B publication Critical patent/CN114332636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The embodiment of the invention discloses a polarized SAR building region extraction method, equipment and a medium. The method comprises the following steps: acquiring the C matrix and the PWF result of the polarized SAR data to be processed; inputting the C matrix into a trained first deep semantic extraction network to generate multi-level polarization features carrying semantic information of different depths; inputting the PWF result into a trained second deep semantic extraction network to generate multi-level PWF features carrying semantic information of different depths; fusing the polarization feature and the PWF feature of the same level to obtain multi-level two-way fusion features; and performing inter-level fusion of the multi-level two-way fusion features through upsampling, and generating a building region extraction result of the polarized SAR data to be processed from the resulting inter-level fusion features. The embodiment improves the accuracy of the extraction result.

Description

Polarized SAR building region extraction method, equipment and medium
Technical Field
The embodiment of the invention relates to the technical field of image identification and classification, in particular to a polarized SAR building region extraction method, equipment and medium.
Background
With the rapid development of urbanization, the demand for urban environment information keeps growing, so determining whether buildings exist and obtaining their position information has become very important. Compared with other sensors such as optical and infrared sensors, Synthetic Aperture Radar (SAR) is not limited by conditions such as cloud cover and illumination and can penetrate ground cover, so SAR images offer clear advantages and practical significance for building region extraction.
In the prior art, building extraction from SAR images is mostly based on a scattering intensity threshold or a physical scattering mechanism; such methods need no large amount of data support and are simple to implement. However, they cannot represent semantic information of higher levels (that is, greater depths), and their extraction results are coarse for buildings in complex areas with large variations in feature scale; meanwhile, the multiplicative noise of SAR images further increases the difficulty of building region extraction.
Disclosure of Invention
The embodiment of the invention provides a polarized SAR building region extraction method, equipment and medium, which can be used for acquiring comprehensive target electromagnetic scattering information through polarized SAR data and improving the extraction precision.
In a first aspect, an embodiment of the present invention provides a polarized SAR building region extraction method, including:
acquiring a C matrix and a Polarized Whitening Filter (PWF) result of polarized SAR data to be processed;
inputting the C matrix into a trained first depth semantic extraction network to generate multi-level polarization characteristics with different depth semantic information; inputting the PWF result into a trained second deep semantic extraction network to generate multi-level PWF characteristics with different depth semantic information;
fusing the polarization characteristic and the PWF characteristic of the same level to obtain a multi-level double-path fusion characteristic; wherein the polarization features and PWF features of the same level are the same size;
and performing inter-level fusion on the multi-level double-path fusion characteristics through upsampling, and generating a building area extraction result of the to-be-processed polarized SAR data according to the obtained inter-level fusion characteristics.
In a second aspect, an embodiment of the present invention further provides an electronic device, where the electronic device includes:
one or more processors;
a memory for storing one or more programs,
when executed by the one or more processors, cause the one or more processors to implement the polarized SAR building region extraction method of any of the embodiments.
In a third aspect, an embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the polarized SAR building region extraction method according to any embodiment of the present invention is implemented.
The technical effects of the embodiment of the invention are as follows:
1. in the embodiment, the polarized SAR data is adopted to acquire comprehensive target electromagnetic scattering information, deep learning is applied to a polarized SAR data building region extraction task, internal scattering information of building structures in different polarization modes is extracted through a C matrix, boundary information of the building regions is extracted through a PWF result, boundaries, positions and structural details of the building regions can be extracted in multiple aspects, and extraction accuracy is improved.
2. In the embodiment, a two-way deep semantic extraction network is adopted to extract multi-level two-way characteristics of the polarized SAR data independently; then, performing two-way fusion of the features through feature connection, so that the fusion of the internal structural features and the boundary features of the building can be embodied in the fusion features; the characteristics are fused in multiple levels by utilizing the upsampling, so that the characteristics of different scales can be reflected in the fused characteristics, and the position and the shape of a building area can be accurately extracted;
3. in the embodiment, the DenseNet is used as a backbone network of the deep semantic extraction module, and the dense connection structure is used for extracting the texture feature information of a deeper layer of a building while improving the network operation efficiency, so as to promote the fine extraction of the polarized SAR building region; the multiplexing of the characteristics in the DenseNet and the fusion of the multi-level characteristics in the sampling stage of the whole network solve the problem of multi-scale distribution of the area of the building.
4. The method can realize rapid extraction of the complex large-scene polarized SAR data building region, and has superiority in the aspects of visual effect, building region extraction precision and efficiency.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings can be obtained by those skilled in the art without creative efforts.
Fig. 1 is a flowchart of a polarized SAR building region extraction method provided in an embodiment of the present invention;
fig. 2 is a schematic structural diagram of a network model for polarized SAR building region extraction according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a deep semantic extraction network according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a dense convolution module provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating a same-level two-way feature fusion provided by an embodiment of the invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present invention, it should be noted that the terms "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", "outer", etc., indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of description and simplicity of description, but do not indicate or imply that the device or element being referred to must have a particular orientation, be constructed and operated in a particular orientation, and thus, should not be construed as limiting the present invention. Furthermore, the terms "first," "second," and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
In the description of the present invention, it should also be noted that, unless otherwise explicitly specified or limited, the terms "mounted," "connected," and "coupled" are to be construed broadly, e.g., as a fixed connection, a removable connection, or an integral connection; as a mechanical connection or an electrical connection; or as a direct connection, an indirect connection through an intermediate medium, or internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Fig. 1 is a flowchart of a polarized SAR building region extraction method provided in an embodiment of the present invention, which is suitable for a case of identifying a building region in SAR data, and the embodiment is executed by an electronic device. With reference to fig. 1, the method provided in this embodiment specifically includes:
and S10, acquiring a C matrix and a PWF result of the polarized SAR data to be processed.
The polarized SAR data in this embodiment may be understood as an SAR image formed from fully polarimetric SAR measurements. In this embodiment, the polarized SAR data to be processed is processed to extract the building areas it contains.
The C matrix of the polarized SAR data provides internal scattering information of the building structure and is used to extract internal structural information of buildings. Specifically, the scattering mechanism of a building is dominated by double-bounce scattering from the dihedral structure formed by vertical walls and the ground, and the same building exhibits different backscattering under different polarization modes. Compared with the co-polarized channels (HH/VV), the cross-polarized channels (HV/VH) have stronger penetrability, while natural objects produce strong backscattering in the co-polarized channels and appear bright in the image.
On the one hand, relying on the polarization characteristics of a single channel may cause missed detections in various scenes: for a complex large-scene SAR image with buildings of various materials, buildings exhibit multi-scale characteristics, and small building areas and low-rise building areas with weak backscattering are easily missed; conversely, ships in a port, vegetation, or mountain areas have backscattering intensity and texture similar to buildings and easily cause false detections. On the other hand, in terms of structural information, it is difficult to accurately distinguish ground objects such as low shrubs from building areas using only single-polarization SAR images.
For the above two reasons, the present application uses the C matrix of polarized SAR data to provide polarized scattering information for different polarization modes. The multi-polarization property of the SAR data can more comprehensively acquire the electromagnetic scattering information of the target.
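As an illustration of how such a C matrix can be formed, the following sketch builds the 3×3 polarimetric covariance matrix per pixel from the lexicographic scattering vector under the reciprocity assumption (S_HV = S_VH); the function name, window size and boxcar multi-looking are illustrative assumptions, not details taken from the patent.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def covariance_matrix(hh, hv, vv, win=3):
    """Per-pixel 3x3 polarimetric covariance (C) matrix from the lexicographic
    scattering vector k_L = [S_HH, sqrt(2)*S_HV, S_VV]^T, assuming reciprocity
    (S_HV == S_VH). Boxcar averaging over win x win neighborhoods approximates
    the ensemble mean <k_L k_L^H>."""
    k = np.stack([hh, np.sqrt(2) * hv, vv])            # (3, H, W), complex
    C = np.einsum('ihw,jhw->ijhw', k, k.conj())        # outer product per pixel
    # spatial multi-looking; real and imaginary parts are filtered separately
    return (uniform_filter(C.real, size=(1, 1, win, win))
            + 1j * uniform_filter(C.imag, size=(1, 1, win, win)))  # (3,3,H,W)
```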
The PWF result suppresses the noise in the SAR data, including multiplicative speckle, and enhances the edge and structural texture features of the target area; it is used to extract the boundary information of the building region. Polarized Whitening Filtering (PWF) is a target enhancement method dedicated to polarized SAR data; filtering the polarized SAR data with a sliding window amounts to a contrast enhancement between the center-pixel target and the surrounding clutter.
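A minimal sketch of the classical polarimetric whitening filter (the Novak–Burl form) is given below, assuming NumPy and a reciprocal scattering vector; for brevity the clutter covariance is estimated globally over the scene rather than with the sliding window described above, so treat it as an illustrative simplification rather than the patent's exact procedure.

```python
import numpy as np

def pwf(hh, hv, vv):
    """Polarimetric whitening filter sketch: y = k^H Sigma_c^{-1} k per pixel.
    Sigma_c is estimated here as the scene-average covariance, a global
    simplification of the sliding-window clutter estimate described above."""
    k = np.stack([hh, np.sqrt(2) * hv, vv])                  # (3, H, W), complex
    # global clutter covariance estimate <k k^H>
    sigma_c = np.einsum('ihw,jhw->ij', k, k.conj()) / k[0].size
    sigma_inv = np.linalg.inv(sigma_c)
    # quadratic form k^H Sigma^{-1} k at every pixel; the result is real-valued
    return np.einsum('ihw,ij,jhw->hw', k.conj(), sigma_inv, k).real
```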
S20, inputting the C matrix into a trained first depth semantic extraction network to generate multi-level polarization characteristics with different depth semantic information; and inputting the PWF result into a trained second deep semantic extraction network to generate multi-level PWF features with different depths of semantic information.
The first and second deep semantic extraction networks are used to extract deep semantic features from their inputs; their network structures and parameters may be the same or different. Fig. 2 is a schematic structural diagram of a network model for polarized SAR building region extraction provided by an embodiment of the present invention, showing the main links of the SAR building region extraction method in the form of a network model. Referring to fig. 2, the first and second deep semantic extraction networks form a two-way network that performs deep semantic extraction on the C matrix and the PWF result respectively, obtaining multi-level polarization features and multi-level PWF features. Each level corresponds to a different feature scale and feature depth; the higher the level, the smaller the feature-map size and the greater the feature depth.
It should be noted that the sizes of the same-level features output by the first deep semantic extraction network and the second deep semantic extraction network are the same, that is, the sizes of the polarization feature and the PWF feature of the same level are the same, which is convenient for performing two-way feature fusion of the same level.
Optionally, the backbone network of the first deep semantic extraction network is a DenseNet comprising sequentially linked multi-layer DenseNet structures, each layer comprising a convolution and pooling module and a dense convolution module connected in sequence. Correspondingly, inputting the C matrix into the trained first deep semantic extraction network to generate multi-level polarization features with semantic information of different depths comprises: taking the C matrix as the previous-layer polarization feature, and taking the first-layer DenseNet structure as the current-layer DenseNet structure; inputting the previous-layer polarization feature into the convolution and pooling module of the current-layer DenseNet structure to obtain the current-layer preliminary semantic feature; inputting the current-layer preliminary semantic feature into the dense convolution module of the current-layer DenseNet structure to obtain the current-layer polarization feature; and taking the current-layer polarization feature as the previous-layer polarization feature, taking the next-layer DenseNet structure as the current-layer DenseNet structure, and returning to the step of inputting the previous-layer polarization feature into the convolution and pooling module of the current-layer DenseNet structure, until the polarization features of all levels are obtained.
Optionally, the backbone network of the second deep semantic extraction network is likewise a DenseNet comprising sequentially linked multi-layer DenseNet structures, each layer comprising a convolution and pooling module and a dense convolution module connected in sequence. Correspondingly, inputting the PWF result into the trained second deep semantic extraction network to generate multi-level PWF features with semantic information of different depths comprises: taking the PWF result as the previous-layer PWF feature, and taking the first-layer DenseNet structure as the current-layer DenseNet structure; inputting the previous-layer PWF feature into the convolution and pooling module of the current-layer DenseNet structure to obtain the current-layer preliminary semantic feature; inputting the current-layer preliminary semantic feature into the dense convolution module of the current-layer DenseNet structure to obtain the current-layer PWF feature; and taking the current-layer PWF feature as the previous-layer PWF feature, taking the next-layer DenseNet structure as the current-layer DenseNet structure, and returning to the step of inputting the previous-layer PWF feature into the convolution and pooling module of the current-layer DenseNet structure, until the PWF features of all levels are obtained.
Fig. 3 is a schematic structural diagram of a deep semantic extraction network according to an embodiment of the present invention. As shown in fig. 3, the deep semantic extraction network comprises a 3-layer DenseNet structure, each layer comprising a convolution and pooling module and a dense convolution module connected in sequence. The input features first enter the convolution and pooling module of the first-layer DenseNet structure, then pass sequentially through the dense convolution module of the first layer, the convolution and pooling module and dense convolution module of the second layer, and the convolution and pooling module and dense convolution module of the third layer; the dense convolution module of each layer outputs one level of output features.
If the deep semantic extraction network in fig. 3 is the first deep semantic extraction network, the input features are the C matrix and the three levels of output features are the three levels of polarization features. If it is the second deep semantic extraction network, the input features are the PWF result and the three levels of output features are the three levels of PWF features.
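A minimal PyTorch sketch of one such extraction path follows, assuming three stages with illustrative channel widths; the dense convolution module is simplified here to a plain convolution (a faithful dense block is sketched later), so this shows only the stage-wise layout and the per-level feature outputs.

```python
import torch
import torch.nn as nn

class SemanticPath(nn.Module):
    """One deep semantic extraction path: three stages, each a convolution
    and pooling module followed by a (here simplified) dense convolution
    module, emitting one level of features per stage. Channel widths are
    illustrative, not values from the patent."""
    def __init__(self, in_ch, widths=(64, 128, 256)):
        super().__init__()
        self.stages = nn.ModuleList()
        ch = in_ch
        for i, w in enumerate(widths):
            k = 5 if i == 0 else 1          # large 5x5 kernel on the first stage,
            stage = nn.Sequential(          # 1x1 on later stages, as described
                nn.Conv2d(ch, w, kernel_size=k, padding=k // 2),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True),
                nn.AvgPool2d(2),            # 2x2 average pooling halves the size
                # stand-in for the dense convolution module (see later sketch)
                nn.Conv2d(w, w, kernel_size=3, padding=1),
                nn.BatchNorm2d(w), nn.ReLU(inplace=True),
            )
            self.stages.append(stage)
            ch = w

    def forward(self, x):
        feats = []
        for stage in self.stages:           # each stage halves the resolution
            x = stage(x)
            feats.append(x)                 # one feature level per stage
        return feats                        # shallow -> deep semantic levels

# two independent paths: one for the C matrix (e.g. 9 real channels after
# splitting complex entries, an assumption) and one for the PWF result
c_path, pwf_path = SemanticPath(in_ch=9), SemanticPath(in_ch=1)
```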
S30, fusing the polarization feature and the PWF feature of the same level to obtain multi-level two-way fusion features.
Optionally, the polarization feature and the PWF feature of each level are connected along the channel dimension to obtain the two-way fusion feature of that level. Because the same-level features output by the first and second deep semantic extraction networks have the same spatial size, the polarization feature and the PWF feature of the same level can be connected along the channel dimension, realizing two-way feature fusion at each level. After the channel-dimension connection, the channel number of the two-way fusion feature at each level is the sum of the channel numbers of the same-level features of the two deep semantic extraction networks.
S40, performing inter-level fusion on the multi-level two-way fusion features through upsampling, and generating the building region extraction result of the polarized SAR data to be processed from the obtained inter-level fusion features.
S30 completes the fusion of the two-way features within each level; S40 then performs feature fusion across levels. Specifically, as the level increases, the size of the multi-level two-way fusion features gradually decreases and so does their resolution. Therefore, the two-way fusion features of different levels are upsampled so that the feature-map size gradually increases, yielding the inter-level fusion feature, whose size is the same as that of the polarized SAR data to be processed.
And then, performing pixel level prediction according to the interlayer fusion characteristics to obtain a building region extraction result with the same size as the to-be-processed polarized SAR data.
The technical effects of the embodiment are as follows:
1. in the embodiment, the polarized SAR data is adopted to acquire comprehensive target electromagnetic scattering information, deep learning is applied to a polarized SAR data building region extraction task, internal scattering information of building structures in different polarization modes is extracted through a C matrix, boundary information of the building regions is extracted through a PWF result, boundaries, positions and structural details of the building regions can be extracted in multiple aspects, and extraction accuracy is improved.
2. In the embodiment, a two-way deep semantic extraction network is adopted to extract multi-level two-way characteristics of the polarized SAR data independently; then, performing two-way fusion of the features through feature connection, so that the fusion of the internal structural features and the boundary features of the building can be embodied in the fusion features; the characteristics are fused in multiple levels by utilizing the upsampling, so that the characteristics of different scales can be reflected in the fused characteristics, and the position and the shape of a building area can be accurately extracted;
3. in the embodiment, the DenseNet is used as a backbone network of the deep semantic extraction module, and the dense connection structure is used for extracting the texture feature information of a deeper layer of a building while improving the network operation efficiency, so as to promote the fine extraction of the polarized SAR building region; the multiplexing of the characteristics in the DenseNet and the fusion of the multi-level characteristics in the sampling stage of the whole network solve the problem of multi-scale distribution of the area of the building.
4. The method can realize rapid extraction of the complex large-scene polarized SAR data building region, and has superiority in the aspects of visual effect, building region extraction precision and efficiency.
On the basis of the above embodiment and the following embodiment, the present embodiment refines the deep semantic extraction network. Referring to fig. 3, the modules of the deep semantic extraction network will be described in the processing order of the input features.
(I) Convolution and pooling module of the first-layer DenseNet structure
Optionally, the convolution and pooling module of the first layer DenseNet structure comprises: a first convolution module and a first pooling module connected in series, the first convolution module using a large convolution kernel. For example, a 5 × 5 convolution.
The convolution and pooling module of the first layer DenseNet structure is used for preliminarily extracting semantic information of polarization data, obtaining a large-range receptive field by using a large convolution kernel, obtaining global characteristics and preliminarily obtaining an approximate range of a building target area.
(II) Dense convolution module
Optionally, each dense convolution module comprises densely connected multi-layer convolution modules, each convolution module comprising: a 1 × 1 convolution module and a 3 × 3 convolution module connected in sequence; the 1 × 1 convolution module includes: batch Normalization (BN), Linear rectification function (Rectified Linear Unit, ReLU) and 1 × 1 convolution of sequential connections; the 3 × 3 convolution module includes: BN, ReLU, and 3 × 3 convolutions connected in sequence.
The dense convolution module is used for extracting depth features, effectively utilizes feature multiplexing, strengthens feature transfer, simultaneously alleviates the problem of gradient disappearance, greatly reduces the number of parameters and improves the network efficiency.
In particular, the dense convolution module improves the convolutional neural network from the feature perspective, achieving better results with fewer parameters through thorough feature reuse. Fig. 4 is a schematic structural diagram of a dense convolution module according to an embodiment of the present invention. As shown in fig. 4, a conventional convolutional neural network with M layers has M connections, whereas a dense convolution module has M(M+1)/2 connections, and each layer takes the feature maps output by all previous layers as input, that is:

$$x_m = J\left([x_0, x_1, \ldots, x_{m-1}]\right) \tag{1}$$

where $m$ denotes the layer index; $x_m$ denotes the output of layer $m$; $J$ denotes a nonlinear transformation; and $[x_0, x_1, \ldots, x_{m-1}]$ denotes the channel-wise concatenation of the feature maps output by layers 0 to $m-1$. The nonlinear transformation consists of BN, ReLU and convolution (Conv) operations, abbreviated BN-ReLU-Conv. Optionally, the nonlinear transformation comprises sequentially connected BN-ReLU-Conv(1×1) and BN-ReLU-Conv(3×3); that is, a 1×1 convolution module is added before each 3×3 convolution module to reduce the number of feature maps, cutting computation while reducing dimensionality and efficiently fusing the features of each channel.
In addition, the dense convolution module alleviates the gradient-vanishing problem, greatly reduces the number of parameters and improves network efficiency. Specifically, gradient vanishing occurs more easily as the network deepens, because input and gradient information must pass through many layers; the dense convolution module densely connects the features, so that each layer is directly connected to the input and to the target loss function, which greatly reduces gradient vanishing even as the network depth grows. By directly connecting all layers while guaranteeing maximum information flow between layers in the network, the dense convolution module exploits feature reuse more effectively and strengthens feature propagation. Meanwhile, the network is narrowed in the channel dimension, greatly reducing the parameter count and improving operational efficiency.
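A minimal PyTorch sketch of such a dense convolution module follows, assuming a bottleneck factor of 4 and a growth rate of 32 (illustrative values, not taken from the patent); each layer is BN-ReLU-Conv(1×1) then BN-ReLU-Conv(3×3) and consumes the concatenation of all earlier outputs, as in Eq. (1).

```python
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    """BN-ReLU-Conv(1x1) then BN-ReLU-Conv(3x3); the 1x1 bottleneck reduces
    the number of feature maps before the 3x3 convolution."""
    def __init__(self, in_ch, growth, bottleneck=4):
        super().__init__()
        mid = bottleneck * growth
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(inplace=True),
            nn.Conv2d(in_ch, mid, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid), nn.ReLU(inplace=True),
            nn.Conv2d(mid, growth, kernel_size=3, padding=1, bias=False),
        )

    def forward(self, x):
        return self.net(x)

class DenseBlock(nn.Module):
    """Dense convolution module: with m layers there are m(m+1)/2 connections;
    each layer consumes the channel concatenation of all earlier outputs."""
    def __init__(self, in_ch, growth=32, n_layers=4):
        super().__init__()
        self.layers = nn.ModuleList(
            DenseLayer(in_ch + i * growth, growth) for i in range(n_layers))

    def forward(self, x):
        feats = [x]
        for layer in self.layers:
            # x_m = J([x_0, ..., x_{m-1}]) as in Eq. (1)
            feats.append(layer(torch.cat(feats, dim=1)))
        return torch.cat(feats, dim=1)
```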
(III) Convolution and pooling module of the n-th layer DenseNet structure (n is a natural number greater than 1)
Optionally, the convolution and pooling module of the n-th layer DenseNet structure comprises an n-th convolution module and an n-th pooling module, where the n-th convolution module comprises sequentially connected BN, ReLU and 1 × 1 convolution, and the n-th pooling module comprises 2 × 2 average pooling.
The convolution and pooling module of the n-th layer DenseNet structure further compresses the parameters and reduces the number of feature-map channels, realizing further feature fusion while reducing dimensionality.
Specifically, the convolution and pooling module of the n-th layer DenseNet structure sits between every two dense convolution modules and comprises a BN-ReLU-Conv(1×1) convolution operation and a 2 × 2 average pooling operation, so that the parameter count is compressed while high-level semantic information is extracted, improving network efficiency. The 1 × 1 convolution compresses the number of feature-map channels, realizing further feature fusion while reducing dimensionality. The 2 × 2 average pooling operation reduces the resolution of the feature map and the image size, yielding high-level semantic features of the data and preparing for the subsequent connection and upsampling of same-level features.
It should be noted that the network structure provided in any of the above embodiments is applicable to both the first deep semantic extraction network and the second deep semantic extraction network. Even when the two paths of the two-way deep semantic extraction network designed by the invention share the same network structure, their network parameters remain mutually independent and may be the same or different.
On the basis of the above-described embodiment and the following-described embodiment, the present embodiment refines the processes of feature fusion and region extraction.
(I) Two-way feature fusion
Optionally, the polarization feature and the PWF feature of the same level are fused to obtain a multi-level two-way fusion feature, including: and connecting the polarization characteristic and the PWF characteristic of each level along the channel dimension to obtain a two-way fusion characteristic of each level.
The step completes the fusion of double-path characteristics of the same level, and the specific fusion method comprises the following steps:
$$\mathrm{Feature}_{out}(x,\, x_1 + x_2) = \mathrm{Feature}_{D1}(x,\, x_1) \oplus \mathrm{Feature}_{D2}(x,\, x_2) \tag{2}$$

where $x$ denotes the number of channels of the polarized SAR data to be processed; $x_1$ denotes the number of channels of the polarization feature at any given level; $x_2$ denotes the number of channels of the PWF feature at that level; $x_1 + x_2$ denotes the number of channels of the two-way fusion feature of that level; $\mathrm{Feature}_{D1}(x, x_1)$ denotes the polarization feature of that level; $\mathrm{Feature}_{D2}(x, x_2)$ denotes the PWF feature of that level; $\mathrm{Feature}_{out}(x, x_1+x_2)$ denotes the two-way fusion feature of that level; and $\oplus$ denotes feature-map channel stacking, i.e., connection along the channel dimension.
FIG. 5 is a diagram illustrating same-level two-way feature fusion according to an embodiment of the present invention. As shown in fig. 5, after the channel-level fusion, the number of feature-map channels is the sum of the channel numbers of the same-level features of the two deep semantic extraction modules. Assuming the channel numbers of the feature maps at the different levels obtained by the first and second deep semantic extraction networks are [64, 128, 256, 512] and [16, 32, 64, 128], the channel numbers of the fused feature maps at the different levels are [80, 160, 320, 640]. The fused feature maps of the different levels are temporarily stored in the same-level feature connection module, in preparation for the feature superposition module of the subsequent upsampling stage, as illustrated in the sketch below.
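A minimal PyTorch illustration of this same-level fusion, using the channel counts from the example above (the remaining tensor sizes are illustrative):

```python
import torch

# same-level two-way fusion is a channel concatenation, as in Eq. (2):
# level-1 polarization features (B, 64, H, W) and PWF features (B, 16, H, W)
# fuse into (B, 80, H, W), matching the [80, 160, 320, 640] example above
pol_feat = torch.randn(2, 64, 128, 128)    # from the first path (illustrative)
pwf_feat = torch.randn(2, 16, 128, 128)    # from the second path, same H x W
fused = torch.cat([pol_feat, pwf_feat], dim=1)
assert fused.shape == (2, 80, 128, 128)    # channels add; spatial size unchanged
```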
(II) Multi-level feature fusion
Optionally, the size of the multi-level two-way fusion feature gradually decreases with the increase of the level, and accordingly, the multi-level two-way fusion feature is subjected to inter-level fusion through upsampling, including: taking the last layer of double-path fusion characteristics as the current layer of double-path fusion characteristics; performing an up-sampling operation on the two-way fusion feature of the current layer to obtain a two-way fusion feature after sampling, wherein the two-way fusion feature after sampling has the same size as the two-way fusion feature of the previous layer; connecting the sampled two-way fusion feature with the previous two-way fusion feature along a channel dimension, taking the connected two-way fusion feature as a two-way fusion feature of a current layer, returning to the step of performing up-sampling operation on the two-way fusion feature of the current layer to obtain the sampled two-way fusion feature until the previous two-way fusion feature is the first two-way fusion feature; and obtaining an interlayer fusion characteristic according to the final sampled two-way fusion characteristic, wherein the interlayer fusion characteristic has the same size as the to-be-processed polarimetric SAR data.
By upsampling and fusing the features of different levels, the low-resolution feature maps from the semantic extraction stage are mapped back to the size of the original input data, in preparation for the subsequent pixel-level classifier.
In one embodiment, if the n-th pooling module comprises a 2 × 2 average pooling operation, so that the size of the multi-level two-way fusion features is reduced by a factor of 1/2 from level to level, the above upsampling operation employs a 2-fold deconvolution. Correspondingly, during multi-level feature fusion, the last-layer two-way fusion feature F1, which has the smallest size, first undergoes an upsampling operation and a convolution operation; the sampled two-way fusion feature F11 is twice the size of F1 and the same size as the penultimate-layer two-way fusion feature F2. F11 is channel-connected with F2 and then upsampled and convolved; the resulting sampled feature F21 is twice the size of F2 and the same size as the antepenultimate-layer two-way fusion feature F3. F21 is channel-connected with F3, followed again by an upsampling operation and a convolution operation. This repeats until the sampled two-way fusion feature is channel-fused with the first-layer two-way fusion feature, giving the final sampled two-way fusion feature. Finally, a further upsampling operation on the final sampled two-way fusion feature yields the inter-level fusion feature, whose size is the same as that of the polarized SAR data to be processed.
Optionally, the upsampling operation comprises a 2-fold deconvolution operation and a convolution operation, which are sequentially connected, the convolution operation being BN-Conv (3 × 3) -ReLU. Conv (3 × 3) serves to further fuse the features of the superimposed channels.
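One step of this decoder can be sketched in PyTorch as follows, assuming the BN-Conv(3×3)-ReLU blending stated above; placing the 3×3 convolution after the channel concatenation follows the remark that it fuses the superimposed channels, and all channel widths are illustrative.

```python
import torch
import torch.nn as nn

class UpFuse(nn.Module):
    """One inter-level fusion step: 2x deconvolution of the deeper feature,
    channel concatenation with the next-shallower two-way fusion feature,
    then BN-Conv(3x3)-ReLU to blend the stacked channels."""
    def __init__(self, in_ch, skip_ch, out_ch):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, in_ch, kernel_size=2, stride=2)
        self.blend = nn.Sequential(
            nn.BatchNorm2d(in_ch + skip_ch),
            nn.Conv2d(in_ch + skip_ch, out_ch, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, deep, skip):
        x = self.up(deep)                   # doubled spatial size, e.g. F1 -> F11
        x = torch.cat([x, skip], dim=1)     # fuse with the shallower level, e.g. F2
        return self.blend(x)

# e.g. F1 (B, 640, 16, 16) with F2 (B, 320, 32, 32):
# UpFuse(640, 320, 320)(F1, F2) -> (B, 320, 32, 32), repeated up the pyramid
```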
(III) Extraction of building areas
Optionally, generating the building region extraction result of the polarized SAR data to be processed from the obtained inter-level fusion features comprises: performing a 1 × 1 convolution operation on the inter-level fusion features to obtain a single-channel fusion feature; and applying a pixel-level classifier that performs a Sigmoid normalization operation on the single-channel fusion feature to predict the probability that each pixel belongs to a building area.
Specifically, the 1 × 1 convolution reduces the channel dimension and outputs a single-channel fusion feature in which the two-way features and multi-level features are fused, embodying the internal structural features, boundary features and multi-scale features of the building area. The pixel-level classifier then performs a Sigmoid normalization operation on the single-channel fusion feature, converting the overall problem into a pixel-by-pixel probability prediction and generating a prediction probability map in which each pixel value lies between 0 and 1.
Optionally, after predicting the probability that each pixel belongs to a building area, the result feature of the same size as the polarized SAR data to be processed is obtained according to the following formula:

$$Result = \begin{cases} 1, & y' > 0.5 \\ 0, & y' \le 0.5 \end{cases} \tag{3}$$

where $y'$ denotes the prediction probability of each pixel and $Result$ denotes the classification result of that pixel: $Result = 1$ indicates that the pixel belongs to a building area, and $Result = 0$ indicates that the pixel belongs to the background area. The embodiment distinguishes the semantic category of the region each pixel belongs to according to its probability value: after Sigmoid normalization, pixels with probability values greater than 0.5 are set to 1 and the rest to 0, finally yielding the binary result map of the polarized SAR building region extraction.
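Putting the head and the thresholding together, a minimal PyTorch sketch (the 32 input channels and tensor sizes are illustrative assumptions):

```python
import torch
import torch.nn as nn

# pixel-level prediction head: 1x1 convolution to a single channel, Sigmoid
# to a probability map y', then the 0.5 threshold of Eq. (3)
head = nn.Conv2d(32, 1, kernel_size=1)         # 32 input channels is illustrative
inter_level = torch.randn(2, 32, 256, 256)     # inter-level fusion features
prob = torch.sigmoid(head(inter_level))        # y' in [0, 1] for every pixel
result = (prob > 0.5).long()                   # 1 = building area, 0 = background
```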
On the basis of the above embodiment and the following embodiment, the present embodiment refines the training process of the whole network model. Optionally, before inputting the C matrix into the trained first deep semantic extraction network, the method further includes: performing pixel-level labeling on each group of polarized SAR training data among multiple groups of polarized SAR training data according to Google Earth optical images of the same region, so as to distinguish building regions from background regions; and training the original first deep semantic extraction network and the original second deep semantic extraction network with the labeled multiple groups of polarized SAR training data, so that the building region extraction result obtained by training is consistent with the labeling result.
The original first deep semantic extraction network and the original second deep semantic extraction network refer to deep semantic extraction networks with untrained network parameters. In this embodiment, a whole network model formed by the polarized SAR building region extraction method provided in the above embodiment is trained by using labeled multiple groups of polarized SAR training data to update network parameters of the first deep semantic extraction network and the second deep semantic extraction network, so that a building region extraction result obtained by the trained whole network model is consistent with a labeling result.
Specifically, the training process of the whole network model is divided into the following stages:
(I) training data labeling
Pixel-level building region labeling is performed on multiple groups of polarized SAR training data, each group representing one SAR image. Owing to the SAR scattering imaging mechanism, buildings in complex scenes differ in scattering intensity, and it is difficult to finely interpret building regions from the image alone. Therefore, during data annotation, pixel-level labeling is carried out with reference to the Google Earth optical satellite image of the same region: building regions are labeled 1 and background regions 0, generating a labeling result map.
(II) training data preprocessing
And carrying out preprocessing such as geometric correction and denoising on each group of polarized SAR training data to obtain a C matrix of each group of polarized SAR training data. And performing PWF processing on each group of polarized SAR training data to obtain a PWF result of each group of polarized SAR training data.
(III) network model training
Inputting the C matrix of each group of polarized SAR training data into an original first depth semantic extraction network to generate multi-level polarization characteristics with different depth semantic information; and inputting the PWF result of each group of polarized SAR training data into an original second depth semantic extraction network to generate multi-level PWF characteristics with different depth semantic information.
Fusing the polarization characteristic and the PWF characteristic of the same level to obtain a multi-level double-path fusion characteristic; wherein the polarization features and PWF features of the same level are the same size.
And performing inter-level fusion on the multi-level double-path fusion characteristics through upsampling, and generating the pixel-level prediction probability of each group of polarized SAR training data according to the obtained inter-level fusion characteristics.
According to the pixel-level prediction probability and the labeling result of each group of polarized SAR training data, the following loss function is constructed:

$$Loss = -\eta\,(1 - y')^{\alpha}\, y \log(y') - (1 - \eta)\,(y')^{\alpha}\,(1 - y)\log(1 - y') \tag{4}$$

where $y = 1$ or $0$ denotes the labeling result and $y'$ the prediction probability. This function improves on the cross-entropy loss and can address the imbalance between the numbers of positive and negative samples. Specifically, the numbers of building-area and background samples differ widely, so the parameter $\eta$ is set to control the weights of the positive and negative samples in the total loss; when the positive samples are few, their weight is raised accordingly, reducing the influence of sample imbalance. The modulation factor $\alpha$ makes the model focus more on difficult samples.
By adopting the loss function shown in formula (4), the embodiment effectively alleviates the uneven distribution of building-area and background samples.
Optionally, $\eta = 0.8$ and $\alpha = 2$. The whole training process of the network model can be carried out on an Ubuntu operating system with a GeForce RTX 2080Ti GPU. Training is optimized by stochastic gradient descent with an initial learning rate of 0.001 and a batch size of 4, with parameter decay over 100 training rounds. The embodiment can also be written using the PyTorch deep learning framework.
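The loss and the training setup can be sketched as follows; note that the closed form of Eq. (4) is reconstructed here from the description of η and α (a focal-style weighted cross entropy), so the exact expression should be treated as an assumption.

```python
import torch

def building_loss(y_pred, y, eta=0.8, alpha=2.0, eps=1e-7):
    """Focal-style weighted cross entropy per Eq. (4): eta balances the
    positive/negative sample weights, the modulation factor alpha
    down-weights easy samples. The exact form is a reconstruction."""
    y_pred = y_pred.clamp(eps, 1 - eps)            # probabilities after Sigmoid
    pos = -eta * (1 - y_pred) ** alpha * y * torch.log(y_pred)
    neg = -(1 - eta) * y_pred ** alpha * (1 - y) * torch.log(1 - y_pred)
    return (pos + neg).mean()

# training setup as described: SGD, initial learning rate 0.001, batch size 4,
# 100 rounds; `model` stands for the whole two-way network sketched above
# optimizer = torch.optim.SGD(model.parameters(), lr=0.001)
```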
Optionally, the present embodiment evaluates the performance of the polarized SAR building region extraction method provided in any of the above embodiments with F1-Score. Specifically, each classified pixel is treated as a sample (positive or negative), and the following cases may occur:
True Positive (TP): a positive sample classified as positive;
False Positive (FP): a negative sample classified as positive;
False Negative (FN): a positive sample classified as negative;
True Negative (TN): a negative sample classified as negative.
Precision and Recall are defined as follows:

$$Precision = \frac{TP}{TP + FP}, \qquad Recall = \frac{TP}{TP + FN}$$

F-Score is defined as follows:

$$F_{\delta} = (1 + \delta^2)\,\frac{Precision \cdot Recall}{\delta^2 \cdot Precision + Recall}$$

F1-Score is therefore the F-Score with $\delta = 1$, i.e., recall is weighted as heavily as precision; its specific formula is:

$$F1 = \frac{2 \cdot Precision \cdot Recall}{Precision + Recall}$$

The higher the F1-Score, the better the performance of the framework.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention, as shown in fig. 6, the electronic device includes a processor 50, a memory 51, an input device 52, and an output device 53; the number of processors 50 in the device may be one or more, and one processor 50 is taken as an example in fig. 6; the processor 50, the memory 51, the input device 52 and the output device 53 in the apparatus may be connected by a bus or other means, as exemplified by the bus connection in fig. 6.
The memory 51 is a computer-readable storage medium, and can be used for storing software programs, computer-executable programs, and modules, such as program instructions/modules corresponding to a method for extracting a polarized SAR building region in an embodiment of the present invention. The processor 50 executes various functional applications and data processing of the device by running software programs, instructions and modules stored in the memory 51, so as to implement the above-mentioned polarized SAR building region extraction method.
The memory 51 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to the use of the terminal, and the like. Further, the memory 51 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some examples, the memory 51 may further include memory located remotely from the processor 50, which may be connected to the device over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 52 is operable to receive input numeric or character information and to generate key signal inputs relating to user settings and function controls of the apparatus. The output device 53 may include a display device such as a display screen.
Embodiments of the present invention further provide a computer-readable storage medium on which a computer program is stored, where the computer program is executed by a processor to implement a polarized SAR building region extraction method according to any of the embodiments.
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Python, Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
Finally, it should be noted that: the above embodiments are only used to illustrate the technical solution of the present invention, and not to limit the same; while the invention has been described in detail and with reference to the foregoing embodiments, it will be understood by those skilled in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions deviate from the technical solutions of the embodiments of the present invention.

Claims (10)

1. A polarized SAR building region extraction method is characterized by comprising the following steps:
acquiring a C matrix and a polarized whitening filtering PWF result of polarized SAR data to be processed;
inputting the C matrix into a trained first depth semantic extraction network to generate multi-level polarization characteristics with different depth semantic information; inputting the PWF result into a trained second deep semantic extraction network to generate multi-level PWF characteristics with different depth semantic information;
fusing the polarization characteristic and the PWF characteristic of the same level to obtain a multi-level double-path fusion characteristic; wherein the polarization features and PWF features of the same level are the same size;
and performing inter-level fusion on the multi-level double-path fusion characteristics through upsampling, and generating a building area extraction result of the to-be-processed polarized SAR data according to the obtained inter-level fusion characteristics.
2. The method of claim 1, wherein the backbone network of the first deep semantic extraction network is a DenseNet comprising sequentially linked multi-layer DenseNet structures, each layer of the DenseNet structure comprising a convolution and pooling module and a dense convolution module connected in sequence;
inputting the C matrix into a trained first depth semantic extraction network, and generating multi-level polarization characteristics with different depth semantic information, wherein the multi-level polarization characteristics comprise the following steps:
taking the C matrix as a previous layer of polarization characteristics, and taking a first layer of DenseNet structure as a current layer DenseNet structure;
inputting the polarization characteristic of the previous layer into a convolution and pooling module of the DenseNet structure of the current layer to obtain a preliminary semantic characteristic of the current layer;
inputting the current-layer preliminary semantic features into the dense convolution module of the current-layer DenseNet structure to obtain current-layer polarization features;
and taking the current-layer polarization features as the previous-layer polarization features, taking the next-layer DenseNet structure as the current-layer DenseNet structure, and returning to the step of inputting the previous-layer polarization features into the convolution and pooling module of the current-layer DenseNet structure, until the polarization features of all levels are obtained.
3. The method of claim 2, wherein each dense convolution module comprises densely connected multi-layer convolution modules, each convolution module comprising: a 1 × 1 convolution module and a 3 × 3 convolution module connected in sequence;
the 1 × 1 convolution module includes: sequentially connected batch normalization BN, linear rectification function ReLU and 1 multiplied by 1 convolution;
the 3 × 3 convolution module includes: BN, ReLU, and 3 × 3 convolutions connected in sequence.
4. The method of claim 2, wherein the convolution and pooling module of the first layer DenseNet structure comprises: a first convolution module and a first pooling module connected in series, the first convolution module using a 5 x 5 convolution kernel;
first, thenThe convolution and pooling module of the layer DenseNet structure comprises: first, thenConvolution module and the secondnA pooling module ofnThe convolution module comprises sequentially connected convolutions of BN, ReLU and 1 × 1, the second onenThe pooling module includes 2 x 2 average pooling,nis a natural number greater than 1.
5. The method of claim 1, wherein fusing polarization features and PWF features of the same level to obtain a multi-level two-way fusion feature comprises:
and connecting the polarization characteristic and the PWF characteristic of each level along the channel dimension to obtain a two-way fusion characteristic of each level.
6. The method of claim 1, wherein the size of the multi-level two-way fusion feature gradually decreases with increasing levels;
performing inter-hierarchy fusion on the multi-hierarchy two-way fusion features through upsampling, including:
taking the last layer of double-path fusion characteristics as the current layer of double-path fusion characteristics;
performing an up-sampling operation on the two-way fusion feature of the current layer to obtain a two-way fusion feature after sampling, wherein the two-way fusion feature after sampling has the same size as the two-way fusion feature of the previous layer;
connecting the sampled two-way fusion feature with the previous two-way fusion feature along a channel dimension, taking the connected two-way fusion feature as a two-way fusion feature of a current layer, returning to the step of performing up-sampling operation on the two-way fusion feature of the current layer to obtain the sampled two-way fusion feature until the previous two-way fusion feature is the first two-way fusion feature;
and obtaining an interlayer fusion characteristic according to the final sampled two-way fusion characteristic, wherein the interlayer fusion characteristic has the same size as the to-be-processed polarimetric SAR data.
7. The method of claim 1, wherein generating the building region extraction result of the polarized SAR data to be processed according to the obtained inter-level fusion features comprises:
performing a 1 × 1 convolution operation on the obtained inter-level fusion features to obtain a single-channel fusion feature;
applying a pixel-level classifier that performs a Sigmoid normalization on the single-channel fusion feature to predict the probability that each pixel belongs to a building region.
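A minimal PyTorch sketch of the claim 7 head (the module name is hypothetical):

```python
import torch
import torch.nn as nn

class PixelClassifier(nn.Module):
    """1x1 convolution to a single channel, then Sigmoid normalization,
    giving each pixel's probability of belonging to a building region."""
    def __init__(self, in_channels):
        super().__init__()
        self.proj = nn.Conv2d(in_channels, 1, kernel_size=1)

    def forward(self, fused):
        return torch.sigmoid(self.proj(fused))  # (N, 1, H, W), values in (0, 1)
```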
8. The method of claim 1, wherein, before the C matrix is input into the trained first deep semantic extraction network, the method further comprises:
performing pixel-level labeling on each group of polarized SAR training data among a plurality of groups of polarized SAR training data according to Google Earth optical images of the same region, so as to distinguish building regions from background regions;
training the original first deep semantic extraction network and the original second deep semantic extraction network with the labeled groups of polarized SAR training data, so that the building region extraction results obtained during training are consistent with the labeling results.
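One plausible training step for claim 8 is sketched below; binary cross-entropy is an assumption consistent with the Sigmoid per-pixel probabilities, and `model` is a hypothetical wrapper around both deep semantic extraction networks, the fusion stages and the pixel-level classifier:

```python
import torch.nn as nn

criterion = nn.BCELoss()  # loss choice is an assumption, not from the claims

def train_step(model, optimizer, c_matrix, pwf, labels):
    """`labels`: pixel-level building/background annotations derived from
    Google Earth optical images of the same region, as in claim 8."""
    optimizer.zero_grad()
    probs = model(c_matrix, pwf)           # (N, 1, H, W) probabilities
    loss = criterion(probs, labels.float())
    loss.backward()
    optimizer.step()
    return loss.item()
```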
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the polarized SAR building region extraction method of any one of claims 1-8.
10. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the polarized SAR building region extraction method according to any one of claims 1 to 8.
CN202210243785.XA 2022-03-14 2022-03-14 Polarized SAR building region extraction method, equipment and medium Active CN114332636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210243785.XA CN114332636B (en) 2022-03-14 2022-03-14 Polarized SAR building region extraction method, equipment and medium

Publications (2)

Publication Number Publication Date
CN114332636A true CN114332636A (en) 2022-04-12
CN114332636B CN114332636B (en) 2022-07-08

Family

ID=81034023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210243785.XA Active CN114332636B (en) 2022-03-14 2022-03-14 Polarized SAR building region extraction method, equipment and medium

Country Status (1)

Country Link
CN (1) CN114332636B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101918007B1 (en) * 2017-07-17 2018-11-13 서울시립대학교 산학협력단 Method and apparatus for data fusion of polarimetric synthetic aperature radar image and panchromatic image
CN111291639A (en) * 2020-01-20 2020-06-16 西北工业大学 Cross-source ship feature fusion learning and identification method based on hierarchical variation self-coding
CN111582104A (en) * 2020-04-28 2020-08-25 中国科学院空天信息创新研究院 Semantic segmentation method and device for remote sensing image
CN111797717A (en) * 2020-06-17 2020-10-20 电子科技大学 High-speed high-precision SAR image ship detection method
CN112101410A (en) * 2020-08-05 2020-12-18 中国科学院空天信息创新研究院 Image pixel semantic segmentation method and system based on multi-modal feature fusion

Non-Patent Citations (3)

Title
CANBIN HU ET AL.: "Ship and Sea-Ice Discrimination Using Sub-Spectra Strategy and Single Polarimetric SAR Imagery", IEEE Xplore *
LIU CHUN ET AL.: "Research on a Two-Way Fusion Depth Estimation Neural Network Method", Computer Engineering and Applications *
MA WENTING ET AL.: "PWF-Based Polarimetric SAR Image Matching Method", Command Information System and Technology *

Also Published As

Publication number Publication date
CN114332636B (en) 2022-07-08

Similar Documents

Publication Publication Date Title
CN112668494A (en) Small sample change detection method based on multi-scale feature extraction
CN113065558A (en) Lightweight small target detection method combined with attention mechanism
CN108596108B (en) Aerial remote sensing image change detection method based on triple semantic relation learning
US11308714B1 (en) Artificial intelligence system for identifying and assessing attributes of a property shown in aerial imagery
CN110781756A (en) Urban road extraction method and device based on remote sensing image
CN111915592A (en) Remote sensing image cloud detection method based on deep learning
WO2018076138A1 (en) Target detection method and apparatus based on large-scale high-resolution hyper-spectral image
CN114255403A (en) Optical remote sensing image data processing method and system based on deep learning
CN112419333B (en) Remote sensing image self-adaptive feature selection segmentation method and system
CN113723377A (en) Traffic sign detection method based on LD-SSD network
CN114187520B (en) Building extraction model construction and application method
CN112926548A (en) Lane line detection method and device, electronic equipment and storage medium
CN111611861A (en) Image change detection method based on multi-scale feature association
CN113128481A (en) Face living body detection method, device, equipment and storage medium
CN115861756A (en) Earth background small target identification method based on cascade combination network
CN115311508A (en) Single-frame image infrared dim target detection method based on depth U-type network
CN115601236A (en) Remote sensing image super-resolution reconstruction method based on characteristic information distillation network
CN112149526A (en) Lane line detection method and system based on long-distance information fusion
CN111507119B (en) Identification code recognition method, identification code recognition device, electronic equipment and computer readable storage medium
CN113887472A (en) Remote sensing image cloud detection method based on cascade color and texture feature attention
CN112818818B (en) Novel ultra-high-definition remote sensing image change detection method based on AFFPN
CN114332636B (en) Polarized SAR building region extraction method, equipment and medium
CN115661482B (en) RGB-T salient target detection method based on joint attention
CN115861922B (en) Sparse smoke detection method and device, computer equipment and storage medium
CN115761223A (en) Remote sensing image instance segmentation method by using data synthesis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB03 Change of inventor or designer information
Inventor after: Xiang Deliang
Inventor after: Lu Shengtao
Inventor after: Hu Canbin
Inventor after: Sun Xiaokun
Inventor before: Hu Canbin
Inventor before: Lu Shengtao
Inventor before: Xiang Deliang
Inventor before: Cheng Jianda
Inventor before: Sun Xiaokun
GR01 Patent grant