CN115497006B - Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy - Google Patents


Info

Publication number
CN115497006B
Authority
CN
China
Prior art keywords
pooling
remote sensing
sensing image
urban
layer
Prior art date
Legal status
Active
Application number
CN202211138291.1A
Other languages
Chinese (zh)
Other versions
CN115497006A (en)
Inventor
滕旭阳
林煜凯
冯嘉旖
蔡璐
高永盛
Current Assignee
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority: CN202211138291.1A
Publication of CN115497006A
Application granted
Publication of CN115497006B


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/10: Terrestrial scenes
    • G06V20/13: Satellite images
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/20: Image preprocessing
    • G06V10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764: Classification, e.g. of video objects
    • G06V10/82: Neural networks

Abstract

The invention discloses a method and a system for deep monitoring of urban remote sensing image changes based on a dynamic mixing strategy, wherein the method comprises the following steps: S1, preprocessing urban remote sensing images and labeling urban areas of different categories to obtain a data set; S2, adopting a DeepLabV3+ network model with a dynamic mixed pooling strategy and Xception as the backbone network, and training the network with the data set of step S1; S3, cropping remote sensing images of the same urban region at different times in the same proportion, inputting them into the trained network model, and segmenting the images; and S4, after obtaining the regional classification result of the urban remote sensing image, calculating the degree of change of each regional category over a period of time and labeling the changes on the map. By dynamically selecting different pooling modes for the feature maps of each layer of the remote sensing image, the invention better captures the global and local information of the image and improves segmentation accuracy.

Description

Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy
Technical Field
The invention belongs to the technical field of satellite remote sensing image processing, and particularly relates to a method and a system for monitoring urban remote sensing image change depth based on a dynamic mixing strategy.
Background
Semantic segmentation of high-resolution remote sensing images is a basic task in the remote sensing field. Its main goal is to use a computer to analyze the color, spectral information and spatial information of the various targets observed in a remote sensing image, select feature information, classify each pixel in the image, and segment the regional contours between targets. Urban remote sensing images mainly contain buildings, roads, green coverage areas and the like. Accurate segmentation and change detection of urban remote sensing images support the analysis of building indices, seasonal vegetation change, disaster detection and vegetation distribution change in urban management and planning, and thereby provide a basis for comprehensively grasping the urban layout and its real-time dynamic changes.
In recent decades, with the development of computer vision and artificial intelligence, the resolution of satellite remote sensing images has become higher and higher, and the ability to process such images has greatly improved, which is significant both for academic research and for guiding production practice. Compared with ordinary images, remote sensing images offer high precision, rich context information, a wider field of view and real-time dynamic monitoring; they are applied at scale in many fields and show an increasingly refined development trend. Accurately extracting key information from remote sensing images is therefore important. Conventional remote sensing image segmentation techniques, such as region-based segmentation, edge-detection segmentation and shadow analysis, generally rely on manually designed features and generalize poorly when segmenting complex scenes.
Deep learning algorithms can perform semantic segmentation on high-altitude remote sensing images, extract valuable information from the image more effectively, improve the segmentation accuracy of small targets, and provide data support for urban planning, desertification monitoring, urban greening management, water area supervision and the like. However, in real remote sensing images, because of complex and varied conditions such as diverse terrain, mutual occlusion between targets, rich urban building types, and factors such as illumination and cloud cover, the accuracy of object edge segmentation is greatly reduced and segmentation boundaries become blurred.
Therefore, it is important to design an urban remote sensing image change detection method based on a dynamic mixing strategy that reduces the influence of terrain and environmental factors and thereby improves the precision of regional edge segmentation.
Disclosure of Invention
The invention aims to remedy the above shortcomings and provides an urban remote sensing image change depth detection method based on a DeepLabV3+ multi-scale network model combined with a mixed pooling strategy.
The invention adopts the following technical scheme:
the urban remote sensing image change depth monitoring method based on the dynamic mixing strategy comprises the following steps:
s1, preprocessing an urban remote sensing image, and marking areas of different categories in the city to obtain a data set;
S2, based on Xception as the backbone network, adopting a network model with a dynamic mixed pooling strategy, and training the network with the data set of step S1;
s3, carrying out same-proportion clipping on remote sensing images of the same city region at different times, inputting the remote sensing images into a trained network model, and dividing the images;
and S4, after obtaining the regional classification result of the urban remote sensing image, calculating the change degree of each regional category in a period of time and labeling the change on the map.
Furthermore, the original remote sensing images of the urban areas are taken from the RSSCN7 remote sensing image dataset.
In step S1, as data preprocessing, the original remote sensing images are cropped into 256 × 256 patches for semantic segmentation; the cropped images are annotated with the Labelme semantic segmentation tool and divided into five categories: roads, buildings, water areas, green plants and open spaces.
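The cropping step above can be sketched as follows. This is an illustrative sketch only: the function name and the non-overlapping tiling choice are assumptions, since the patent specifies only the 256 × 256 patch size.

```python
import numpy as np

def crop_tiles(image: np.ndarray, tile: int = 256) -> list:
    """Crop an H x W x C remote sensing image into non-overlapping
    tile x tile patches. Edge regions smaller than the tile size are
    discarded here; padding or overlapped cropping are common alternatives.
    """
    h, w = image.shape[:2]
    patches = []
    for y in range(0, h - tile + 1, tile):
        for x in range(0, w - tile + 1, tile):
            patches.append(image[y:y + tile, x:x + tile])
    return patches

# A 512 x 768 image yields 2 x 3 = 6 tiles of 256 x 256.
img = np.zeros((512, 768, 3), dtype=np.uint8)
tiles = crop_tiles(img)
print(len(tiles), tiles[0].shape)  # 6 (256, 256, 3)
```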
Furthermore, Labelme is an open-source image annotation tool, mainly used for dataset labeling in instance segmentation, semantic segmentation, object detection and classification tasks.
Further, since the characteristic features of each regional category in an urban area are generally distinct, the preprocessed remote sensing images are directly segmented and labeled with different colors, and the training data set is then selected at random.
Further, in step S2, the dataset is used to train the mixed-pooling DeepLabV3+ network model. The DeepLabV3+ network model fuses multi-scale information, divides the model structure into an encoding layer and a decoding layer, and introduces Xception as the backbone network. The backbone network output is split into two branches: one is passed directly into the decoding layer, and the other passes through the ASPP module with serial atrous convolutions. The method specifically comprises the following steps:
S21. At the encoding layer, feature information of the monitored urban remote sensing image is extracted through the backbone network, and the image size is successively reduced to 1/4, 1/8 and 1/16 of the original size.
S22. The 60 × 60 × 2048 feature map obtained from the backbone network enters the atrous spatial pyramid pooling (ASPP) module. A standard ASPP module consists of a 1×1 convolution layer, three 3×3 dilated convolution layers with dilation rates of 6, 12 and 18, and a global average pooling layer. To better improve the accuracy of the network model's edge segmentation, the global average pooling in the ASPP module is here replaced by the mixed pooling strategy: the input feature map is mixed-pooled to obtain a feature map of size 20 × 20, the channels are compressed by 1×1 convolution, the result is restored to the height and width of the input feature map by deconvolution, and the outputs of all branches are concatenated and fused.
S23. At the decoding layer, the low-level features output by the corresponding backbone level are fused with the mixed result from the encoding layer; after upsampling with a 3×3 convolution kernel, the prediction is restored to the resolution of the input image, yielding the classified result image of the urban remote sensing image and ultimately improving the segmentation accuracy of the network model.
S24. The DeepLabV3+ network model with the dynamic mixed pooling strategy is trained on the RSSCN7 remote sensing image dataset, and the remaining samples are used as the test set for evaluating the network model.
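The train/test division of step S24 can be sketched as below; the 80/20 ratio and the helper name are assumptions, since the patent only states that the remaining samples form the test set.

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=42):
    """Randomly split labelled tiles into training and test sets.
    The 80/20 ratio here is an assumption; the patent does not state one.
    """
    rng = random.Random(seed)
    shuffled = samples[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

train, test = split_dataset(list(range(100)))
print(len(train), len(test))  # 80 20
```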
Further, the mixed pooling strategy in step S22 optimizes each of the 2048 feature map layers:
The frequency $\alpha_k$ of selecting max pooling, the frequency $\beta_k$ of selecting average pooling and the frequency $\gamma_k$ of selecting stochastic pooling for the k-th layer feature map are
$$\alpha_k = \frac{i_{max}}{i_{total}},\qquad \beta_k = \frac{i_{avg}}{i_{total}},\qquad \gamma_k = \frac{i_{sto}}{i_{total}}$$
where $i_{max}$, $i_{avg}$ and $i_{sto}$ are the numbers of times the max pooling, average pooling and stochastic pooling methods are selected for the k-th layer feature map, and $i_{total}$ is the size of the training set used to optimize the k-th layer feature map.
The mixed pooling result $output_k$ of the k-th layer feature map in the final network model is
$$output_k = \alpha_k\, x_{k\_max} + \beta_k\, x_{k\_avg} + \gamma_k\, x_{k\_sto}$$
where $x_{k\_max}$, $x_{k\_avg}$ and $x_{k\_sto}$ denote the results of the k-th layer feature map after max pooling, average pooling and stochastic pooling respectively.
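A minimal NumPy sketch of the mixed pooling computation, for a single feature channel. The 3 × 3 non-overlapping window is inferred from the 60 × 60 → 20 × 20 sizes in step S22; the stochastic-pooling formulation (sampling one activation per window with probability proportional to its value) and all function names are assumptions.

```python
import numpy as np

def pool2d(x, k, mode, rng=None):
    """Pool a square feature map x with non-overlapping k x k windows."""
    n = x.shape[0] // k
    w = x[:n * k, :n * k].reshape(n, k, n, k).swapaxes(1, 2).reshape(n, n, k * k)
    if mode == "max":
        return w.max(-1)
    if mode == "avg":
        return w.mean(-1)
    # stochastic pooling: sample one activation per window with
    # probability proportional to its (non-negative) value
    rng = rng or np.random.default_rng(0)
    p = w / np.clip(w.sum(-1, keepdims=True), 1e-12, None)
    idx = np.array([[rng.choice(k * k, p=p[i, j]) for j in range(n)]
                    for i in range(n)])
    return np.take_along_axis(w, idx[..., None], -1)[..., 0]

def mixed_pool(x, alpha, beta, gamma, k=3):
    """output_k = alpha * max-pool + beta * avg-pool + gamma * stochastic-pool."""
    return (alpha * pool2d(x, k, "max")
            + beta * pool2d(x, k, "avg")
            + gamma * pool2d(x, k, "sto"))

x = np.random.default_rng(1).random((60, 60))  # one 60 x 60 feature channel
y = mixed_pool(x, 0.5, 0.3, 0.2)               # per-layer weights from training
print(y.shape)  # (20, 20)
```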
Further, the steps for upgrading the global average pooling in the ASPP module to dynamic mixed pooling are:
a. Initialization: before training the DeepLabV3+ network model, the weights $\alpha_k$, $\beta_k$ and $\gamma_k$ of each pooling strategy of each feature map layer are all initialized to 1/3;
b. The layer-1 feature map is pooled with max pooling, average pooling and stochastic pooling respectively, while the remaining feature maps are pooled with the mixed pooling method; the merits of the different pooling strategies are evaluated by computing the mean intersection-over-union (mIoU) between the predicted and true values, and the pooling mode with the largest mIoU value is taken as the selected pooling mode of the layer-1 feature map;
c. The pooling strategies of feature map layers 2 to 2048 of the same input are optimized by the method of step b;
d. The operations of steps b and c are performed on all training set samples;
e. The frequencies $\alpha_k$, $\beta_k$ and $\gamma_k$ with which each pooling strategy is selected for each feature map layer over the training set are obtained.
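Step e above reduces to counting how often each pooling method wins for a layer across the training set; a small sketch (the function name is an assumption):

```python
from collections import Counter

def pooling_frequencies(selections):
    """Given the pooling method chosen for one layer on each training
    sample, return (alpha, beta, gamma): the selection frequencies of
    max, average and stochastic pooling."""
    counts = Counter(selections)
    total = len(selections)
    return (counts["max"] / total, counts["avg"] / total, counts["sto"] / total)

# e.g. over 10 training samples, layer k favoured max pooling 6 times:
alpha, beta, gamma = pooling_frequencies(["max"] * 6 + ["avg"] * 3 + ["sto"] * 1)
print(alpha, beta, gamma)  # 0.6 0.3 0.1
```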
Further, in step S22, the mean intersection-over-union (mIoU) is used to evaluate the merits of the different pooling strategies; the mIoU is the per-class ratio of the intersection to the union of the predicted and true values, averaged over the number of classes:
$$mIoU = \frac{1}{k}\sum_{i=1}^{k}\frac{TP}{TP+FP+FN}$$
where k is the number of non-empty classes, TP the number of true positives, FP the number of false positives and FN the number of false negatives.
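A sketch of the mIoU computation defined above, skipping empty classes as the formula's k (number of non-empty categories) implies; the function name is an assumption.

```python
import numpy as np

def mean_iou(pred, true, num_classes):
    """mIoU: average over classes of TP / (TP + FP + FN),
    counting only classes present in prediction or ground truth."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (true == c))
        fp = np.sum((pred == c) & (true != c))
        fn = np.sum((pred != c) & (true == c))
        denom = tp + fp + fn
        if denom > 0:  # only count non-empty classes
            ious.append(tp / denom)
    return float(np.mean(ious))

pred = np.array([0, 0, 1, 1, 2, 2])
true = np.array([0, 1, 1, 1, 2, 0])
miou = mean_iou(pred, true, 3)  # IoUs: 1/3, 2/3, 1/2
```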
Further, in step S2, the DeepLabV3+ network model uses a pixel-wise cross-entropy loss (softmax loss) to handle the multi-class problem: each pixel is taken as a sample, and the cross-entropy between its predicted class and its true class is computed. In the multi-class setting, the softmax function maps the multiple outputs to probability values in the (0, 1) interval for classification.
The output after softmax regression is
$$p_{i,j} = \frac{e^{l_{i,j}}}{\sum_{c=1}^{C} e^{l_{i,c}}}$$
where e is the base of the natural logarithm (approximately 2.718), $p_{i,j}$ is the predicted probability of the i-th sample for the j-th class, $l_{i,j}$ is the output of the neural network for the i-th sample on the j-th class, and C is the number of input classes.
The above equation turns the outputs into a probability distribution over (0, 1), and the distance between the predicted probability distribution and the distribution of the true value is then measured by the cross-entropy loss:
$$Loss = -\frac{1}{N}\sum_{i=1}^{N} w\,\log\bigl(p_{i,Y(i)}\bigr)$$
where N is the number of training set samples, Y(i) is the class to which the i-th sample belongs, and w is the weight of the sample data.
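The softmax and pixel-wise cross-entropy above can be sketched as follows; the function names are assumptions, and a uniform class weight w is used by default.

```python
import numpy as np

def softmax(logits):
    z = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def pixel_cross_entropy(logits, labels, weights=None):
    """Mean per-pixel cross entropy: -1/N * sum_i w_{Y(i)} log p_{i,Y(i)}."""
    p = softmax(logits)
    n = len(labels)
    w = np.ones(p.shape[1]) if weights is None else np.asarray(weights)
    return float(-np.mean(w[labels] * np.log(p[np.arange(n), labels])))

# Three pixels, five classes (roads/buildings/water/green/open, as in S1):
logits = np.array([[2.0, 0.1, 0.1, 0.1, 0.1],
                   [0.2, 1.5, 0.3, 0.1, 0.0],
                   [0.0, 0.0, 3.0, 0.0, 0.0]])
labels = np.array([0, 1, 2])
loss = pixel_cross_entropy(logits, labels)
```

More confident logits for the correct classes drive the loss toward zero, which is what training minimizes.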
In step S3, the remote sensing images of the same city area at different times are cut in the same proportion, and the cut images are input into the trained model.
Further, the step S4 specifically includes:
S41. The intersection-over-union loss between the predictions for the i-th category in the urban remote sensing images of the same area at times T1 and T2 is computed to represent the change of the i-th category in the area:
$$L_i = 1 - \frac{TP_i}{TP_i + FP_i + FN_i}$$
where $L_i$ indicates the magnitude of change of the i-th region category (the larger $L_i$, the greater the change of the area), $TP_i$ is the number of true positives, $FP_i$ the number of false positives and $FN_i$ the number of false negatives for the i-th region category.
S42, splicing the segmented images in sequence.
S43. The urban remote sensing images stitched at times T1 and T2 are differenced, pixels belonging to the same region category are eliminated, and the remaining part contains only the changed regions, which are marked; the marked image is then superimposed on the original image, displaying the range of urban change on the remote sensing image.
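Steps S41 and S43 can be sketched together: the per-class change degree $L_i$ computed from the IoU between the T1 and T2 segmentation maps, and the binary change mask obtained by differencing them. Function names are assumptions; the T1 map plays the role of the reference when counting $TP_i$, $FP_i$ and $FN_i$.

```python
import numpy as np

def change_degree(seg_t1, seg_t2, num_classes):
    """Per-class change L_i = 1 - IoU between the T1 and T2 segmentation
    maps; larger L_i means the class changed more over the period."""
    degrees = {}
    for c in range(num_classes):
        tp = np.sum((seg_t1 == c) & (seg_t2 == c))
        fp = np.sum((seg_t1 != c) & (seg_t2 == c))
        fn = np.sum((seg_t1 == c) & (seg_t2 != c))
        denom = tp + fp + fn
        degrees[c] = (1.0 - tp / denom) if denom else 0.0
    return degrees

def change_mask(seg_t1, seg_t2):
    """Binary mask of pixels whose class changed between T1 and T2,
    i.e. the difference image with same-class pixels eliminated."""
    return seg_t1 != seg_t2

seg_t1 = np.array([[0, 0], [1, 1]])  # T1 class map of a 2 x 2 region
seg_t2 = np.array([[0, 1], [1, 1]])  # T2: one pixel turned from 0 to 1
deg = change_degree(seg_t1, seg_t2, 2)
mask = change_mask(seg_t1, seg_t2)
```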
The invention also discloses a system based on the above urban remote sensing image change depth monitoring method, comprising the following modules:
Data set acquisition module: preprocesses the urban remote sensing images and labels urban areas of different categories to obtain a data set;
Training module: adopts the DeepLabV3+ network model with the dynamic mixed pooling strategy and Xception as the backbone network, and trains the network with the data set;
Segmentation module: crops remote sensing images of the same urban area at different times in the same proportion, inputs them into the trained model and segments them;
Marking module: after obtaining the regional classification result of the urban remote sensing image, calculates the degree of change of each regional category over a period of time and labels the changes on the map.
Compared with the prior art, the invention has the beneficial effects that:
1. according to the invention, different pooling modes are dynamically selected for pooling the characteristic images of each layer of the remote sensing image, so that the global information and the local information of the image can be better grasped, and the segmentation precision is improved.
2. According to the invention, random pooling is used as one of pooling strategies, so that the risk of overfitting is effectively reduced, and the generalization capability of the model is improved.
Drawings
FIG. 1 is a flow chart of a method for monitoring urban remote sensing image change depth based on a dynamic mixing strategy.
Fig. 2 is a mixed pooling flow diagram.
FIG. 3 is a DeepLabV3+ network model diagram.
Fig. 4 is a block diagram of a urban remote sensing image change depth monitoring system based on a dynamic mixing strategy.
Detailed Description
The preferred embodiments of the present invention will be described in detail with reference to the accompanying drawings.
As shown in fig. 1-3, the urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to the embodiment is performed according to the following steps:
S1, preprocessing images from the RSSCN7 remote sensing image dataset published by Wuhan University in 2015, and labeling different categories of areas in the city to obtain a data set;
S2, adopting the DeepLabV3+ network model with a dynamic mixed pooling strategy and Xception as the backbone network, and training the network with the data set;
s3, carrying out same-proportion clipping on remote sensing images of the same city region at different times, inputting the remote sensing images into a trained model, and dividing the images.
And S4, after obtaining the regional classification result of the urban remote sensing image, calculating the change degree of each regional category in a period of time and labeling the change on the map.
Specifically, in this embodiment, the original remote sensing images of the urban areas are taken from the RSSCN7 remote sensing image dataset.
In this embodiment, in step S1, as data preprocessing, the original remote sensing images are cropped into 256 × 256 patches for semantic segmentation; the cropped images are annotated with the Labelme semantic segmentation tool and divided into five categories: roads, buildings, water areas, green plants and open spaces.
Furthermore, Labelme is an open-source image annotation tool, mainly used for dataset labeling in instance segmentation, semantic segmentation, object detection and classification tasks.
Since the characteristic features of each regional category in an urban area are generally distinct, in this embodiment the preprocessed remote sensing images are directly segmented and labeled with different colors, and the training data set is then selected at random.
In S2 of this embodiment, the dataset is used to train the mixed-pooling DeepLabV3+ network model. The DeepLabV3+ network model fuses multi-scale information, divides the model structure into an encoding layer and a decoding layer, and introduces Xception as the backbone network. The backbone network output is split into two branches: one is passed directly into the decoding layer, and the other passes through the ASPP module with serial atrous convolutions. The method specifically comprises the following steps:
S21. At the encoding layer, feature information of the monitored urban remote sensing image is extracted through the backbone network, and the image size is successively reduced to 1/4, 1/8 and 1/16 of the original size.
S22. The 60 × 60 × 2048 feature map obtained from the backbone network enters the atrous spatial pyramid pooling (ASPP) module. A standard ASPP module consists of a 1×1 convolution layer, three 3×3 dilated convolution layers with dilation rates of 6, 12 and 18, and a global average pooling layer. To better improve the accuracy of the network's edge segmentation, the global average pooling in the ASPP module is here replaced by the mixed pooling strategy: the input feature map is mixed-pooled to obtain a feature map of size 20 × 20, the channels are compressed by 1×1 convolution, the result is restored to the height and width of the input feature map by deconvolution, and the outputs of all branches are concatenated and fused.
S23. At the decoding layer, the low-level features output by the corresponding backbone level are fused with the mixed result from the encoding layer; after upsampling with a 3×3 convolution kernel, the prediction is restored to the resolution of the input image, yielding the classified result image of the urban remote sensing image and ultimately improving the segmentation accuracy of the network.
S24. To better improve the accuracy of network edge segmentation, the global average pooling in the ASPP of the DeepLabV3+ network structure is replaced by the mixed pooling strategy described above. The network is trained on the RSSCN7 remote sensing image dataset, and the remaining samples are used as the test set for evaluating the network.
In this embodiment, the mixed pooling strategy in step S22 optimizes each of the 2048 feature map layers:
The frequency $\alpha_k$ of selecting max pooling, the frequency $\beta_k$ of selecting average pooling and the frequency $\gamma_k$ of selecting stochastic pooling for the k-th layer feature map are
$$\alpha_k = \frac{i_{max}}{i_{total}},\qquad \beta_k = \frac{i_{avg}}{i_{total}},\qquad \gamma_k = \frac{i_{sto}}{i_{total}}$$
where $i_{max}$, $i_{avg}$ and $i_{sto}$ are the numbers of times the max pooling, average pooling and stochastic pooling methods are selected for the k-th layer feature map, and $i_{total}$ is the size of the training set used to optimize the k-th layer feature map.
The mixed pooling result $output_k$ of the k-th layer feature map in the final model is
$$output_k = \alpha_k\, x_{k\_max} + \beta_k\, x_{k\_avg} + \gamma_k\, x_{k\_sto}$$
where $x_{k\_max}$, $x_{k\_avg}$ and $x_{k\_sto}$ denote the results of the k-th layer feature map after max pooling, average pooling and stochastic pooling respectively.
The steps for upgrading the global average pooling in the ASPP module to dynamic mixed pooling are:
a. Initialization: before training the DeepLabV3+ network model, the weights $\alpha_k$, $\beta_k$ and $\gamma_k$ of each pooling strategy of each feature map layer are all initialized to 1/3;
b. The layer-1 feature map is pooled with max pooling, average pooling and stochastic pooling respectively, while the remaining feature maps are pooled with the mixed pooling method; the merits of the different pooling strategies are evaluated by computing the mean intersection-over-union (mIoU) between the predicted and true values, and the pooling mode with the largest mIoU value is taken as the selected pooling mode of the layer-1 feature map;
c. The pooling strategies of feature map layers 2 to 2048 of the same input are optimized by the method of step b;
d. The operations of steps b and c are performed on all training set samples;
e. The frequencies $\alpha_k$, $\beta_k$ and $\gamma_k$ with which each pooling strategy is selected for each feature map layer over the training set are obtained.
In step S22 of this embodiment, the mean intersection-over-union (mIoU) is used to evaluate the merits of the different pooling strategies; the mIoU is the per-class ratio of the intersection to the union of the predicted and true values, averaged over the number of classes:
$$mIoU = \frac{1}{k}\sum_{i=1}^{k}\frac{TP}{TP+FP+FN}$$
where k is the number of non-empty classes, TP the number of true positives, FP the number of false positives and FN the number of false negatives.
In step S2 of this embodiment, the DeepLabV3+ network model uses a pixel-wise cross-entropy loss to handle the multi-class problem: each pixel is taken as a sample, and the cross-entropy between its predicted class and its true class is computed. The softmax function maps the multiple outputs to probability values in the (0, 1) interval for classification in the multi-class process.
The output after softmax regression is
$$p_{i,j} = \frac{e^{l_{i,j}}}{\sum_{c=1}^{C} e^{l_{i,c}}}$$
where $p_{i,j}$ is the predicted probability of the i-th sample for the j-th class, and $l_{i,j}$ is the output of the network for the i-th sample on the j-th class. The above equation turns the outputs into a probability distribution over (0, 1), and the distance between the predicted probability distribution and the distribution of the true value is then measured by the cross-entropy loss:
$$Loss = -\frac{1}{N}\sum_{i=1}^{N} w\,\log\bigl(p_{i,Y(i)}\bigr)$$
where N is the number of training set samples, C is the number of input classes, Y(i) is the class to which the i-th sample belongs, and w is the weight of the sample data.
In step S3 of this embodiment, remote sensing images of the same city region at different times are cut in the same proportion, and input into a trained model to obtain a segmented image.
The step S4 of this embodiment specifically includes:
S41. The intersection-over-union loss between the predictions for the i-th category in the urban remote sensing images of the same area at times T1 and T2 is computed to represent the change of the i-th category in the area:
$$L_i = 1 - \frac{TP_i}{TP_i + FP_i + FN_i}$$
where $L_i$ indicates the magnitude of change of the i-th region category (the larger $L_i$, the greater the change of the area), $TP_i$ is the number of true positives, $FP_i$ the number of false positives and $FN_i$ the number of false negatives for the i-th region category.
S42, splicing the segmented images in sequence.
S43. The urban remote sensing images stitched at times T1 and T2 are differenced, pixels belonging to the same region category are eliminated, and the remaining part contains only the changed regions, which are marked; the marked image is then superimposed on the original image, displaying the range of urban change on the remote sensing image.
As shown in fig. 4, this embodiment discloses a system based on the urban remote sensing image change depth monitoring method of the above embodiment, comprising the following modules:
Data set acquisition module: preprocesses the urban remote sensing images and labels urban areas of different categories to obtain a data set;
Training module: adopts the DeepLabV3+ network model with the dynamic mixed pooling strategy and Xception as the backbone network, and trains the network model with the data set;
Segmentation module: crops remote sensing images of the same urban area at different times in the same proportion, inputs them into the trained model and segments them;
Marking module: after obtaining the regional classification result of the urban remote sensing image, calculates the degree of change of each regional category over a period of time and labels the changes on the map.
According to the invention, different pooling modes are dynamically selected for pooling the characteristic images of each layer of the remote sensing image, so that the global information and the local information of the image can be better grasped, and the segmentation precision is improved. According to the invention, random pooling is used as one of pooling strategies, so that the risk of overfitting is effectively reduced, and the generalization capability of the model is improved.

Claims (7)

1. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy is characterized by comprising the following steps of:
s1, preprocessing an urban remote sensing image, and marking areas of different categories in the city to obtain a data set;
S2, based on Xception as the backbone network, adopting a network model with a dynamic mixed pooling strategy, and training the network with the data set of step S1;
s3, carrying out same-proportion clipping on remote sensing images of the same city region at different times, inputting the remote sensing images into a trained model, and dividing the images;
s4, after obtaining the regional classification result of the urban remote sensing image, calculating the change degree of each regional category in a period of time and labeling the change on the map;
the step S2 is specifically as follows:
s21, extracting characteristic information of the detected urban remote sensing image through a backbone network at a coding layer, and sequentially changing the image size into 1/4,1/8 and 1/16 of the original image size;
S22, the 60 × 60 × 2048 feature map obtained by the backbone network enters the atrous spatial pyramid pooling ASPP; the ASPP module consists of a 1×1 convolution, three 3×3 dilated convolution layers with dilation rates of 6, 12 and 18, and a global average pooling layer; on the basis of this structure, the global average pooling in the ASPP module adopts a mixed pooling strategy: the input feature map is mixed-pooled to obtain a feature map of size 20 × 20, the channels are compressed by 1×1 convolution, the result is finally restored to the height and width of the input feature map by deconvolution, and the results obtained by all parts are concatenated and fused;
s23, at the decoding layer, merging the low-level features output by the corresponding layer of the backbone network with the mixed result of the coding layer, and, after up-sampling through a 3×3 convolution kernel, recovering a prediction result at the resolution of the input image to obtain the classified result image of the urban remote sensing image;
s24, training the DeepLabV3+ network model with the dynamic mixed pooling strategy on the RSSCN7 DataSet remote sensing image data set, and using the remaining samples as the test set for testing the network model;
in step S22, the mixed pooling strategy is optimized for each of the 2048 feature-map layers:
the frequency α_k of selecting maximum pooling for the k-th layer feature map, the frequency β_k of selecting average pooling, and the frequency γ_k of selecting random pooling are:

α_k = i_max / i_total, β_k = i_avg / i_total, γ_k = i_sto / i_total

where i_max is the number of times the maximum pooling method is selected for the k-th layer feature map, i_avg is the number of times the average pooling method is selected, i_sto is the number of times the random pooling method is selected, and i_total is the total number of training set samples used to optimize the k-th layer feature map;
the mixed pooling result output_k of the k-th layer feature map in the final model is:

output_k = α_k · x_k_max + β_k · x_k_avg + γ_k · x_k_sto

where x_k_max represents the result of the k-th layer feature map after maximum pooling, x_k_avg represents the result after average pooling, and x_k_sto represents the result after random pooling;
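As an illustration only (not part of the claimed method), the three pooling strategies and the weighted combination output_k above can be sketched in Python for a single pooling window; the window contents, weights, and function names are hypothetical:

```python
import random

def pool_window(window, mode, rng=random):
    """Pool a flat list of activations with one of the three strategies."""
    if mode == "max":
        return max(window)
    if mode == "avg":
        return sum(window) / len(window)
    if mode == "sto":
        # stochastic pooling: sample one activation with probability
        # proportional to its magnitude
        total = sum(window)
        if total == 0:
            return 0.0
        r = rng.uniform(0, total)
        acc = 0.0
        for v in window:
            acc += v
            if acc >= r:
                return v
        return window[-1]
    raise ValueError(mode)

def mixed_pool(window, alpha, beta, gamma, rng=random):
    """output_k = alpha*x_max + beta*x_avg + gamma*x_sto (weights sum to 1)."""
    return (alpha * pool_window(window, "max")
            + beta * pool_window(window, "avg")
            + gamma * pool_window(window, "sto", rng))
```

In the patent, this mix is applied per layer with the learned frequencies (α_k, β_k, γ_k) as the weights.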
the global average pooling in the ASPP module is improved to dynamic mixing pooling, and the method comprises the following specific steps:
a. Initialization: before training the DeepLabV3+ network model, the initial weights α_k, β_k and γ_k of each pooling strategy for each layer feature map are all set to 1/3;
b. Pool the layer-1 feature map separately with maximum pooling, average pooling and random pooling, while pooling the remaining feature maps with the mixed pooling method; evaluate the merits of the different pooling strategies by computing the mean intersection over union (mIoU) between the predicted and true values, and take the pooling mode with the largest mIoU as the selected pooling mode for the layer-1 feature map;
c. Optimize the pooling strategy of the layer-2 to layer-2048 feature maps of the same input by the method of step b;
d. Perform the operations of steps b and c on all training set samples;
e. Obtain α_k, β_k and γ_k for each layer feature map under the different pooling strategies over the training set.
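The selection steps a–e above amount to counting, per layer, how often each strategy wins on the training set. A minimal sketch, assuming the per-sample mIoU score of each strategy for one layer is already available (the `per_sample_scores` structure and function name are hypothetical):

```python
from collections import Counter

STRATEGIES = ("max", "avg", "sto")

def layer_pool_weights(per_sample_scores):
    """Steps a-e for one feature-map layer.

    per_sample_scores: list of dicts mapping strategy -> mIoU, one dict per
    training sample. The best-scoring strategy is tallied per sample (step b),
    and the selection frequencies give (alpha_k, beta_k, gamma_k)."""
    counts = Counter()
    for scores in per_sample_scores:
        best = max(STRATEGIES, key=lambda s: scores[s])  # highest mIoU wins
        counts[best] += 1
    i_total = len(per_sample_scores)
    alpha = counts["max"] / i_total  # i_max / i_total
    beta = counts["avg"] / i_total   # i_avg / i_total
    gamma = counts["sto"] / i_total  # i_sto / i_total
    return alpha, beta, gamma
```

Repeating this for layers 1 to 2048 (step c) over all training samples (step d) yields the full set of weights of step e.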
2. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to claim 1, characterized in that: in step S1, the remote sensing images use the RSSCN7 DataSet remote sensing image data set; or, in step S1, the urban remote sensing image is cropped and semantically segmented as data preprocessing, cropped to 256 × 256, and the cropped images are annotated with the Labelme semantic segmentation annotation tool, classifying them into five categories: roads, buildings, water areas, green plants and open spaces.
3. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to claim 2, characterized in that: in step S1, the preprocessed remote sensing image is segmented and marked with different colors, and the training data set is then selected by random selection.
4. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to claim 1, characterized in that: in step S22, the mean intersection over union mIoU is used to evaluate the merits of the different pooling strategies, where mIoU is the sum, over all categories, of the ratio of the intersection to the union of the predicted and true values, divided by the number of categories, expressed as:

mIoU = (1/k) · Σ_{i=1..k} TP_i / (TP_i + FP_i + FN_i)

where k represents the number of non-empty categories, TP represents the number of true positives, FP the number of false positives, and FN the number of false negatives.
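A minimal sketch of this mIoU computation from per-class (TP, FP, FN) counts; the `stats` input format is an assumption for illustration:

```python
def mean_iou(stats):
    """mIoU = (1/k) * sum_i TP_i / (TP_i + FP_i + FN_i).

    stats: list of (TP, FP, FN) tuples, one per class; empty classes
    (all counts zero) are skipped so k counts only non-empty categories."""
    ious = [tp / (tp + fp + fn) for tp, fp, fn in stats if tp + fp + fn > 0]
    return sum(ious) / len(ious)
```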
5. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to claim 4, characterized in that:
in step S2, the DeepLabV3+ network model adopts a pixel-wise cross entropy loss function (softmax loss) to handle the multi-classification problem; each pixel is taken as a sample, and the cross entropy loss is calculated between the predicted category and the true category of each pixel; softmax converts the multiple outputs into probability values mapped into the (0, 1) interval for multi-class classification;
the output after softmax regression is:

p_{i,j} = e^{l_{i,j}} / Σ_{c=1..C} e^{l_{i,c}}

where e is the natural constant (≈2.718), p_{i,j} is the predicted probability of the i-th sample for the j-th class, l_{i,j} is the neural network output of the i-th sample for the j-th class, and C is the number of input classes;
this maps the outputs into a probability distribution over (0, 1); the cross entropy loss function then measures the distance between the predicted probability distribution and the probability distribution of the true values, specifically:

Loss = -(1/N) · Σ_{i=1..N} w · log p_{i,Y(i)}

where N is the number of training set samples, C is the number of input classes, Y(i) is the class to which the i-th sample belongs, and w is the weight of the sample data.
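The softmax and pixel-wise weighted cross entropy described above can be sketched as follows; the list-based per-pixel interface and the scalar `weight` are illustrative assumptions (a real implementation would operate on tensors):

```python
import math

def softmax(logits):
    """p_j = e^{l_j} / sum_c e^{l_c}, mapping logits into (0, 1)."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(per_pixel_logits, labels, weight=1.0):
    """Pixel-wise weighted cross entropy:
    Loss = -(1/N) * sum_i w * log p_{i, Y(i)}."""
    n = len(per_pixel_logits)
    loss = 0.0
    for logits, y in zip(per_pixel_logits, labels):
        p = softmax(logits)
        loss -= weight * math.log(p[y])
    return loss / n
```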
6. The urban remote sensing image change depth monitoring method based on the dynamic mixing strategy according to claim 5, characterized in that step S4 specifically comprises:
s41, calculating the intersection-over-union loss between the predictions of the i-th category in the urban remote sensing images of the same area at times T1 and T2, which characterizes the change of the i-th category in the area:

L_i = 1 − TP_i / (TP_i + FP_i + FN_i)

where L_i indicates the magnitude of change of the i-th region class (the larger L_i, the larger the change of that region class), TP_i represents the number of true positives of the i-th region class, FP_i the number of false positives, and FN_i the number of false negatives;
s42, splicing the segmented images in sequence;
s43, subtracting the urban remote sensing image mosaics of times T1 and T2, eliminating the pixels whose region category is unchanged to obtain the part containing only the changed regions, marking it, and overlaying the marked image on the original image to display the range of urban change on the remote sensing image.
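Steps S41 and S43 can be sketched as follows; `change_degree` computes L_i from per-class counts and `change_mask` marks pixels whose category differs between the two mosaics (both function names and the nested-list representation are hypothetical):

```python
def change_degree(tp, fp, fn):
    """L_i = 1 - TP_i / (TP_i + FP_i + FN_i): IoU loss between the T1 and T2
    predictions of region class i; a larger L_i means a larger change."""
    return 1.0 - tp / (tp + fp + fn)

def change_mask(seg_t1, seg_t2):
    """Step S43: flag pixels whose class label differs between the two
    stitched segmentation maps (same-category pixels cancel out)."""
    return [[a != b for a, b in zip(row1, row2)]
            for row1, row2 in zip(seg_t1, seg_t2)]
```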
7. An urban remote sensing image change depth monitoring system based on a dynamic mixing strategy, for executing the urban remote sensing image change depth monitoring method according to any one of claims 1-6, characterized in that the system comprises the following modules:
a data set acquisition module: preprocessing the urban remote sensing image, and labeling urban areas of different categories to obtain a data set;
training module: using Xception as the backbone network, adopting a DeepLabV3+ network model with a dynamic mixed pooling strategy, and training the network with the data set;
segmentation module: cropping remote sensing images of the same urban region at different times in the same proportion, and inputting them into the trained network model for segmentation;
marking module: after obtaining the regional classification result of the urban remote sensing image, calculating the degree of change of each region category over a period of time and labeling the change on the map.
CN202211138291.1A 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy Active CN115497006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211138291.1A CN115497006B (en) 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy

Publications (2)

Publication Number Publication Date
CN115497006A CN115497006A (en) 2022-12-20
CN115497006B true CN115497006B (en) 2023-08-01

Family

ID=84471406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211138291.1A Active CN115497006B (en) 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy

Country Status (1)

Country Link
CN (1) CN115497006B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343113A (en) * 2023-03-09 2023-06-27 中国石油大学(华东) Method and system for detecting oil spill based on polarized SAR characteristics and coding and decoding network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016141282A1 (en) * 2015-03-04 2016-09-09 The Regents Of The University Of California Convolutional neural network with tree pooling and tree feature map selection
CN108053420A (en) * 2018-01-05 2018-05-18 昆明理工大学 A kind of dividing method based on the unrelated attribute dynamic scene of limited spatial and temporal resolution class
CN112069831A (en) * 2020-08-21 2020-12-11 三峡大学 Unreal information detection method based on BERT model and enhanced hybrid neural network
CN112233038A (en) * 2020-10-23 2021-01-15 广东启迪图卫科技股份有限公司 True image denoising method based on multi-scale fusion and edge enhancement
CN112308402A (en) * 2020-10-29 2021-02-02 复旦大学 Power time series data abnormity detection method based on long and short term memory network
CN114663759A (en) * 2022-03-24 2022-06-24 东南大学 Remote sensing image building extraction method based on improved deep LabV3+

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Mixed Pooling for Convolutional Neural Networks; Dingjun Yu et al.; ResearchGate; pp. 1-13 *
Research on obstacle detection algorithms for blind roads based on deep learning; Duan Zhongxing et al.; Computer Measurement & Control; pp. 27-32 *

Also Published As

Publication number Publication date
CN115497006A (en) 2022-12-20

Similar Documents

Publication Publication Date Title
CN109033998B (en) Remote sensing image ground object labeling method based on attention mechanism convolutional neural network
CN110136170B (en) Remote sensing image building change detection method based on convolutional neural network
CN108830285B (en) Target detection method for reinforcement learning based on fast-RCNN
CN109871875B (en) Building change detection method based on deep learning
CN109410171B (en) Target significance detection method for rainy image
CN110532961B (en) Semantic traffic light detection method based on multi-scale attention mechanism network model
CN110598564B (en) OpenStreetMap-based high-spatial-resolution remote sensing image transfer learning classification method
CN110766690B (en) Wheat ear detection and counting method based on deep learning point supervision thought
WO2023168781A1 (en) Soil cadmium risk prediction method based on spatial-temporal interaction relationship
CN113449594A (en) Multilayer network combined remote sensing image ground semantic segmentation and area calculation method
CN112347970A (en) Remote sensing image ground object identification method based on graph convolution neural network
CN112232328A (en) Remote sensing image building area extraction method and device based on convolutional neural network
CN115497006B (en) Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy
CN114419468A (en) Paddy field segmentation method combining attention mechanism and spatial feature fusion algorithm
CN111563408B (en) High-resolution image landslide automatic detection method with multi-level perception characteristics and progressive self-learning
CN113988147A (en) Multi-label classification method and device for remote sensing image scene based on graph network, and multi-label retrieval method and device
CN113435254A (en) Sentinel second image-based farmland deep learning extraction method
Yan et al. Glacier classification from Sentinel-2 imagery using spatial-spectral attention convolutional model
CN111242028A (en) Remote sensing image ground object segmentation method based on U-Net
CN113077438B (en) Cell nucleus region extraction method and imaging method for multi-cell nucleus color image
CN113128335A (en) Method, system and application for detecting, classifying and discovering micro-body paleontological fossil image
Engstrom et al. Evaluating the Relationship between Contextual Features Derived from Very High Spatial Resolution Imagery and Urban Attributes: A Case Study in Sri Lanka
CN112016845A (en) DNN and CIM based regional economic benefit evaluation method and system
Luo et al. RBD-Net: robust breakage detection algorithm for industrial leather
CN115797904A (en) Active learning method for multiple scenes and multiple tasks in intelligent driving visual perception

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant