CN115497006A - Urban remote sensing image change depth monitoring method and system based on dynamic hybrid strategy - Google Patents

Urban remote sensing image change depth monitoring method and system based on dynamic hybrid strategy

Info

Publication number
CN115497006A
CN115497006A
Authority
CN
China
Prior art keywords
remote sensing
pooling
sensing image
urban
layer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202211138291.1A
Other languages
Chinese (zh)
Other versions
CN115497006B (en)
Inventor
Teng Xuyang (滕旭阳)
Lin Yukai (林煜凯)
Feng Jiayi (冯嘉旖)
Cai Lu (蔡璐)
Gao Yongsheng (高永盛)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Dianzi University
Original Assignee
Hangzhou Dianzi University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Dianzi University filed Critical Hangzhou Dianzi University
Priority to CN202211138291.1A priority Critical patent/CN115497006B/en
Publication of CN115497006A publication Critical patent/CN115497006A/en
Application granted granted Critical
Publication of CN115497006B publication Critical patent/CN115497006B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/10 Terrestrial scenes
    • G06V20/13 Satellite images
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Abstract

The invention discloses a method and system for urban remote sensing image change depth monitoring based on a dynamic hybrid strategy. The method comprises the following steps: S1, preprocessing urban remote sensing images and labeling different categories of urban areas to obtain a data set; S2, training a DeepLabV3+ network model that adopts a dynamic mixed pooling strategy and uses Xception as its backbone network on the data set of step S1; S3, cropping remote sensing images of the same urban area taken at different times to the same proportion, inputting them into the trained network model, and segmenting the images; S4, after the region classification results of the urban remote sensing images are obtained, computing the degree of change of each region category over a period of time and marking the changes on the image. The invention dynamically selects a different pooling mode for each layer of feature maps of the remote sensing image, better captures both the global and local information of the image, and improves segmentation accuracy.

Description

Urban remote sensing image change depth monitoring method and system based on dynamic hybrid strategy
Technical Field
The invention belongs to the technical field of satellite remote sensing image processing, and in particular relates to a method and system for monitoring the change depth of urban remote sensing images based on a dynamic hybrid strategy.
Background
Semantic segmentation of high-resolution remote sensing images is a basic task in the remote sensing field. Its main goal is to use a computer to analyze the color, spectral information, and spatial information of the various targets in an observed remote sensing image, select feature information, classify every pixel in the image, and delineate the region contours between target objects. Urban remote sensing images mainly contain buildings, roads, green vegetation cover, and so on. Accurate segmentation and change detection of urban remote sensing images can be used in urban management planning to analyze details such as seasonal changes of buildings and vegetation, disaster detection, and changes in vegetation distribution, thereby providing a basis and support for comprehensively grasping the urban layout and its real-time dynamic changes.
In recent decades, with the development of computer vision and artificial intelligence, the resolution of satellite remote sensing images has risen steadily and the capability to process them has improved greatly, which is of great significance for academic research and production practice. Compared with ordinary images, remote sensing images offer high precision, rich contextual information, a wide field of view, and real-time dynamic monitoring; they are applied at scale in many fields and show a trend toward increasingly refined use. Accurately extracting key information from remote sensing images is therefore essential. Traditional remote sensing image segmentation techniques, such as region-based segmentation, edge-detection segmentation, and shadow analysis, generally rely on hand-designed features and generalize poorly when segmenting complex scenes.
Deep learning algorithms based on edge detection can semantically segment high-altitude remote sensing images, extract valuable information from the images more effectively, and improve segmentation accuracy for small targets, while providing data support for urban planning, desertification monitoring, urban greening management, water area supervision, and so on. In practice, however, complex conditions such as varied terrain, mutual occlusion of target objects, and rich urban building types, together with factors such as illumination and cloud cover, greatly reduce the precision of object edge details and blur segmentation boundaries.
It is therefore important to design an urban remote sensing image change detection method based on a dynamic hybrid strategy that can reduce the influence of terrain and environmental factors and improve the accuracy of region edge segmentation.
Disclosure of Invention
The invention aims to remedy the above shortcomings and provides a method for monitoring the change depth of urban remote sensing images based on a DeepLabV3+ multi-scale network model combined with several mixed pooling strategies.
The invention adopts the following technical scheme:
the method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy comprises the following steps:
S1, preprocessing the urban remote sensing images and labeling the different categories of areas in the city to obtain a data set;
S2, training the network with the data set of step S1, based on a network model that adopts a dynamic mixed pooling strategy and uses Xception as its backbone network;
S3, cropping remote sensing images of the same urban area taken at different times to the same proportion, inputting them into the trained network model, and segmenting the images;
S4, after the region classification results of the urban remote sensing images are obtained, computing the degree of change of each region category over a period of time and marking the changes on the image.
Further, the original remote sensing image of the urban area adopts an RSSCN7DataSet remote sensing image data set.
Further, in step S1, the original remote sensing images are cropped and semantically segmented as data preprocessing: the original remote sensing images are cropped to 256 × 256, and the cropped images are labeled with the Labelme semantic segmentation labeling tool into five categories: roads, buildings, water areas, green vegetation, and open space.
Furthermore, the Labelme technology is an open source image labeling tool and is mainly used for data set labeling work of instance segmentation, semantic segmentation, target detection and classification tasks.
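The preprocessing step above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation; the class names and index order are illustrative only.

```python
import numpy as np

# The five region categories labeled with Labelme (index order is hypothetical).
CLASSES = ("road", "building", "water", "green", "open_space")

def tile_image(image: np.ndarray, tile: int = 256) -> list[np.ndarray]:
    """Cut an (H, W, C) remote sensing image into non-overlapping tile x tile patches.

    Rows and columns that do not fill a whole tile are dropped for simplicity;
    padding or overlapped tiling are common alternatives.
    """
    h, w = image.shape[:2]
    return [image[y:y + tile, x:x + tile]
            for y in range(0, h - tile + 1, tile)
            for x in range(0, w - tile + 1, tile)]
```

A 512 × 640 scene, for example, yields four full 256 × 256 tiles, with the 128-pixel remainder column discarded.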
Furthermore, because the mark characteristics of each region category in the urban area are generally obvious, the remote sensing image after preprocessing is directly segmented and labeled by using different colors, and then a training data set is selected by adopting a random selection mode.
Further, in step S2, the data set is trained with the mixed pooling DeepLabV3+ network model. The DeepLabV3+ network model fuses multi-scale information; the model structure is divided into an encoding layer and a decoding layer, and Xception is introduced as the backbone network. The output of the backbone network, which applies cascaded atrous (dilated) convolutions, is split into two branches: one is passed directly into the decoding layer, and the other goes through the ASPP module. Specifically:
S21. In the encoding layer, feature information of the monitored urban remote sensing image is extracted by the backbone network, and the image size becomes 1/4, 1/8, and 1/16 of the original size in turn.
S22. The 60 × 60 feature map with 2048 channels obtained from the backbone network enters the atrous spatial pyramid pooling (ASPP) module. A standard ASPP module consists of one 1 × 1 convolution, three 3 × 3 dilated convolution layers with dilation rates of 6, 12, and 18, and a global average pooling layer. To further improve the edge segmentation accuracy of the network model, the global average pooling in the ASPP module is replaced, on top of this structure, by the mixed pooling strategy adopted herein: the input feature map is mixed-pooled into a 20 × 20 feature map, channel-compressed by a 1 × 1 convolution, and finally restored to the height and width of the input feature map by deconvolution; the results obtained by all branches are then concatenated and fused.
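The shape bookkeeping of this branch (60 × 60 feature map with 2048 channels, pooled to 20 × 20, channel-compressed by a 1 × 1 convolution, restored to the input height and width) can be sketched as follows. Average pooling stands in for the full mixed strategy and nearest-neighbour upsampling stands in for deconvolution, so this is a dimensional sketch only, not the patent's network.

```python
import numpy as np

def mixed_pool_branch(feat: np.ndarray, w1x1: np.ndarray) -> np.ndarray:
    """feat: (H, W, C_in); w1x1: (C_in, C_out). Returns (H, W, C_out)."""
    h, w, c = feat.shape
    s = 3                                             # 60 // 20 = 3: pooling stride
    # pool each non-overlapping 3x3 window (average pooling as a stand-in)
    pooled = feat.reshape(h // s, s, w // s, s, c).mean(axis=(1, 3))  # (20, 20, C_in)
    compressed = pooled @ w1x1                        # 1x1 convolution == per-pixel matmul
    # nearest-neighbour upsampling back to the input height and width
    return compressed.repeat(s, axis=0).repeat(s, axis=1)             # (60, 60, C_out)

feat = np.random.rand(60, 60, 2048).astype(np.float32)
w1x1 = np.random.rand(2048, 256).astype(np.float32)   # hypothetical compression width
out = mixed_pool_branch(feat, w1x1)
```

The 256-channel output width is an assumption for illustration; the patent only specifies the spatial sizes.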
S23. At the decoding end, the low-level features output by the corresponding level of the backbone network are merged with the fused result from the encoding end; after 3 × 3 convolution and upsampling, the prediction is restored to the resolution of the input image, yielding the classification result image of the urban remote sensing image and ultimately improving the segmentation accuracy of the network model.
S24. The DeepLabV3+ network model with the dynamic mixed pooling strategy is trained on the RSSCN7 DataSet remote sensing image data set, with the remaining samples reserved as the test set for evaluating the network model.
Further, the mixed pooling strategy in step S22 optimizes the pooling choice for each of the 2048 channels of the feature map:
The frequency $\alpha_k$ of selecting max pooling for the k-th channel, the frequency $\beta_k$ of selecting average pooling, and the frequency $\gamma_k$ of selecting stochastic pooling are:

$$\alpha_k = \frac{i_{\mathrm{max}}}{i_{\mathrm{total}}}, \qquad \beta_k = \frac{i_{\mathrm{avg}}}{i_{\mathrm{total}}}, \qquad \gamma_k = \frac{i_{\mathrm{sto}}}{i_{\mathrm{total}}}$$

where $i_{\mathrm{max}}$ is the number of times max pooling is selected for the k-th channel, $i_{\mathrm{avg}}$ the number of times average pooling is selected, $i_{\mathrm{sto}}$ the number of times stochastic pooling is selected, and $i_{\mathrm{total}}$ the size of the training set used to optimize the k-th channel.
Finally, the mixed pooling output $\mathrm{output}_k$ of the k-th channel of the feature map in the network model is:

$$\mathrm{output}_k = \alpha_k\, x_{k\_max} + \beta_k\, x_{k\_avg} + \gamma_k\, x_{k\_sto}$$

where $x_{k\_max}$ is the result of applying max pooling to the k-th channel, $x_{k\_avg}$ the result of average pooling, and $x_{k\_sto}$ the result of stochastic pooling.
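A sketch of mixed pooling for one feature-map channel under these definitions. The 3 × 3 window size and the stochastic-pooling sampling rule (one activation per window, probability proportional to its non-negative magnitude) are common choices assumed here, not stated in the patent.

```python
import numpy as np

def _windows(x: np.ndarray, k: int) -> np.ndarray:
    """(H, W) -> (H//k, W//k, k*k) non-overlapping pooling windows."""
    h, w = x.shape
    return x.reshape(h // k, k, w // k, k).swapaxes(1, 2).reshape(h // k, w // k, k * k)

def mixed_pool(x: np.ndarray, alpha: float, beta: float, gamma: float,
               k: int = 3, rng=None) -> np.ndarray:
    """output_k = alpha * max-pool + beta * avg-pool + gamma * stochastic-pool."""
    rng = rng or np.random.default_rng(0)
    win = _windows(x, k)
    x_max = win.max(-1)
    x_avg = win.mean(-1)
    # stochastic pooling: sample one activation per window, probability
    # proportional to its (clipped non-negative) magnitude
    p = np.clip(win, 0.0, None) + 1e-12
    p = p / p.sum(-1, keepdims=True)
    cum = p.cumsum(-1)
    u = rng.random(cum.shape[:2] + (1,))
    idx = (u > cum).sum(-1)                          # inverse-CDF sampled index
    x_sto = np.take_along_axis(win, idx[..., None], -1)[..., 0]
    return alpha * x_max + beta * x_avg + gamma * x_sto
```

With alpha = 1 the result reduces to plain max pooling, which gives a quick sanity check.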
Further, the step of improving the global average pooling in the ASPP module to dynamic hybrid pooling comprises:
a. Initialization: before the DeepLabV3+ network model is trained, the initial weights $\alpha_k$, $\beta_k$ and $\gamma_k$ of every pooling strategy for each channel of the feature map are all set to

$$\alpha_k = \beta_k = \gamma_k = \frac{1}{3}$$
b. Pool the 1st channel of the feature map with the max pooling, average pooling, and stochastic pooling methods respectively, while pooling the remaining channels with the mixed pooling method; evaluate the different pooling strategies by computing the mean intersection over union (mIoU) between prediction and ground truth, and take the pooling mode with the largest mIoU as the selected pooling mode for the 1st channel;
c. Optimize the pooling strategies of channels 2 through 2048 of the same input by the method of step b;
d. Perform the operations of steps b and c on all training set samples;
e. Obtain the $\alpha_k$, $\beta_k$ and $\gamma_k$ of each channel of the feature map under the different pooling strategies over the training set.
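Steps a to e reduce to counting, per channel, how often each pooling mode wins over the training set and normalizing the counts into the frequencies. A sketch, with the per-sample mIoU evaluation abstracted into a list of winning modes:

```python
from collections import Counter

MODES = ("max", "avg", "sto")

def pooling_frequencies(best_mode_per_sample: list[str]) -> tuple[float, float, float]:
    """Given the pooling mode with the highest mIoU for each training sample
    (for one feature-map channel), return (alpha_k, beta_k, gamma_k)."""
    i_total = len(best_mode_per_sample)
    if i_total == 0:                   # initialization before training: 1/3 each
        return (1 / 3, 1 / 3, 1 / 3)
    counts = Counter(best_mode_per_sample)
    return tuple(counts[m] / i_total for m in MODES)
```

By construction the three frequencies always sum to 1, matching the weighted combination in the mixed pooling output.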
Further, in step S22, the mean intersection over union (mIoU) is used to evaluate the different pooling strategies; mIoU sums the intersection-over-union ratio between prediction and ground truth over the classes and divides by the number of classes:

$$mIoU = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i + FN_i}$$

where k is the number of non-empty classes, $TP_i$ the number of true positives, $FP_i$ the number of false positives, and $FN_i$ the number of false negatives for class i.
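The mIoU formula can be computed directly from per-class confusion counts, for example:

```python
import numpy as np

def mean_iou(pred: np.ndarray, truth: np.ndarray, num_classes: int) -> float:
    """mIoU = mean over classes of TP / (TP + FP + FN); empty classes are skipped."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((pred == c) & (truth == c))
        fp = np.sum((pred == c) & (truth != c))
        fn = np.sum((pred != c) & (truth == c))
        if tp + fp + fn > 0:           # only non-empty classes contribute
            ious.append(tp / (tp + fp + fn))
    return float(np.mean(ious))
```

For `pred = [0, 0, 1, 1]` against `truth = [0, 1, 1, 1]`, class 0 has IoU 1/2 and class 1 has IoU 2/3, so mIoU is 7/12.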
Further, in step S2, the DeepLabV3+ network model uses a pixel-wise cross-entropy loss function (softmax loss) for the multi-class problem: each pixel is taken as a sample, and the cross-entropy between its predicted class and its true class is computed. In the multi-class setting, the softmax function converts the multiple outputs into probability values mapped into the interval (0, 1) for classification.
The output after softmax regression is:

$$p_{i,j} = \frac{e^{l_{i,j}}}{\sum_{j=1}^{C} e^{l_{i,j}}}$$

where e is the natural constant (approximately 2.718), $p_{i,j}$ is the predicted probability that the i-th sample belongs to class j, $l_{i,j}$ is the output of the neural network for the i-th sample on class j, and C is the number of input classes.
The above equation turns the outputs into a probability distribution over (0, 1); the distance between the predicted probability distribution and the true distribution is then computed with the cross-entropy loss. Specifically:

$$Loss = -\frac{1}{N}\sum_{i=1}^{N} w \log p_{i,Y(i)}$$

where N is the number of training set samples, C is the number of input classes, Y(i) is the class to which the i-th sample belongs, and w is the weight of the sample data.
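A sketch of the pixel-wise softmax cross-entropy described above, with pixels flattened into N samples and a single scalar weight w as in the formula (per-class weights are a common variant not shown here):

```python
import numpy as np

def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax: (N, C) logits -> (N, C) probabilities in (0, 1)."""
    z = logits - logits.max(axis=1, keepdims=True)   # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits: np.ndarray, labels: np.ndarray, w: float = 1.0) -> float:
    """Loss = -(w / N) * sum_i log p_{i, Y(i)}; each pixel is one sample."""
    p = softmax(logits)
    n = logits.shape[0]
    return float(-w * np.log(p[np.arange(n), labels] + 1e-12).sum() / n)
```

A confident, correct prediction drives the loss toward zero, which is the sanity check used below.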
Further, in step S3, the remote sensing images of the same city area at different times are cut in the same proportion and input into the trained model to obtain a segmented image.
Further, step S4 specifically includes:
S41. Compute the intersection-over-union loss between the predictions for the i-th category in urban remote sensing images of the same area at times T1 and T2; it represents the amount of change of the i-th category in the region:

$$L_i = 1 - \frac{TP_i}{TP_i + FP_i + FN_i}$$

where $L_i$ denotes the degree of change of the i-th region class (the larger $L_i$, the larger the change of the region), $TP_i$ is the number of true positives of the i-th region class, $FP_i$ the number of false positives, and $FN_i$ the number of false negatives.
S42. Stitch the segmented images back together in order.
S43. Subtract the stitched urban remote sensing images of times T1 and T2, remove the pixels whose region category is unchanged to obtain only the changed regions, mark them, and overlay the marked image on the original image so that the extent of urban change is displayed on the remote sensing image.
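Steps S41 to S43 can be sketched as follows: `change_degree` implements the per-class change loss $L_i$ between the two epochs, and `change_mask` is a minimal stand-in for the image differencing of S43; both operate on class-index segmentation maps and are illustrative, not the patent's exact procedure.

```python
import numpy as np

def change_degree(seg_t1: np.ndarray, seg_t2: np.ndarray, cls: int) -> float:
    """L_i = 1 - TP_i / (TP_i + FP_i + FN_i) between the two epochs for class cls."""
    a, b = seg_t1 == cls, seg_t2 == cls
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0                     # class absent at both epochs: no change
    return 1.0 - np.logical_and(a, b).sum() / union

def change_mask(seg_t1: np.ndarray, seg_t2: np.ndarray) -> np.ndarray:
    """Pixels whose region class differs between T1 and T2 (the S43 difference map)."""
    return seg_t1 != seg_t2
```

A large $L_i$ flags classes whose footprint shifted between the two acquisition dates; the boolean mask can then be overlaid on the original image for display.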
The invention also discloses a system based on the method for monitoring the change depth of the urban remote sensing image, which comprises the following modules:
a dataset acquisition module: preprocessing the urban remote sensing image, and labeling urban areas of different categories to obtain a data set;
a training module: training the network with the data set, using a DeepLabV3+ network model that adopts a dynamic mixed pooling strategy with Xception as the backbone network;
a segmentation module: cutting remote sensing images of the same city area at different times in the same proportion, inputting the remote sensing images into a trained model, and segmenting the images;
a labeling module: after the region classification results of the urban remote sensing images are obtained, computing the degree of change of each region category over a period of time and marking the changes on the image.
Compared with the prior art, the invention has the beneficial effects that:
1. the invention dynamically selects different pooling modes for pooling the characteristic diagrams of each layer of the remote sensing image, can better grasp the global information and the local information of the image and improves the segmentation precision.
2. The invention takes random pooling as one of the pooling strategies, effectively reduces the risk of overfitting and improves the generalization capability of the model.
Drawings
FIG. 1 is a flow chart of the method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy.
FIG. 2 is a flow diagram of a hybrid pooling process.
FIG. 3 is a DeepLabV3+ network model diagram.
FIG. 4 is a block diagram of the system for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy.
Detailed Description
The preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
As shown in FIGS. 1 to 3, the method of this embodiment for monitoring the change depth of urban remote sensing images based on the dynamic hybrid strategy proceeds according to the following steps:
S1, preprocessing the images and labeling the different categories of regions in the city, based on the RSSCN7 DataSet remote sensing images released by Wuhan University in 2015, to obtain a data set;
S2, training the network with the data set, using a DeepLabV3+ network model that adopts a dynamic mixed pooling strategy with Xception as the backbone network;
S3, cropping remote sensing images of the same urban area taken at different times to the same proportion, inputting them into the trained model, and segmenting the images;
S4, after the region classification results of the urban remote sensing images are obtained, computing the degree of change of each region category over a period of time and marking the changes on the image.
Specifically, in this embodiment, the original remote sensing image of the urban area adopts RSSCN7DataSet remote sensing image data set.
In this embodiment, in step S1, the original remote sensing images are cropped and semantically segmented as data preprocessing: they are cropped to 256 × 256, and the cropped images are labeled with the Labelme semantic segmentation labeling tool into five categories: roads, buildings, water areas, green vegetation, and open space.
Furthermore, the Labelme technology is an open source image labeling tool and is mainly used for data set labeling work of example segmentation, semantic segmentation, target detection and classification tasks.
Since the mark characteristics of each region category in urban areas are generally significant, the remote sensing image after preprocessing is directly segmented and labeled by using different colors in the embodiment, and then a training data set is selected by adopting a random selection mode.
In S2 of this embodiment, the data set is trained with the mixed pooling DeepLabV3+ network model. The DeepLabV3+ network model fuses multi-scale information; the model structure is divided into an encoding layer and a decoding layer, and Xception is introduced as the backbone network. The output of the backbone network, which applies cascaded atrous (dilated) convolutions, is split into two branches: one is passed directly into the decoding layer, and the other goes through the ASPP module. Specifically:
S21. In the encoding layer, feature information is extracted from the monitored urban remote sensing image by the backbone network, and the image size becomes 1/4, 1/8, and 1/16 of the original size in turn.
S22. The 60 × 60 feature map with 2048 channels obtained from the backbone network enters the atrous spatial pyramid pooling (ASPP) module. A standard ASPP module consists of one 1 × 1 convolution, three 3 × 3 dilated convolution layers with dilation rates of 6, 12, and 18, and a global average pooling layer. To improve the edge segmentation accuracy of the network, the global average pooling in the ASPP module is replaced, on top of this structure, by the mixed pooling strategy adopted herein: the input feature map is mixed-pooled into a 20 × 20 feature map, channel-compressed by a 1 × 1 convolution, and finally restored to the height and width of the input feature map by deconvolution; the results obtained by all branches are then concatenated and fused.
S23. At the decoding end, the low-level features output by the corresponding level of the backbone network are merged with the fused result from the encoding end; after 3 × 3 convolution and upsampling, the prediction is restored to the resolution of the input image, yielding the classification result image of the urban remote sensing image and ultimately improving the segmentation accuracy of the network.
S24. To further improve the edge segmentation accuracy of the network, in the DeepLabV3+ network structure the global average pooling in ASPP is replaced by the mixed pooling strategy adopted herein. The RSSCN7 DataSet remote sensing image data set is used for training, with the remaining samples reserved as the test set for evaluating the network.
In this embodiment, the mixed pooling strategy in step S22 optimizes the pooling choice for each of the 2048 channels of the feature map:
The frequency $\alpha_k$ of selecting max pooling for the k-th channel, the frequency $\beta_k$ of selecting average pooling, and the frequency $\gamma_k$ of selecting stochastic pooling are:

$$\alpha_k = \frac{i_{\mathrm{max}}}{i_{\mathrm{total}}}, \qquad \beta_k = \frac{i_{\mathrm{avg}}}{i_{\mathrm{total}}}, \qquad \gamma_k = \frac{i_{\mathrm{sto}}}{i_{\mathrm{total}}}$$

where $i_{\mathrm{max}}$ is the number of times max pooling is selected for the k-th channel, $i_{\mathrm{avg}}$ the number of times average pooling is selected, $i_{\mathrm{sto}}$ the number of times stochastic pooling is selected, and $i_{\mathrm{total}}$ the size of the training set used to optimize the k-th channel.
The mixed pooling output $\mathrm{output}_k$ of the k-th channel of the feature map in the final model is:

$$\mathrm{output}_k = \alpha_k\, x_{k\_max} + \beta_k\, x_{k\_avg} + \gamma_k\, x_{k\_sto}$$

where $x_{k\_max}$ is the result of applying max pooling to the k-th channel, $x_{k\_avg}$ the result of average pooling, and $x_{k\_sto}$ the result of stochastic pooling.
The step of improving the global average pooling in the ASPP module to dynamic hybrid pooling comprises:
a. Initialization: before the DeepLabV3+ network model is trained, the initial weights $\alpha_k$, $\beta_k$ and $\gamma_k$ of every pooling strategy for each channel of the feature map are all set to

$$\alpha_k = \beta_k = \gamma_k = \frac{1}{3}$$
b. Pool the 1st channel of the feature map with the max pooling, average pooling, and stochastic pooling methods respectively, while pooling the remaining channels with the mixed pooling method; evaluate the different pooling strategies by computing the mean intersection over union (mIoU) between prediction and ground truth, and take the pooling mode with the largest mIoU as the selected pooling mode for the 1st channel;
c. Optimize the pooling strategies of channels 2 through 2048 of the same input by the method of step b;
d. Perform the operations of steps b and c on all training set samples;
e. Obtain the $\alpha_k$, $\beta_k$ and $\gamma_k$ of each channel of the feature map under the different pooling strategies over the training set.
In step S22 of this embodiment, the mean intersection over union (mIoU) is used to evaluate the different pooling strategies; mIoU sums the intersection-over-union ratio between prediction and ground truth over the classes and divides by the number of classes:

$$mIoU = \frac{1}{k}\sum_{i=1}^{k}\frac{TP_i}{TP_i + FP_i + FN_i}$$

where k is the number of non-empty classes, $TP_i$ the number of true positives, $FP_i$ the number of false positives, and $FN_i$ the number of false negatives for class i.
In step S2 of this embodiment, the DeepLabV3+ network model uses a pixel-wise cross-entropy loss function for the multi-class problem: each pixel is taken as a sample, and the cross-entropy between its predicted class and its true class is computed. Softmax converts the outputs into probability values mapped into the interval (0, 1) for classification in the multi-class setting.
The output after softmax regression is:

$$p_{i,j} = \frac{e^{l_{i,j}}}{\sum_{j=1}^{C} e^{l_{i,j}}}$$

where $p_{i,j}$ is the predicted probability that the i-th sample belongs to class j and $l_{i,j}$ is the network output of the i-th sample on class j. The above equation turns the outputs into a probability distribution over (0, 1), and the distance between the predicted probability distribution and the true distribution is computed with the cross-entropy loss. Specifically:
$$Loss = -\frac{1}{N}\sum_{i=1}^{N} w \log p_{i,Y(i)}$$

where N is the number of training set samples, C is the number of input classes, Y(i) is the class to which the i-th sample belongs, and w is the weight of the sample data.
In step S3 of this embodiment, remote sensing images in the same city area at different times are cut in the same proportion, and input into the trained model to obtain a segmented image.
Step S4 of this embodiment specifically includes:
S41. Compute the intersection-over-union loss between the predictions for the i-th category in urban remote sensing images of the same area at times T1 and T2; it represents the amount of change of the i-th category in the region:

$$L_i = 1 - \frac{TP_i}{TP_i + FP_i + FN_i}$$

where $L_i$ denotes the degree of change of the i-th region class (the larger $L_i$, the larger the change of the region), $TP_i$ is the number of true positives of the i-th region class, $FP_i$ the number of false positives, and $FN_i$ the number of false negatives.
S42. Stitch the segmented images back together in order.
S43. Subtract the stitched urban remote sensing images of times T1 and T2, remove the pixels whose region category is unchanged to obtain only the changed regions, mark them, and overlay the marked image on the original image so that the extent of urban change is displayed on the remote sensing image.
As shown in FIG. 4, this embodiment discloses a system for monitoring the change depth of urban remote sensing images based on the above embodiments, which comprises the following modules:
a dataset acquisition module: preprocessing the urban remote sensing image, and labeling urban areas of different categories to obtain a data set;
a training module: training the network model with the data set, using a DeepLabV3+ network model that adopts a dynamic mixed pooling strategy with Xception as the backbone network;
a segmentation module: cutting remote sensing images of the same city area at different times in the same proportion, inputting the remote sensing images into a trained model, and segmenting the images;
a labeling module: after the region classification results of the urban remote sensing images are obtained, computing the degree of change of each region category over a period of time and marking the changes on the image.
The invention dynamically selects different pooling modes for pooling the characteristic diagrams of each layer of the remote sensing image, can better grasp the global information and the local information of the image and improves the segmentation precision. The invention takes random pooling as one of the pooling strategies, effectively reduces the risk of overfitting and improves the generalization capability of the model.

Claims (10)

1. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy is characterized by comprising the following steps of:
s1, preprocessing a city remote sensing image, and labeling areas of different categories in a city to obtain a data set;
s2, training the network by using the data set in the step S1 based on a network model adopting a dynamic mixed pooling strategy and taking Xconcept as a backbone network;
s3, cutting the remote sensing images of the same city area at different times in the same proportion, inputting them into the trained model, and segmenting the images;
and S4, after the regional classification result of the urban remote sensing image is obtained, calculating the change degree of each regional category within a period of time and marking the change on the image.
2. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 1, wherein: in the step S1, the remote sensing images adopt the RSSCN7DataSet remote sensing image data set; or, in the step S1, the urban remote sensing image is cropped and semantically segmented as data preprocessing, the images are cropped to a size of 256 × 256, and the cropped images are annotated with a semantic segmentation labeling tool, dividing each image into five categories: roads, buildings, water areas, green plants and open spaces.
3. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 2, wherein: in the step S1, the preprocessed remote sensing image is segmented and labeled by adopting different colors, and then a training data set is selected in a random selection mode.
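A minimal sketch of the 256 × 256 cropping step described in claims 2-3 (the function name `tile_image` and the choices of non-overlapping crops with partial border tiles discarded are assumptions for illustration):

```python
import numpy as np

def tile_image(img, size=256):
    """Crop an image array (H, W, C) into non-overlapping size x size
    tiles, discarding any partial tiles at the right/bottom borders."""
    h, w = img.shape[:2]
    return [img[i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

# A 600 x 520 image yields a 2 x 2 grid of full 256 x 256 tiles.
img = np.zeros((600, 520, 3), dtype=np.uint8)
tiles = tile_image(img)
print(len(tiles), tiles[0].shape)  # 4 (256, 256, 3)
```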
4. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 2 or 3, wherein: step S2 is specifically as follows:
s21, at the coding layer, extracting feature information of the monitored urban remote sensing image through the backbone network, successively reducing the image to 1/4, 1/8 and 1/16 of its original size;
s22, feeding the 60 × 2048 feature maps obtained from the backbone network into the Atrous Spatial Pyramid Pooling (ASPP) module, wherein the ASPP module consists of a 1 × 1 convolution, three 3 × 3 dilated convolution layers with dilation rates of 6, 12 and 18 respectively, and a global average pooling layer; on this basis, the global average pooling in the ASPP module is replaced by the mixed pooling strategy: the input feature maps are mixed-pooled to obtain feature maps of size 20, compressed along the channel dimension by a 1 × 1 convolution, restored to the height and width of the input feature maps by deconvolution, and the results of all branches are concatenated and fused;
s23, at the decoding layer, fusing the low-level features output by the corresponding level of the backbone network with the output of the coding layer; after a 3 × 3 convolution and upsampling, the prediction is restored to the resolution of the input image, yielding the classified result map of the urban remote sensing image;
s24, training the DeepLabV3+ network model adopting the dynamic mixed pooling strategy with the RSSCN7DataSet remote sensing image data set, and taking the remaining samples as the test data set for testing the network model.
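For intuition about the ASPP branches in step S22: a 3 × 3 dilated (atrous) convolution with rate r samples its taps r pixels apart, giving a (2r+1)-pixel receptive field — 13, 25 and 37 pixels for rates 6, 12 and 18. A minimal single-channel NumPy sketch with 'same' padding (illustrative only, not the patented implementation):

```python
import numpy as np

def dilated_conv3x3(x, k, rate):
    """'Same'-padded 2D convolution of a single-channel map x with a
    3x3 kernel k whose taps are spaced `rate` pixels apart."""
    pad = rate  # a 3x3 kernel dilated by `rate` reaches `rate` px out
    xp = np.pad(x, pad)
    h, w = x.shape
    out = np.zeros((h, w))
    for di in (-1, 0, 1):
        for dj in (-1, 0, 1):
            out += k[di + 1, dj + 1] * xp[pad + di * rate: pad + di * rate + h,
                                          pad + dj * rate: pad + dj * rate + w]
    return out

# Sanity check: an identity kernel reproduces the input at any rate.
identity = np.zeros((3, 3))
identity[1, 1] = 1.0
x = np.arange(64, dtype=float).reshape(8, 8)
print(np.allclose(dilated_conv3x3(x, identity, 6), x))  # True
```

Because all three rates share the same 3 × 3 kernel size, the ASPP branches cost the same as ordinary convolutions while covering very different context scales.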
5. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 4, wherein: in step S22, the 2048 layers of feature maps are optimized by the mixed pooling strategy as follows:
the frequency α_k with which maximum pooling is selected for the k-th layer feature map, the frequency β_k with which average pooling is selected, and the frequency γ_k with which stochastic pooling is selected are:
α_k = i_max / i_total,  β_k = i_avg / i_total,  γ_k = i_sto / i_total
wherein i_max is the number of times the maximum pooling method is selected for the k-th layer feature map, i_avg is the number of times the average pooling method is selected, i_sto is the number of times the stochastic pooling method is selected, and i_total is the size of the training set on which the k-th layer feature map is optimized;
the output output_k of the k-th layer feature map after mixed pooling in the final model is:
output_k = α_k · x_k_max + β_k · x_k_avg + γ_k · x_k_sto
wherein x_k_max denotes the result of applying the maximum pooling method to the k-th layer feature map, x_k_avg denotes the result of applying the average pooling method, and x_k_sto denotes the result of applying the stochastic pooling method.
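Under the reading of the formulas above (α_k, β_k, γ_k as selection frequencies normalized by i_total, and output_k as their weighted combination), the computation reduces to a few lines. The helper names below are illustrative, not from the patent:

```python
def mixed_pool_weights(i_max, i_avg, i_sto):
    """Normalize the per-layer selection counts of the three pooling
    modes into the mixing weights (alpha_k, beta_k, gamma_k)."""
    i_total = i_max + i_avg + i_sto
    return i_max / i_total, i_avg / i_total, i_sto / i_total

def mixed_pool_output(alpha, beta, gamma, x_max, x_avg, x_sto):
    """Weighted combination of the three pooling results for layer k."""
    return alpha * x_max + beta * x_avg + gamma * x_sto

# Example: max pooling won 6 of 10 samples, average 3, stochastic 1.
a, b, g = mixed_pool_weights(6, 3, 1)        # 0.6, 0.3, 0.1
print(mixed_pool_output(a, b, g, 4.0, 2.5, 3.0))  # 3.45
```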
6. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 5, wherein: the method improves the global average pooling in the ASPP module into dynamic mixed pooling, with the following specific steps:
a. Initialization: before training the DeepLabV3+ network model, the weights α_k, β_k and γ_k of each pooling strategy for each layer's feature map are all initialized to
α_k = β_k = γ_k = 1/3;
b. The layer-1 feature map is pooled with the maximum pooling, average pooling and stochastic pooling methods respectively, while the remaining feature maps are pooled with the mixed pooling method; the merits of the different pooling strategies are evaluated by computing the mean intersection over union (mIoU) between the predicted values and the true values, and the pooling mode with the largest mIoU value is taken as the selected pooling mode for the layer-1 feature map;
c. The pooling strategies of feature-map layers 2 to 2048 of the same input are optimized by the method of step b;
d. Steps b and c are performed on all training set samples;
e. The α_k, β_k and γ_k of each layer's feature map under the different pooling strategies are thereby obtained over the training set.
7. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 6, wherein: in step S22, the quality of the different pooling strategies is evaluated with the mean intersection over union mIoU, which is the sum over all classes of the ratio of the intersection to the union of the predicted and true values, divided by the number of classes:
mIoU = (1/k) · Σ_{i=1}^{k} TP_i / (TP_i + FP_i + FN_i)
wherein k denotes the number of non-empty classes, TP_i denotes the number of true positives of class i, FP_i the number of false positives, and FN_i the number of false negatives.
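The mIoU formula of this claim can be checked with a short sketch; `miou`, which takes per-class (TP, FP, FN) counts, is an illustrative helper, not part of the claim:

```python
def miou(per_class_counts):
    """Mean IoU: average of TP/(TP+FP+FN) over the k non-empty classes.

    per_class_counts: iterable of (TP, FP, FN) triples, one per class.
    """
    ious = [tp / (tp + fp + fn) for tp, fp, fn in per_class_counts]
    return sum(ious) / len(ious)

# Three classes with IoUs 0.5, 0.75 and 1/3 -> mIoU = 19/36 ~ 0.528
print(miou([(50, 10, 40), (30, 0, 10), (20, 20, 20)]))
```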
8. The method for monitoring the depth of change of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 7, wherein:
in the step S2, the DeepLabV3+ network model uses a pixel-wise cross-entropy loss function (softmax loss) to handle the multi-class problem: each pixel is taken as a sample, and the cross-entropy between the predicted class and the true class of each pixel is computed; in the multi-class case, the softmax function maps the multiple outputs to probability values in the interval (0, 1) for classification;
the output after softmax regression is:
p_{i,j} = e^{l_{i,j}} / Σ_{c=1}^{C} e^{l_{i,c}}
wherein e is the natural constant (≈2.718), p_{i,j} is the predicted probability that the i-th sample belongs to the j-th class, l_{i,j} is the output of the neural network for the i-th sample on the j-th class, and C is the number of original input classes;
the above formula turns the outputs into a probability distribution over (0, 1), and the distance between the predicted probability distribution and the true probability distribution is then measured by the cross-entropy loss function:
Loss = -(1/N) Σ_{i=1}^{N} Σ_{j=1}^{C} w · 1{Y(i)=j} · log p_{i,j}
wherein N is the number of training set samples, C is the number of original input classes, Y(i) is the class to which the i-th sample belongs, 1{·} is the indicator function, and w is the weight of the sample data.
9. The method for monitoring the change depth of the urban remote sensing image based on the dynamic mixing strategy as claimed in claim 8, wherein: step S4 specifically includes:
s41, computing the intersection-over-union loss between the predicted values of the i-th category in the urban remote sensing images of the same area at times T1 and T2, which characterizes the magnitude of change of the i-th category in that area:
L_i = 1 - TP_i / (TP_i + FP_i + FN_i)
wherein L_i denotes the magnitude of change of the i-th area category (the larger L_i, the greater the change of the area), TP_i denotes the number of true positives of the i-th area category, FP_i the number of false positives, and FN_i the number of false negatives;
s42, splicing the segmented images in sequence;
and S43, subtracting the urban remote sensing images spliced at the moment of T1 and T2, eliminating pixel points of the same region category, obtaining a part only containing a change region, marking, and overlapping the marked image with the original image so as to display the urban change range on the remote sensing images.
10. The system based on the urban remote sensing image change depth monitoring method of any one of claims 1-9 is characterized by comprising the following modules:
a dataset acquisition module: preprocessing the urban remote sensing image, and labeling urban areas of different categories to obtain a data set;
a training module: adopting a DeepLabV3+ network model with a dynamic mixed pooling strategy and Xception as the backbone network, and training the network using the data set;
a segmentation module: cutting the remote sensing images of the same city area at different times in the same proportion, inputting them into the trained network model, and segmenting the images;
a labeling module: after the regional classification result of the urban remote sensing image is obtained, calculating the degree of change of each regional category within a period of time and marking the changes on the image.
CN202211138291.1A 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy Active CN115497006B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211138291.1A CN115497006B (en) 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy

Publications (2)

Publication Number Publication Date
CN115497006A true CN115497006A (en) 2022-12-20
CN115497006B CN115497006B (en) 2023-08-01

Family

ID=84471406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211138291.1A Active CN115497006B (en) 2022-09-19 2022-09-19 Urban remote sensing image change depth monitoring method and system based on dynamic mixing strategy

Country Status (1)

Country Link
CN (1) CN115497006B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116343113A (en) * 2023-03-09 2023-06-27 中国石油大学(华东) Method and system for detecting oil spill based on polarized SAR characteristics and coding and decoding network

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016141282A1 (en) * 2015-03-04 2016-09-09 The Regents Of The University Of California Convolutional neural network with tree pooling and tree feature map selection
CN108053420A (en) * 2018-01-05 2018-05-18 昆明理工大学 A kind of dividing method based on the unrelated attribute dynamic scene of limited spatial and temporal resolution class
CN112069831A (en) * 2020-08-21 2020-12-11 三峡大学 Unreal information detection method based on BERT model and enhanced hybrid neural network
CN112233038A (en) * 2020-10-23 2021-01-15 广东启迪图卫科技股份有限公司 True image denoising method based on multi-scale fusion and edge enhancement
CN112308402A (en) * 2020-10-29 2021-02-02 复旦大学 Power time series data abnormity detection method based on long and short term memory network
CN114663759A (en) * 2022-03-24 2022-06-24 东南大学 Remote sensing image building extraction method based on improved deep LabV3+

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
DINGJUN YU ET AL.: "Mixed Pooling for Convolutional Neural Networks", ResearchGate, pages 1 - 13 *
DUAN ZHONGXING ET AL.: "Research on Obstacle Detection Algorithms for Blind Sidewalks Based on Deep Learning", Computer Measurement & Control, pages 27 - 32 *

Also Published As

Publication number Publication date
CN115497006B (en) 2023-08-01

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant