CN116597287A - Remote sensing image landslide recognition method based on deep learning method - Google Patents
Remote sensing image landslide recognition method based on deep learning method
- Publication number: CN116597287A
- Application number: CN202310873327.9A
- Authority: CN (China)
- Prior art keywords: landslide, image, image data, remote sensing, data set
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06V10/82 — Image or video recognition using pattern recognition or machine learning using neural networks
- G06N3/044 — Recurrent networks, e.g. Hopfield networks
- G06N3/045 — Combinations of networks
- G06N3/0464 — Convolutional networks [CNN, ConvNet]
- G06N3/096 — Transfer learning
- G06V10/764 — Recognition using classification, e.g. of video objects
- G06V10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
- G06V10/806 — Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
- G06V20/13 — Satellite images
Abstract
The invention relates to a remote sensing image landslide identification method based on a deep learning method, and belongs to the technical field of traffic disasters. The method comprises the following steps: collecting satellite remote sensing image data containing landslide areas and acquiring a public remote sensing image landslide data set; preprocessing, cropping and classifying the collected satellite imagery to obtain a classified and recombined data set; performing data enhancement and landslide sample labeling; constructing an improved landslide recognition model based on Mask R-CNN, and training and verifying it; and finally applying the trained improved Mask R-CNN-based model to the image to be recognized to obtain the recognition result.
Description
Technical Field
The invention belongs to the technical field of traffic disasters, and particularly relates to a remote sensing image landslide identification method based on a deep learning method.
Background
In recent years, landslides have become one of the most serious natural disasters in China, causing heavy casualties and large economic losses every year. With the progress of China's Earth-observation technology, high-resolution remote sensing images have become easier to acquire, and the level of automation in landslide identification has steadily improved. Existing remote-sensing-based landslide identification methods fall mainly into three categories: visual interpretation, pixel-based methods and object-oriented methods. With the rapid development of information extraction technology, deep learning has gradually been applied to landslide extraction from remote sensing images. Because a deep learning model learns features autonomously from the input data, it reduces the subjective bias introduced by hand-crafted features and improves working efficiency. The traditional remote sensing image interpretation methods can no longer keep pace with the explosive growth of remote sensing image data, whereas deep learning can fully exploit the large volume of image information and achieve landslide detection that is more automated, faster and cheaper. However, most general-purpose base models show performance shortfalls when adapted to the landslide target recognition task.
For example, the ResNet and EfficientNet models are limited as feature extraction structures, the Feature Pyramid Network (FPN) and similar structures cannot provide both top-down and bottom-up feature fusion, and the performance of the region proposal network (RPN) in the Mask R-CNN model cannot meet the high-precision requirements of landslide identification.
Therefore, how to optimize the target recognition model and effectively improve its performance so that it suits the landslide target detection task is a technical problem urgently awaiting a solution by those skilled in the art.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides a remote sensing image landslide identification method based on a deep learning method.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
a remote sensing image landslide recognition method based on a deep learning method comprises the following steps:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set; the classified and recombined data set comprises a landslide image data set and a non-landslide image data set;
step (2): carrying out data enhancement on the landslide image data set by utilizing the recombined data set to construct an enhanced data set;
step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
step (4): constructing an improved landslide recognition model based on Mask R-CNN; the improved model takes the Mask R-CNN model as its basis, replaces the feature extraction network ResNet in the Mask R-CNN model with the feature extraction network ResNeSt, replaces the feature fusion structure FPN with the feature fusion structure RFP, and replaces the RPN structure with the detection head of the single-stage target detection model PISA-RetinaNet;
step (5): training the improved landslide recognition model based on the Mask R-CNN constructed in the step (4) by adopting the training set obtained in the step (3), taking image data as input during training, outputting corresponding labels, and training by adopting a transfer learning method; and adopting the verification set to verify;
step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
Further, preferably, in step (1), the preprocessing includes radiometric correction, geometric correction, orthorectification, geometric registration and image fusion.
Further, preferably, in step (1), the R, G and B bands are extracted from the preprocessed satellite remote sensing image data to form a true-color remote sensing image; the pixel values in the image are then stretched to the range 0-255 by linear stretching; landslide and non-landslide areas are then located by visual interpretation, and the landslide images within the landslide areas are cropped out.
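A minimal sketch of the linear stretch and true-color stacking, assuming one NumPy array per band; the 2%/98% clip percentiles are illustrative assumptions, since the patent only specifies the 0-255 target range:

```python
import numpy as np

def linear_stretch(band: np.ndarray, low_pct: float = 2.0, high_pct: float = 98.0) -> np.ndarray:
    """Linearly stretch one band to 0-255 (uint8).

    The clip percentiles are assumed defaults; the patent only specifies
    the 0-255 target range, not the clip points.
    """
    lo, hi = np.percentile(band, [low_pct, high_pct])
    scaled = (band.astype(np.float64) - lo) / max(hi - lo, 1e-12)
    return (np.clip(scaled, 0.0, 1.0) * 255).astype(np.uint8)

def true_color(r: np.ndarray, g: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Stack the stretched R, G, B bands into a true-color image (H, W, 3)."""
    return np.dstack([linear_stretch(r), linear_stretch(g), linear_stretch(b)])
```

Clipping at percentiles rather than the raw min/max makes the stretch robust to a few extreme pixels (cloud glint, sensor noise) that would otherwise compress the usable dynamic range.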
Further, it is preferable that in the step (2), the data enhancement adopts a CutMix data enhancement method, a Mosaic data enhancement method and a random flip data enhancement method.
Further, it is preferable that in step (2):
For each CutMix enhancement, 1 image is selected from the landslide image data set and 1 from the non-landslide image data set, and the images are overlaid and stitched by the CutMix data enhancement method; the image selected from the landslide image data set must have an aspect ratio greater than or equal to 2;
for each Mosaic enhancement, 1 image is selected from the landslide image data set and 3 images from the non-landslide image data set, and the images are stitched by the Mosaic data enhancement method; the image selected from the landslide image data set must have an aspect ratio smaller than 2;
the CutMix-enhanced images, the Mosaic-enhanced images, and the landslide and non-landslide images not used for CutMix or Mosaic enhancement are then combined, and random flipping is applied to the combined data, yielding the enhanced data set.
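A minimal CutMix sketch under stated assumptions (equal-sized arrays, a uniformly placed patch, a patch-area ratio drawn from an assumed range — the patent fixes none of these details):

```python
import numpy as np

def cutmix(landslide_img: np.ndarray, background_img: np.ndarray, seed=None) -> np.ndarray:
    """Paste a random rectangular patch of the landslide image onto the
    non-landslide background image (both assumed to have the same shape)."""
    rng = np.random.default_rng(seed)
    h, w = background_img.shape[:2]
    lam = rng.uniform(0.3, 0.7)                 # assumed patch-area ratio
    cut_h = max(1, int(h * np.sqrt(lam)))
    cut_w = max(1, int(w * np.sqrt(lam)))
    y0 = int(rng.integers(0, h - cut_h + 1))
    x0 = int(rng.integers(0, w - cut_w + 1))
    mixed = background_img.copy()
    mixed[y0:y0 + cut_h, x0:x0 + cut_w] = landslide_img[y0:y0 + cut_h, x0:x0 + cut_w]
    return mixed
```

In a full pipeline the landslide annotation polygons would be remapped into the patch region as well; only the image-level stitching is shown here.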
Further, it is preferable that in step (3) the division ratio of the training set to the verification set is 8:2.
Further, preferably, in step (4), a feedback connection is added to the FPN structure: the feature fusion layers generated in the FPN pass through the ASPP module, which extracts the fused landslide image information and reduces it to the size of the corresponding feature layer in the ResNeSt module, thereby constructing the RFP structure.
Further, it is preferable that in step (4) the overall model loss function L with the detection head of the single-stage target detection model PISA-RetinaNet is calculated as:
L_PISA-RetinaNet = L_cls2 + L_box2 + L_carl    (3)
L = L_cls + L_box + L_mask + L_PISA-RetinaNet    (4)
where L_cls, L_box and L_mask are the box classification loss, box regression loss and mask loss of the two-stage recognition; L_cls2 and L_box2 are the classification loss and regression loss of the proposal-generation stage; and L_carl is the classification-aware regression loss of the PISA-RetinaNet model.
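Equations (3) and (4) can be expressed as plain functions (a sketch of the loss combination only; the individual loss terms themselves come from the respective network heads):

```python
def pisa_retinanet_loss(l_cls2: float, l_box2: float, l_carl: float) -> float:
    """Eq. (3): loss of the PISA-RetinaNet detection head
    (proposal-generation stage)."""
    return l_cls2 + l_box2 + l_carl

def total_loss(l_cls: float, l_box: float, l_mask: float,
               l_cls2: float, l_box2: float, l_carl: float) -> float:
    """Eq. (4): overall loss = two-stage losses + detection-head loss."""
    return l_cls + l_box + l_mask + pisa_retinanet_loss(l_cls2, l_box2, l_carl)
```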
In the invention, when labeling landslide samples, 1 and 0 are used to distinguish landslide from background: landslide pixels are marked 1 and everything else 0. The invention is not limited thereto, however.
In the invention, when verifying with the verification set, the model is considered successfully constructed if the accuracy reaches a preset value (e.g. 95%, 90%, 85%); otherwise, labeled samples are added and training continues.
Compared with the prior art, the invention has the beneficial effects that:
the method can be fully based on the remote sensing image data set, and the performance upper limit of the model is improved by optimizing the Mask R-CNN model, so that the application of the model in landslide identification tasks is realized. The ResNeSt structure adds a channel attention module and a multi-path structure in the model, optimizes the sensitivity of the model to different channel importance, optimizes the model volume, simultaneously gives consideration to the feature fusion function from top to bottom and from bottom to top, strengthens the feature information extraction in the feature fusion process from bottom to top, fully exerts the respective advantages by combining the two models, and can effectively improve the boundary frame recognition accuracy and the Mask recognition accuracy by 6% -9% compared with the original Mask R-CNN model by adding the two models into the Mask R-CNN model. The region proposal network structure in the Mask R-CNN model is replaced by a PISA-RetinaNet detection head structure, the deeper feature extraction structure is provided, and positive and negative samples, difficult and easy samples and important samples can be balanced in model training. Through the optimization, compared with an original Mask R-CNN model, the boundary box recognition accuracy and the Mask recognition accuracy can be effectively improved by 8% -10%.
Drawings
FIG. 1 is a flow chart of a landslide remote sensing image recognition method;
FIG. 2 is a schematic diagram of a Mask R-CNN model;
FIG. 3 is a schematic diagram of an improved landslide recognition model based on Mask R-CNN;
FIG. 4 is a schematic diagram of a ResNeSt bottleneck structure;
FIG. 5 is a view of the RFP structure;
FIG. 6 is a view of the RFP structure in unrolled (expanded) form;
FIG. 7 is a block diagram of ASPP;
FIG. 8 is a schematic diagram of the CutMix data enhancement of the landslide remote sensing image; wherein (a) is a landslide image and (b) is a non-landslide image;
FIG. 9 is a schematic diagram of the Mosaic data enhancement of the landslide remote sensing image; wherein (a) is a landslide image; (b) - (d) are non-landslide images;
fig. 10 is a graph showing comparison of landslide recognition results.
Detailed Description
The present invention will be described in further detail with reference to examples.
It will be appreciated by those skilled in the art that the following examples illustrate the present invention and should not be construed as limiting its scope. Where specific techniques or conditions are not indicated in the examples, they follow the techniques or conditions described in the literature of this field or the product specifications. Materials and equipment whose manufacturers are not indicated are conventional products available from commercial sources.
Embodiment 1 is a remote sensing image landslide recognition method based on a deep learning method, comprising the following steps:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set; the classified and recombined data set comprises a landslide image data set and a non-landslide image data set;
step (2): carrying out data enhancement on the landslide image data set by utilizing the recombined data set to construct an enhanced data set;
step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
step (4): constructing an improved landslide recognition model based on Mask R-CNN; the improved model takes the Mask R-CNN model as its basis, replaces the feature extraction network ResNet in the Mask R-CNN model with the feature extraction network ResNeSt, replaces the feature fusion structure FPN with the feature fusion structure RFP, and replaces the RPN structure with the detection head of the single-stage target detection model PISA-RetinaNet;
step (5): training the improved landslide recognition model based on the Mask R-CNN constructed in the step (4) by adopting the training set obtained in the step (3), taking image data as input during training, outputting corresponding labels, and training by adopting a transfer learning method; and adopting the verification set to verify;
step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
Embodiment 2 is a remote sensing image landslide recognition method based on a deep learning method, comprising the following steps:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set; the classified and recombined data set comprises a landslide image data set and a non-landslide image data set;
step (2): carrying out data enhancement on the landslide image data set by utilizing the recombined data set to construct an enhanced data set;
step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
step (4): constructing an improved landslide recognition model based on Mask R-CNN; the improved model takes the Mask R-CNN model as its basis, replaces the feature extraction network ResNet in the Mask R-CNN model with the feature extraction network ResNeSt, replaces the feature fusion structure FPN with the feature fusion structure RFP, and replaces the RPN structure with the detection head of the single-stage target detection model PISA-RetinaNet;
step (5): training the improved landslide recognition model based on the Mask R-CNN constructed in the step (4) by adopting the training set obtained in the step (3), taking image data as input during training, outputting corresponding labels, and training by adopting a transfer learning method; and adopting the verification set to verify;
step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
In step (1), the preprocessing includes radiation correction, geometric correction, orthographic correction, geometric registration and image fusion.
In step (1), the R, G and B bands are extracted from the preprocessed satellite remote sensing image data to form a true-color remote sensing image; the pixel values are then stretched to the range 0-255 by linear stretching; landslide and non-landslide areas are then located by visual interpretation, and the landslide images within the landslide areas are cropped out.
In the step (2), the data enhancement adopts a CutMix data enhancement method, a Mosaic data enhancement method and a random flip data enhancement method.
In step (2): for each CutMix enhancement, 1 image is selected from the landslide image data set and 1 from the non-landslide image data set, and the images are overlaid and stitched by the CutMix data enhancement method; the image selected from the landslide image data set must have an aspect ratio greater than or equal to 2;
for each Mosaic enhancement, 1 image is selected from the landslide image data set and 3 images from the non-landslide image data set, and the images are stitched by the Mosaic data enhancement method; the image selected from the landslide image data set must have an aspect ratio smaller than 2;
the CutMix-enhanced images, the Mosaic-enhanced images, and the landslide and non-landslide images not used for CutMix or Mosaic enhancement are then combined, and random flipping is applied to the combined data, yielding the enhanced data set.
In step (3), the division ratio of the training set to the verification set is 8:2.
In step (4), a feedback connection is added to the FPN structure: the feature fusion layers generated in the FPN pass through the ASPP module, which extracts the fused landslide image information and reduces it to the size of the corresponding feature layer in the ResNeSt module, thereby constructing the RFP structure.
In step (4), the overall model loss function L with the detection head of the single-stage target detection model PISA-RetinaNet is calculated as:
L_PISA-RetinaNet = L_cls2 + L_box2 + L_carl    (3)
L = L_cls + L_box + L_mask + L_PISA-RetinaNet    (4)
where L_cls, L_box and L_mask are the box classification loss, box regression loss and mask loss of the two-stage recognition; L_cls2 and L_box2 are the classification loss and regression loss of the proposal-generation stage; and L_carl is the classification-aware regression loss of the PISA-RetinaNet model.
Embodiment 3 a remote sensing image landslide recognition method based on a deep learning method, as shown in fig. 1, includes the following steps:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set;
the satellite remote sensing image data with the landslide area comprises landslide remote sensing images acquired by multi-source remote sensing satellites, and the disclosed remote sensing image landslide data set comprises landslide area images and non-landslide area images.
Further, the satellite remote sensing image data must undergo remote sensing image preprocessing, which comprises radiometric correction, geometric correction, orthorectification, geometric registration and image fusion. The R, G and B bands are extracted from the preprocessed remote sensing image to form a true-color remote sensing image, and the pixel values are stretched to the range 0-255 by linear stretching. Landslide and non-landslide areas are then located by visual interpretation based on expert knowledge, and the images of the target areas are cropped. The cropping of the landslide images follows an independent-landslide rule: every cropped landslide image contains at least one landslide target with complete edges. This completes the construction of the data set.
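The independent-landslide cropping rule can be sketched as follows, assuming the landslide's bounding box is already known from visual interpretation; the margin value is an assumption, chosen only so the target's edges stay complete inside the chip:

```python
import numpy as np

def crop_landslide(image: np.ndarray, bbox: tuple, margin: int = 32) -> np.ndarray:
    """Cut an image chip that fully contains one landslide target.

    bbox is (x0, y0, x1, y1) in pixel coordinates; the chip is padded by
    `margin` pixels on each side and clipped to the image bounds.
    """
    x0, y0, x1, y1 = bbox
    h, w = image.shape[:2]
    top, bottom = max(y0 - margin, 0), min(y1 + margin, h)
    left, right = max(x0 - margin, 0), min(x1 + margin, w)
    return image[top:bottom, left:right]
```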
Further, classifying all the data sets by taking landslide image data and non-landslide image data as classification standards to obtain a landslide image data set and a non-landslide image data set, namely obtaining the data set after classification and recombination.
Step (2): carrying out data enhancement on the landslide image data set by utilizing the recombined data set to construct an enhanced data set;
the data enhancement is mainly aimed at expanding the background difference between a data set and an enhanced landslide sample, in particular to a small-size landslide target cut out from a landslide image, and the image resolution can not be improved by adopting a simple size conversion method, so that the image information is enhanced by a background increasing mode, and the value of the sample in model training is increased. The enhancement method comprises the following steps: data stitching-type data enhancement based on CutMix, mosaic, and random flip data enhancement based on all dataset training procedures.
Further, for each CutMix enhancement, 1 image is selected from the landslide image data set and 1 from the non-landslide image data set, and the two are overlaid and stitched by the CutMix data enhancement method; all CutMix-enhanced images together form the CutMix enhancement data set. The CutMix method targets landslide image samples with an aspect ratio greater than or equal to 2, i.e. CutMix enhancement uses the landslide images in the landslide image data set whose aspect ratio is at least 2.
Further, for each Mosaic enhancement, 1 image is selected from the landslide image data set and 3 images from the non-landslide image data set, and the four are stitched by the Mosaic data enhancement method; all Mosaic-enhanced images together form the Mosaic enhancement data set. The Mosaic method targets landslide image samples with an aspect ratio smaller than 2, i.e. Mosaic enhancement uses the landslide images in the landslide image data set whose aspect ratio is below 2.
Further, combining the CutMix enhancement data set, the Mosaic enhancement data set and the landslide image data set which are not used for CutMix, mosaic enhancement, and carrying out data enhancement by adopting random inversion to obtain an enhancement data set.
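A simplified 2x2 Mosaic stitch, assuming four equal-sized tiles; real Mosaic implementations also jitter the crossing point and remap the annotations, which is omitted here:

```python
import numpy as np

def mosaic(landslide_img: np.ndarray, bg1: np.ndarray,
           bg2: np.ndarray, bg3: np.ndarray) -> np.ndarray:
    """Stitch one landslide tile (top-left) with three non-landslide tiles
    into a single 2x2 mosaic image (all tiles assumed the same shape)."""
    top = np.hstack([landslide_img, bg1])
    bottom = np.hstack([bg2, bg3])
    return np.vstack([top, bottom])
```

Placing the single landslide tile against three different backgrounds is what widens the background diversity the paragraph above describes.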
Step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
labeling the enhanced data set constructed in step (2) with the Labelme tool yields landslide image files and landslide annotation files (*.json). The landslide image files and their corresponding annotation files are divided according to a training set : verification set ratio and converted into the COCO data set format to generate the landslide sample set, wherein the training set is used for model training and the verification set is used for obtaining the model verification indexes; test inference is also performed on the verification set. The division ratio of the training set to the verification set is preferably 8:2.
Step (4): constructing an improved landslide recognition model based on Mask R-CNN;
after the image data is input into the network, image feature extraction and feature fusion are realized through the backbone network ResNet and the neck-structure Feature Pyramid Network (FPN). The five feature layers P2-P6 generated by the FPN pass in turn through the Region Proposal Network (RPN) to obtain proposal boxes on the five feature map scales. The feature maps cut by the proposal boxes are then restored to a uniform size by the ROI Align module: feature maps for the classification task are resized to 7×7 and feature maps for the segmentation task to 14×14. Finally, the second-stage detection task is completed in the head network to obtain the prediction and mask results.
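The patent does not spell out how a proposal box is assigned to one of the pyramid levels before ROI Align; the standard FPN heuristic that Mask R-CNN implementations typically use can be sketched as follows (the function name and the 224-pixel canonical scale are from the original FPN formulation, not from this patent):

```python
import math

def fpn_level(roi_w, roi_h, k0=4, k_min=2, k_max=5):
    """Map an RoI to a pyramid level: larger proposals are cut from
    coarser feature maps. k0=4 means a 224x224 RoI maps to P4."""
    k = math.floor(k0 + math.log2(math.sqrt(roi_w * roi_h) / 224))
    return max(k_min, min(k_max, k))  # clamp to the available levels P2-P5
```

For example, a 112×112 proposal would be pooled from P3, and a 448×448 proposal from P5.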
As shown in fig. 3, an improved landslide recognition model based on Mask R-CNN is constructed. In the improved structure, a new feature extraction network ResNeSt replaces the feature extraction network ResNet in the original model; a new feature fusion structure RFP replaces the feature fusion structure FPN in the original model; and the detection head of the single-stage target detection model PISA-RetinaNet replaces the RPN structure in the original model to realize the first-stage proposal box generation.
Further, in the improved structure, the new feature extraction network ResNeSt replaces the feature extraction network ResNet in the original model; the overall structure of ResNeSt is shown in Table 1, where the C2-C5 feature maps correspond to the outputs of convolution layers 2-5. Fig. 4 shows the bottleneck block of ResNeSt, which combines a channel attention mechanism with a multi-path structure. In each module of the ResNeSt model the input is grouped twice using the two hyper-parameters R and K, giving G = R×K groups in total. In the first grouping, Split 1-Split R, controlled by the hyper-parameter R, a channel attention mechanism is introduced: weights are computed through serial global pooling, two fully connected layers and Softmax, and assigned to the Split 1-Split R feature maps by multiplication. Within the splits, the second grouping, Cardinal 1-Cardinal K, is realized by the hyper-parameter K; each group is connected in series with a 1×1 and a 3×3 convolution layer, and finally the groups are combined by stacked concatenation ("concat"), retaining as much channel information as possible.
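The split-attention weighting described above (Softmax weights across the R splits, applied per channel) can be illustrated with a small numpy sketch. This is an assumption-laden simplification: the per-channel logits are passed in directly, whereas in the real ResNeSt block they come from global pooling followed by two fully connected layers.

```python
import numpy as np

def split_attention(splits, fc_logits):
    """Combine R split feature maps with channel attention weights.
    splits: array of shape (R, C, H, W); fc_logits: shape (R, C).
    Softmax is taken across the R splits for every channel, then the
    splits are summed with those weights (the ResNeSt 'split attention')."""
    # numerically stable softmax over the R axis, per channel
    e = np.exp(fc_logits - fc_logits.max(axis=0, keepdims=True))
    weights = e / e.sum(axis=0, keepdims=True)               # (R, C)
    # broadcast channel weights over the spatial dims, then sum the splits
    return (splits * weights[:, :, None, None]).sum(axis=0)  # (C, H, W)
```

With all-zero logits every split receives weight 1/R, so the output is the plain average of the splits.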
Table 1 ResNeSt-101 structure table
Further, for "a new feature fusion structure RFP is adopted in the improved structure to replace the feature fusion structure FPN in the original model": fig. 5 shows the RFP structure based on ResNeSt, and fig. 6 shows the unrolled RFP structure based on ResNeSt. A feedback connection is added to the FPN structure: the feature fusion layers generated in the FPN extract landslide image fusion information through the ASPP module and are reduced to the size of the corresponding feature layers in the ResNeSt module, thereby constructing the RFP structure. In the RFP, the feedback connection passes the top-down fusion result back to the backbone network, and the second round of ResNeSt feature extraction realizes bottom-up information transfer, so that the model can more fully acquire the information in the feature maps and the fused feature maps. Any feature layer in the structure is expressed as follows:
$f_i^t = F_i^t\left(f_{i+1}^t,\; x_i^t\right)$ (1)

$x_i^t = B_i^t\left(x_{i-1}^t,\; R_i^t\left(f_i^{t-1}\right)\right)$ (2)

where $t$ is the number of feedback connection cycles, $i$ denotes the $i$-th stage, $f_i^t$ is the $i$-th top-down output feature layer in the $t$-th feedback connection, $x_i^t$ is the $i$-th bottom-up feature extraction feature layer in the $t$-th feedback connection, $B_i^t$ represents the $i$-th bottom-up backbone operation in the $t$-th feedback connection, $F_i^t$ represents the $i$-th top-down FPN operation in the $t$-th feedback connection, and $R_i^t$ is the feature layer size conversion at the $i$-th stage in the $t$-th feedback connection.
Further, as shown in fig. 7, the ASPP structure reduces the fusion feature layer to the size of the feature layers in the backbone network. The ASPP module has 4 parallel branches. The first three branches each use a serial structure of a convolution layer and a ReLU layer to perform convolutional feature concentration; the three convolution layers are configured as kernel size = [1, 3, 3], atrous rate = [1, 3, 6] and padding = [0, 3, 6]. The 4th branch connects a global average pooling layer, a 1×1 convolution layer and a ReLU layer in series to realize global feature compression of the output feature layer in the feature pyramid. All four parallel branches take the fusion feature layer as input, and the number of output channels of each branch is 1/4 of the number of input channels; finally the four branch features are fused with a 1×1 convolution layer.
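The kernel/rate/padding triples listed above are exactly the combinations that preserve spatial size, which is what allows the four branch outputs to be concatenated. A quick check using the standard convolution output-size formula (the helper function is illustrative, not from the patent):

```python
def conv_out_size(n, k, dilation, padding, stride=1):
    """Output size of a conv layer: floor((n + 2p - d*(k-1) - 1)/s) + 1."""
    return (n + 2 * padding - dilation * (k - 1) - 1) // stride + 1

# The three ASPP conv branches from the text: kernel [1, 3, 3],
# atrous rate [1, 3, 6], padding [0, 3, 6].
branches = [(1, 1, 0), (3, 3, 3), (3, 6, 6)]
sizes = [conv_out_size(56, k, d, p) for k, d, p in branches]
# each branch keeps a 56x56 input at 56x56
```

The general rule is padding = dilation × (kernel − 1) / 2 for stride-1 convolutions.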
Further, for "the detection head of the single-stage target detection model PISA-RetinaNet is used in the improved structure to replace the RPN structure in the original model": as shown in fig. 3, the PISA-RetinaNet detection head comprises two branches that do not share weights. Each branch uses four consecutive 3×3 convolution + ReLU modules for the first feature integration step, and a fifth 3×3 convolution + Sigmoid to obtain the result; the number of output channels of the classification branch is the landslide class number, 1, and the number of output channels of the box regression branch is the number of box fine-tuning parameters, 4. Replacing the RPN structure in the original model with the PISA-RetinaNet detection head improves the accuracy of the proposal boxes generated in the first stage of the two-stage target recognition model Mask R-CNN, and thereby the overall recognition accuracy. After the RPN structure is replaced by the PISA-RetinaNet detection head, in the overall model loss function L, L_RPN is replaced by L_PISA-RetinaNet; the overall loss function is formulated as follows:
$L_{PISA\text{-}RetinaNet} = L_{cls2} + L_{box2} + L_{carl}$ (3)

$L = L_{cls} + L_{box} + L_{mask} + L_{PISA\text{-}RetinaNet}$ (4)
where $L_{cls}$, $L_{box}$ and $L_{mask}$ are the box classification loss, box regression loss and mask loss in the two-stage identification; $L_{cls2}$ and $L_{box2}$ are the classification loss and regression loss in the proposal box generation stage; and $L_{carl}$ is the classification-aware regression loss in the PISA-RetinaNet model.
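Equations (3)-(4) are a plain sum of the stage losses, which can be stated as a tiny helper (illustrative only; real implementations usually also carry per-term weighting coefficients, which the patent does not mention):

```python
def total_loss(l_cls, l_box, l_mask, l_cls2, l_box2, l_carl):
    """Overall loss of the improved model per equations (3)-(4):
    the PISA-RetinaNet head loss replaces the original RPN loss."""
    l_pisa = l_cls2 + l_box2 + l_carl        # eq. (3), first stage
    return l_cls + l_box + l_mask + l_pisa   # eq. (4), full model
```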
Step (5): training the improved landslide recognition model based on Mask R-CNN constructed in step (4) with the training set obtained in step (3), taking the image data as input and the corresponding annotations as output, and training with a transfer learning method; the verification set is used for verification;
model loss convergence is judged with reference to the accuracy evaluation indexes BBox AP50, BBox AP75, Mask AP50 and Mask AP75 to confirm that the trained model is usable.
Further, in the process of adopting the transfer learning training, the initial weight of the improved landslide recognition model based on Mask R-CNN adopts the pre-training weight of the ResNeSt model.
Further, BBox AP50 and BBox AP75 are quantitative evaluation indexes of the recognition results, and Mask AP50 and Mask AP75 are quantitative evaluation indexes of the segmentation results. In these indexes, IoU ≥ 50% and IoU ≥ 75% respectively serve as the TP discrimination thresholds, and the AP value is obtained by integrating the Precision-Recall curve; the indexes therefore reflect precision, recall and intersection-over-union simultaneously. Both quantitative evaluation indexes range from 0 to 1, and a higher score indicates higher recognition accuracy. The calculation formulas are as follows:
$p = \dfrac{TP}{TP + FP}, \qquad r = \dfrac{TP}{TP + FN}, \qquad AP = \int_0^1 p(r)\,dr$
where p is Precision and r is Recall. TP, FP, TN and FN have different meanings in different tasks. In the landslide recognition task, TP (True Positive) is the number of sample boxes predicted as landslide and actually landslide, FP (False Positive) the number predicted as landslide but actually other, TN (True Negative) the number predicted as other and actually other, and FN (False Negative) the number predicted as other but actually landslide. In the landslide segmentation task, TP, FP, TN and FN are counted with pixels as the basic unit.
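The AP integral over the Precision-Recall curve can be sketched as follows. This is a minimal all-points sketch (an assumption of this note): COCO-style AP50/AP75 additionally fixes the IoU threshold used to decide TP versus FP, and averages over 101 interpolated recall points.

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP as the area under the precision-recall curve.
    scores: detection confidences; is_tp: 1 if the detection is a TP at
    the chosen IoU threshold, else 0; n_gt: number of ground-truth boxes."""
    order = np.argsort(-np.asarray(scores))   # rank detections by confidence
    tp = np.asarray(is_tp, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    precision = cum_tp / (cum_tp + cum_fp)    # p = TP / (TP + FP)
    recall = cum_tp / n_gt                    # r = TP / (TP + FN)
    # integrate p(r) over recall, point by point
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precision, recall):
        ap += p * (r - prev_r)
        prev_r = r
    return ap
```

For instance, detections ranked TP, FP, TP against 2 ground truths give precision (1, 0.5, 2/3) at recall (0.5, 0.5, 1.0), hence AP = 5/6.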
Step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
An application example is a remote sensing image landslide identification method based on a deep learning method, which comprises the following steps:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set;
in this example, the satellite remote sensing image data are GF-2 remote sensing images with a resolution of 0.8 m. Radiation correction, geometric correction, orthographic correction, geometric registration and image fusion are performed on the acquired GF-2 satellite remote sensing image data containing landslide areas. The RGB bands are extracted from the preprocessed satellite remote sensing image data to form true-color remote sensing images, pixel values are stretched to the range 0-255 by linear stretching, and the landslide images are cut out by visual interpretation, yielding 530 landslide images. The public remote sensing image landslide data set is the Bijie landslide data set produced by Ji et al., containing 770 landslide images and 2003 non-landslide images in total.
Taking landslide image data and non-landslide image data as the classification standard, the data set is reorganized into 1300 landslide images and 2003 non-landslide images.
Step (2): carrying out data enhancement on the recombined data set to construct an enhanced data set;
each time CutMix data enhancement is performed, 1 image is selected from the landslide image data set and 1 from the non-landslide image data set, and image superposition and stitching are carried out by the CutMix data enhancement method, as shown in fig. 8; the image selected from the landslide image data set has an aspect ratio greater than or equal to 2;
this embodiment uses 200 landslide images and 200 non-landslide images for CutMix data enhancement; the resulting 200 CutMix-enhanced images together form the CutMix enhancement data set;
each time Mosaic data enhancement is performed, 1 image is selected from the landslide image data set and 3 images from the non-landslide image data set, and image stitching is carried out by the Mosaic data enhancement method, as shown in fig. 9; the image selected from the landslide image data set has an aspect ratio smaller than 2;
this embodiment uses 200 landslide images and 800 non-landslide images for Mosaic data enhancement; the resulting 200 Mosaic-enhanced images together form the Mosaic enhancement data set;
the 200 CutMix-enhanced images and 200 Mosaic-enhanced images are combined with the 900 landslide images not used for CutMix or Mosaic enhancement, and random horizontal flipping is then applied to the image data to complete the construction of the enhanced data set.
Step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
labeling the 1300 images of the enhanced data set constructed in step (2) with the Labelme tool yields 1300 landslide image files and 1300 corresponding landslide annotation files (*.json). The 1300 enhanced images are divided according to the training set : verification set ratio of 8:2 and converted into the COCO data set format to generate the landslide sample set. The landslide sample set comprises the 1300 landslide image files, a training set annotation file covering 1040 images, and a verification set annotation file covering 260 images. The training set is used for model training, the verification set for obtaining the model verification indexes; test inference is also performed on the verification set.
Step (4): constructing an improved landslide recognition model based on Mask R-CNN; the improved landslide recognition model is based on the Mask R-CNN model, with the feature extraction network ResNet changed to the feature extraction network ResNeSt, the feature fusion structure FPN changed to the feature fusion structure RFP, and the RPN structure changed to the detection head of the single-stage target detection model PISA-RetinaNet;
1. the feature extraction module is replaced by the ResNeSt structure, introducing channel attention features and a multi-path structure to improve feature extraction performance;
2. the feature fusion module is replaced by the RFP structure, which is accessed by feature layers with channel numbers [256, 512, 1024, 2048];
3. the RPN is replaced by the PISA-RetinaNet detection head structure; the detection head is connected to the five feature layers of the RFP structure with strides [4, 8, 16, 32, 64] and outputs the positions of the sliding proposal boxes. In the loss function of the detection head, L_PISA-RetinaNet replaces L_RPN;
4. the landslide proposal boxes and the feature maps of corresponding sizes are input into the ROI Align module, and the five groups of scale maps with their corresponding proposal boxes are restored to the same size to obtain the landslide regions of interest;
5. the uniform-size feature maps and their proposal boxes are input into the head network; the 7×7 landslide regions of interest are classified and fine-tuned to obtain the prediction boxes, and the 14×14 landslide regions of interest are output to obtain the prediction boxes and masks.
Step (5): training the improved landslide recognition model based on Mask R-CNN constructed in step (4) with the training set obtained in step (3), taking the image data as input and the corresponding annotations as output, and training with a transfer learning method; the verification set is used for verification;
training is divided into two stages, specifically set as:
in the first training stage, a 200-iteration warm-up training with an initial learning rate of 0.001 is performed using gradient descent to improve the stability of the initial training of the model.
In the second training stage, the optimizer is momentum-based stochastic gradient descent with an initial learning rate of 0.02 and a momentum of 0.9; the learning rate follows a cosine annealing schedule; training runs for 100 epochs with 8 images per batch.
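The two-stage schedule above can be sketched as a single learning-rate function. The endpoint values (200 warm-up iterations at 0.001, base rate 0.02, cosine annealing over 100 epochs) come from the text; the linear warm-up ramp and the annealing floor of 0 are assumptions of this sketch, since the patent only gives the endpoints.

```python
import math

def learning_rate(step, warmup_steps=200, warmup_lr=0.001,
                  base_lr=0.02, total_epochs=100, steps_per_epoch=1):
    """Warm-up followed by cosine annealing (illustrative sketch)."""
    if step < warmup_steps:
        # ramp linearly from the warm-up lr to the base lr
        frac = step / warmup_steps
        return warmup_lr + (base_lr - warmup_lr) * frac
    total = total_epochs * steps_per_epoch
    t = min(step - warmup_steps, total)
    # cosine decay from base_lr down to 0
    return 0.5 * base_lr * (1 + math.cos(math.pi * t / total))
```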
The weight with the highest score on the evaluation index BBox AP50 among the last 10 epochs is selected as the recognition model weight. As shown in Table 2, the optimal weights under each optimization are evaluated on the verification set of step (3).
In the landslide identification task, owing to the resolution of landslide remote sensing images and their complex spectral characteristics, the feature extraction process is generally difficult; to highlight the performance of the proposed model, 8 groups of comparison experiments were set up. Experiment 1 is the Mask R-CNN model with the original ResNet50; Experiment 2 is the Mask R-CNN model with the deeper ResNet101; Experiment 3 is the Mask R-CNN model with the commonly used high-performance feature network EfficientNet; Experiment 4 is the Mask R-CNN model with ResNeSt101; Experiment 5 adds the RFP structure on the basis of Experiment 3; Experiment 6 adds the RFP structure on the basis of Experiment 4; Experiment 7 replaces the RPN with the FCOS target detection head on the basis of Experiment 6; Experiment 8 replaces the RPN with the PISA-RetinaNet detection head on the basis of Experiment 6. Experiments 9-10 are landslide identification experiments with Cascade R-CNN series models on the same data set; Experiments 11-12 are landslide identification experiments with YOLOX series models on the same data set, where the YOLOX model contains only a recognition structure and does not participate in the mask accuracy comparison. The experimental results are shown in Table 2.
Comparison of Experiments 1 and 2 shows that improving network performance by increasing network depth has only a modest effect: in the landslide identification task, the corresponding indexes of the Experiment 2 model improve on those of the Experiment 1 model by 0.6%, 0.8%, 0.6% and 0.7% respectively.
Comparison of Experiments 2, 3 and 4 shows that although optimizing the feature extraction model can improve the landslide recognition results, the improvement is still limited. The ResNeSt model obtains the greatest optimization among these models, yet compared with Experiment 1 the corresponding indexes improve by only 1.7%, 3.9%, 2.5% and 4.4%.
Comparison of Experiments 3, 4, 5 and 6 shows that after the RFP feature fusion structure is added, the feedback connection returns the fused feature maps to the feature extraction network for secondary extraction, so the strength of the feature extraction network also influences the quality of the whole structure. The Experiment 5 model achieves only a small improvement over the Experiment 3 model, with the corresponding indexes rising by 0.9%, 0.6%, 0.8% and 0.8%. The Experiment 6 model achieves a larger improvement over the Experiment 4 model, mainly because the ResNeSt model, with its channel attention and multi-path structure, has a higher upper limit of feature extraction performance; the corresponding indexes rise by 5%, 1.3%, 6% and 2.4%, showing that the combination of ResNeSt and the RFP structure is stronger.
The effect of replacing the RPN structure with a single-stage target detection head can be seen from the comparison of Experiments 6, 7 and 8, where Experiment 6 is the result with the RPN structure. The Experiment 7 model suffers severe performance degradation compared with the Experiment 6 model, with the corresponding indexes dropping by 5.8%, 16.5%, 8.3% and 5.4%. The main reason is that the first proposal box generation stage of the two-stage target recognition model is in series with the second fine-tuning stage: if a landslide is not recognized in the first stage, it is not fine-tuned in the second stage, so the performance of the first recognition stage is critical to the whole model. The PISA-RetinaNet replacement in the Experiment 8 model is more stable than Experiment 6 and improves the corresponding indexes by 1.9%, 1.5%, 0.2% and 0.4%. Although the improvement is not especially large, considering that this module only generates the first-stage proposal boxes, the overall gain is comparable to that of optimizing the feature extraction network, so this optimization can be considered to have a strong improvement effect.
For the improved landslide recognition model based on Mask R-CNN, the detection performance indexes BBox AP50 and BBox AP75 score 91.4% and 63.5%, and the segmentation performance indexes Mask AP50 and Mask AP75 score 87.4% and 52.6%; compared with the original Mask R-CNN with ResNet50, the corresponding indexes improve by 7.9%, 5.5%, 8.1% and 6.5% respectively. Comparison of Experiment 8 with Experiments 9, 10, 11 and 12 shows that the performance gain of the new model in the landslide recognition task exceeds that brought by iterating to more advanced model structures, so the proposed method can be considered to bring a substantial performance improvement.
Table 2 comparison results
In the table, 50 and 101 are the numbers of convolution layers, referring to the ResNet and ResNeSt models with different convolution depths.
Step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
The recognition results of the new model are shown in fig. 10: samples 1, 2, 3 and 4 are all correctly recognized, and the recognition extent is more accurate, eliminating the interference of factors such as roads.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above, and that the above embodiments and descriptions are merely illustrative of the principles of the present invention, and various changes and modifications may be made without departing from the spirit and scope of the invention, which is defined in the appended claims. The scope of the invention is defined by the appended claims and equivalents thereof.
Claims (8)
1. The remote sensing image landslide recognition method based on the deep learning method is characterized by comprising the following steps of:
step (1): collecting satellite remote sensing image data with landslide areas, and acquiring a public remote sensing image landslide data set; preprocessing acquired satellite remote sensing image data with landslide areas, cutting out landslide images in the acquired satellite remote sensing image data, and classifying the cut landslide images and a public remote sensing image landslide data set according to the landslide image data and the non-landslide image data to obtain a classified and recombined data set; the classified and recombined data set comprises a landslide image data set and a non-landslide image data set;
step (2): carrying out data enhancement on the landslide image data set by utilizing the recombined data set to construct an enhanced data set;
step (3): performing landslide sample labeling on the enhanced data set, and dividing the labeled sample set into a training set and a verification set;
step (4): constructing an improved landslide recognition model based on Mask R-CNN; the improved landslide recognition model based on the Mask R-CNN is based on the Mask R-CNN model, a feature extraction network ResNet in the Mask R-CNN model is changed into a feature extraction network ResNeSt, a feature fusion structure FPN in the Mask R-CNN model is changed into a feature fusion structure RFP, and an RPN structure in the Mask R-CNN model is changed into a detection head of a single-stage target detection model PISA-RetinaNet;
step (5): training the improved landslide recognition model based on the Mask R-CNN constructed in the step (4) by adopting the training set obtained in the step (3), taking image data as input during training, outputting corresponding labels, and training by adopting a transfer learning method; and adopting the verification set to verify;
step (6): and (3) carrying out landslide recognition on the image to be recognized by adopting the improved landslide recognition model based on the Mask R-CNN obtained in the step (5), so as to obtain a recognition result.
2. The method for identifying landslide of remote sensing image based on deep learning method of claim 1 wherein the preprocessing in step (1) comprises radiation correction, geometric correction, orthographic correction, geometric registration and image fusion.
3. The method for identifying landslide of remote sensing image based on deep learning method of claim 1, wherein in step (1), RGB three wave bands in the image are extracted from the preprocessed satellite remote sensing image data to form a true color remote sensing image, then the pixel values in the image are stretched to 0-255 by linear stretching, and then landslide areas and non-landslide areas are searched by visual identification method, and landslide images therein are cut out.
4. The method for identifying landslide of remote sensing image based on deep learning method of claim 1, wherein in step (2), the data enhancement adopts a CutMix data enhancement method, a Mosaic data enhancement method and a random flip data enhancement method.
5. The method for identifying landslide of remote sensing image based on deep learning method of claim 4, wherein in step (2):
each time the CutMix data is enhanced, 1 image is selected from the landslide image data set and the non-landslide image data set respectively, and the images are overlapped and spliced through the CutMix data enhancement method; selecting an image from the landslide image data set, wherein the aspect ratio of the image is more than or equal to 2;
selecting 1 image from the landslide image data set when the Mosaic data is enhanced each time, selecting 3 images from the non-landslide image data set, and performing image stitching by using a Mosaic data enhancement method; wherein, selecting the image length-width ratio from the landslide image data set is smaller than 2;
and combining the CutMix-enhanced image data and the Mosaic-enhanced image data with the landslide image data and non-landslide image data not used for CutMix or Mosaic enhancement, and then enhancing the data by random flipping to obtain the enhanced data set.
6. The method for identifying landslide of remote sensing image based on deep learning method of claim 1, wherein in step (3), the dividing ratio of training set and verification set is 8:2.
7. the method for identifying landslide of remote sensing image based on deep learning method as defined in claim 1, wherein in step (4), feedback connection is added in the FPN structure, and feature fusion layer generated in the FPN is extracted by ASPP module to obtain landslide image fusion information, and reduced to the size of feature layer in ResNeSt module, thereby constructing RFP structure.
8. The method for identifying landslide of remote sensing image based on deep learning method of claim 1, wherein in step (4), the overall model loss function L with the detection head of the single-stage object detection model PISA-RetinaNet is calculated as follows:
$L_{PISA\text{-}RetinaNet} = L_{cls2} + L_{box2} + L_{carl}$ (3)

$L = L_{cls} + L_{box} + L_{mask} + L_{PISA\text{-}RetinaNet}$ (4)
where $L_{cls}$, $L_{box}$ and $L_{mask}$ are the box classification loss, box regression loss and mask loss in the two-stage identification; $L_{cls2}$ and $L_{box2}$ are the classification loss and regression loss in the proposal box generation stage; and $L_{carl}$ is the classification-aware regression loss in the PISA-RetinaNet model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310873327.9A CN116597287A (en) | 2023-07-17 | 2023-07-17 | Remote sensing image landslide recognition method based on deep learning method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310873327.9A CN116597287A (en) | 2023-07-17 | 2023-07-17 | Remote sensing image landslide recognition method based on deep learning method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116597287A true CN116597287A (en) | 2023-08-15 |
Family
ID=87612086
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310873327.9A Pending CN116597287A (en) | 2023-07-17 | 2023-07-17 | Remote sensing image landslide recognition method based on deep learning method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116597287A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274823A (en) * | 2023-11-21 | 2023-12-22 | 成都理工大学 | Visual transducer landslide identification method based on DEM feature enhancement |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190347828A1 (en) * | 2018-05-09 | 2019-11-14 | Beijing Kuangshi Technology Co., Ltd. | Target detection method, system, and non-volatile storage medium
CN112418033A (en) * | 2020-11-11 | 2021-02-26 | 广州数鹏通科技有限公司 | Landslide slope surface segmentation and identification method based on Mask R-CNN neural network
CN112712535A (en) * | 2021-01-18 | 2021-04-27 | 长安大学 | Mask R-CNN landslide segmentation method based on simulated hard samples
CN113887515A (en) * | 2021-10-28 | 2022-01-04 | 中国自然资源航空物探遥感中心 | Remote sensing landslide identification method and system based on convolutional neural network
CN114821189A (en) * | 2022-05-18 | 2022-07-29 | 重庆邮电大学 | Lesion image classification and recognition method based on fundus images
CN115601544A (en) * | 2022-10-14 | 2023-01-13 | 长安大学 | High-resolution image landslide detection and segmentation method
CN115761513A (en) * | 2022-12-09 | 2023-03-07 | 西南交通大学 | Intelligent remote sensing identification method for large mountain landslides based on semi-supervised deep learning
CN116129291A (en) * | 2023-01-31 | 2023-05-16 | 浙江大学杭州国际科创中心 | UAV-based image target recognition method and device for animal husbandry
- 2023-07-17: Application CN202310873327.9A filed in China; published as CN116597287A; status: Pending
Non-Patent Citations (6)
Title |
---|
THISISI3: "How to convert RetinaNet into an RPN for use in two-stage models in MMDetection", pages 1 - 2, Retrieved from the Internet <URL:https://zhuanlan.zhihu.com/p/370189625> *
YU HUIMING et al.: "A remote sensing image target recognition method based on improved Mask-RCNN model", 2021 IEEE 2nd International Conference on Big Data, Artificial Intelligence and Internet of Things Engineering (ICBAIE), pages 436 - 439 *
YUHANG CAO et al.: "Prime Sample Attention in Object Detection", arXiv, pages 1 - 9 *
ZHAO YE et al.: "Improvement of panoptic segmentation method for urban road", ICCDE '22: Proceedings of the 2022 8th International Conference on Computing and Data Engineering, pages 111 - 115 *
ZHOU Shuangxi et al.: "Crack detection model for steel fiber reinforced concrete based on improved Mask R-CNN", Journal of East China Jiaotong University, vol. 38, no. 6, pages 37 - 45 *
HU Yan et al.: "Recognition of low, small and slow UAV targets based on RetinaNet", Modern Computer, pages 66 - 70 *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117274823A (en) * | 2023-11-21 | 2023-12-22 | 成都理工大学 | Vision Transformer landslide identification method based on DEM feature enhancement
CN117274823B (en) * | 2023-11-21 | 2024-01-26 | 成都理工大学 | Vision Transformer landslide identification method based on DEM feature enhancement
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263705A (en) | Two-phase high-resolution remote sensing image change detection method for the remote sensing field | |
CN102902956B (en) | Ground-based visible cloud image recognition and processing method | |
CN109064396A (en) | Single-image super-resolution reconstruction method based on a deep component learning network | |
CN106294705B (en) | Batch remote sensing image preprocessing method | |
CN110322453A (en) | 3D point cloud semantic segmentation method based on position attention and an auxiliary network | |
CN111274869A (en) | Hyperspectral image classification method based on a parallel-attention residual network | |
CN108009629A (en) | Station logo segmentation method based on a fully convolutional logo segmentation network | |
CN107633038A (en) | Tea leaf recognition method and system based on image recognition technology | |
CN106529484A (en) | Joint spectral and lidar data classification method based on class-specific multiple kernel learning | |
CN116597287A (en) | Remote sensing image landslide recognition method based on deep learning method | |
CN109858575A (en) | Data classification method based on convolutional neural networks | |
CN105808665A (en) | Image retrieval method based on hand-drawn sketches | |
CN111414954A (en) | Rock image retrieval method and system | |
CN113205103A (en) | Lightweight tattoo detection method | |
CN116912708A (en) | Remote sensing image building extraction method based on deep learning | |
CN115810191A (en) | Pathological cell classification method based on multi-attention fusion and a high-precision segmentation network | |
Peng et al. | Convolutional transformer-based few-shot learning for cross-domain hyperspectral image classification | |
CN115526863A (en) | Cylindrical lithium battery surface defect detection method and device | |
CN107273928A (en) | Automatic remote sensing image annotation method based on weighted feature fusion | |
CN115953408A (en) | YOLOv7-based lightning arrester surface defect detection method | |
CN111462090A (en) | Multi-scale image target detection method | |
CN116543269B (en) | Cross-domain few-shot fine-grained image recognition method based on self-supervision, and model thereof | |
CN115880477A (en) | Apple detection and positioning method and system based on deep convolutional neural network | |
CN110363101A (en) | Flower recognition method based on a CNN feature fusion framework | |
CN113283428A (en) | Image target detection method based on the FCE-SSD method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||