CN118096784A - Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance - Google Patents

Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance

Info

Publication number
CN118096784A
Authority
CN
China
Prior art keywords
feature map
remote sensing
sensing image
image segmentation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202410516951.8A
Other languages
Chinese (zh)
Other versions
CN118096784B (en)
Inventor
王乐凯
李嵩
穆珂
徐勤考
杨公平
周海龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beiming Chenggong Software Shandong Co ltd
Original Assignee
Beiming Chenggong Software Shandong Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beiming Chenggong Software Shandong Co ltd filed Critical Beiming Chenggong Software Shandong Co ltd
Priority to CN202410516951.8A
Publication of CN118096784A
Application granted
Publication of CN118096784B
Legal status: Active


Classifications

    • G - PHYSICS
      • G06 - COMPUTING; CALCULATING OR COUNTING
        • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
          • G06T 7/00 - Image analysis
            • G06T 7/10 - Segmentation; Edge detection
          • G06T 5/00 - Image enhancement or restoration
            • G06T 5/50 - Image enhancement or restoration using two or more images, e.g. averaging or subtraction
          • G06T 2207/00 - Indexing scheme for image analysis or image enhancement
            • G06T 2207/10 - Image acquisition modality
              • G06T 2207/10032 - Satellite or aerial image; Remote sensing
            • G06T 2207/20 - Special algorithmic details
              • G06T 2207/20081 - Training; Learning
              • G06T 2207/20084 - Artificial neural networks [ANN]
              • G06T 2207/20212 - Image combination
                • G06T 2207/20221 - Image fusion; Image merging
        • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N 3/00 - Computing arrangements based on biological models
            • G06N 3/02 - Neural networks
              • G06N 3/04 - Architecture, e.g. interconnection topology
                • G06N 3/045 - Combinations of networks
              • G06N 3/08 - Learning methods
                • G06N 3/09 - Supervised learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention relates to the technical field of remote sensing image segmentation, and in particular to a remote sensing image segmentation method and system based on adaptive enhancement and fine-granularity guidance. The method comprises: acquiring a remote sensing image to be segmented; preprocessing the remote sensing image to be segmented; and inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result. The trained segmentation model uses an encoder to extract features from the preprocessed remote sensing image, producing feature maps at different scales; a context adaptive enhancement module applies context adaptive enhancement to the multi-scale feature maps to obtain enhanced feature maps; a decoder decodes the enhanced feature maps together with the feature maps extracted by the encoder; and a fine-granularity guidance module performs a weighted summation of the decoded features to obtain the final prediction feature map. The invention shows excellent segmentation performance on remote sensing segmentation data of waterway transport ships.

Description

Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance
Technical Field
The invention relates to the technical field of remote sensing image segmentation, and in particular to a remote sensing image segmentation method and system based on adaptive enhancement and fine-granularity guidance.
Background
Semantic segmentation of remote sensing images is an important branch of image processing, with very wide application in environmental monitoring and protection, resource management and planning, and other fields. Remote sensing image segmentation of ships on waterways is a key link in maritime monitoring and plays a very important role in the marine economy: by classifying every pixel in the image, it can effectively identify ship transport and traffic conditions in a waterway. Traditional ship segmentation tasks often rely on manual identification and delineation, which is inefficient under complex sea conditions and disordered ship distributions and hinders downstream tasks. How to handle multi-target segmentation within a single waterway and across multiple waterways, where ship features are complex and vary sharply in scale, is therefore an important challenge for this task.
Most existing deep learning segmentation methods for this task, such as fully convolutional networks (FCN), U-Net and SegNet, suffer from rough edge segmentation in the predicted image, excessive attention to local features, unbalanced sample distributions and poor label quality, which limit segmentation accuracy.
Disclosure of Invention
To overcome the deficiencies of the prior art, the invention provides a remote sensing image segmentation method and system based on adaptive enhancement and fine-granularity guidance. By combining a CNN with a Transformer, and by designing a context adaptive enhancement module and a fine-granularity guidance module, the model achieves excellent segmentation performance on remote sensing segmentation data of waterway transport ships.
In one aspect, a remote sensing image segmentation method based on adaptive enhancement and fine-granularity guidance is provided, comprising: acquiring a remote sensing image to be segmented; preprocessing the remote sensing image to be segmented; and inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result. The model first extracts features from the preprocessed remote sensing image with an encoder to obtain feature maps at different scales; a context adaptive enhancement module then applies context adaptive enhancement to the multi-scale feature maps to obtain enhanced feature maps; a decoder then decodes the enhanced feature maps together with the feature maps extracted by the encoder; and a fine-granularity guidance module finally performs a weighted summation of the decoded features to obtain the final prediction feature map.
In another aspect, a remote sensing image segmentation system based on adaptive enhancement and fine-granularity guidance is provided, comprising: an acquisition module configured to acquire a remote sensing image to be segmented; a preprocessing module configured to preprocess the remote sensing image to be segmented; and a segmentation module configured to input the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result. The model first extracts features from the preprocessed remote sensing image with an encoder to obtain feature maps at different scales; a context adaptive enhancement module then applies context adaptive enhancement to the multi-scale feature maps to obtain enhanced feature maps; a decoder then decodes the enhanced feature maps together with the feature maps extracted by the encoder; and a fine-granularity guidance module finally performs a weighted summation of the decoded features to obtain the final prediction feature map.
The technical scheme has the following advantages and beneficial effects: the invention places a context adaptive enhancement module after the CNN-Transformer encoder. This module enhances the key information in the image and effectively integrates the detail information and context information captured by the encoder, helping the model better understand the position and role of the segmentation targets in the image. This helps the model exclude interference during segmentation and identify and locate targets more accurately. In a practical scenario, for example, it enhances, for the same batch of samples, the adaptive acquisition of contextual information from both the internal features of a ship and similar ships ahead of and behind it.
Furthermore, a fine-granularity guidance module is placed after the decoder. In the encoder, continuous downsampling loses some detail information, and simple upsampling cannot recover these details, which are nevertheless particularly important for recovering the final segmentation result. With fine-granularity guidance, the model pays more attention to tiny details and local features in the image, compensating for the boundary and detail information lost in the encoder stage. In addition, for complex scenes, this module, together with the labels in the training stage, helps the model filter out noise and irrelevant information and guides it to focus on the real segmentation targets. The method performs well on waterway ship remote sensing segmentation datasets with uneven data distribution and complex composition, so it handles most practical application scenarios and improves segmentation efficiency.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the invention.
Fig. 1 is a flow chart of a method according to a first embodiment.
Fig. 2 is a schematic diagram of an internal structure of a trained remote sensing image segmentation model according to the first embodiment.
Fig. 3 is a schematic diagram of an internal structure of a context adaptive enhancement module according to the first embodiment.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the invention. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
Embodiment one: as shown in fig. 1, the present embodiment provides a remote sensing image segmentation method based on adaptive enhancement and fine-granularity guidance, which includes: S101: acquiring a remote sensing image to be segmented; S102: preprocessing the remote sensing image to be segmented; S103: inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result. The model first extracts features from the preprocessed remote sensing image with an encoder to obtain feature maps at different scales; a context adaptive enhancement module then applies context adaptive enhancement to the multi-scale feature maps to obtain enhanced feature maps; a decoder then decodes the enhanced feature maps together with the feature maps extracted by the encoder; and a fine-granularity guidance module finally performs a weighted summation of the decoded features to obtain the final prediction feature map.
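For clarity, a minimal sketch of the S101-S103 inference pipeline is given below. This is PyTorch-style illustration code under stated assumptions: the model object, the image path and the class count are placeholders, not names defined in the patent.

```python
import torch
import torch.nn.functional as F
import numpy as np
from PIL import Image

def segment_remote_sensing_image(image_path: str, model: torch.nn.Module,
                                 size: int = 512) -> np.ndarray:
    """S101-S103: load the image, preprocess (resize to 512 x 512), run the
    trained segmentation model, and return a per-pixel class map."""
    model.eval()
    # S101: acquire the remote sensing image to be segmented
    img = Image.open(image_path).convert("RGB")
    # S102: preprocess - resize to the set size and scale intensities to [0, 1]
    img = img.resize((size, size), Image.BILINEAR)
    x = torch.from_numpy(np.asarray(img)).permute(2, 0, 1).float().unsqueeze(0) / 255.0
    # S103: forward pass through the trained remote sensing image segmentation model
    with torch.no_grad():
        logits = model(x)                      # (1, num_classes, H, W) prediction feature map
        probs = F.softmax(logits, dim=1)
    return probs.argmax(dim=1).squeeze(0).cpu().numpy()
```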
Further, step S101, acquiring a remote sensing image to be segmented, includes: acquiring the remote sensing image with a sensor.
Further, step S102, preprocessing the remote sensing image to be segmented, includes: resizing the remote sensing image to be segmented to a set size, for example 512 × 512.
Further, step S103, inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result, wherein the training process of the trained remote sensing image segmentation model comprises: constructing a data set, the data set being remote sensing images with known segmentation results; dividing the data set into a training set, a validation set and a test set according to a set ratio; inputting the training set into the remote sensing image segmentation model and training it, stopping training when the total loss value no longer decreases or the number of iterations exceeds the set number, to obtain a preliminarily trained remote sensing image segmentation model; inputting the validation set into the preliminarily trained model for validation to obtain a validated remote sensing image segmentation model; and inputting the test set into the validated model for testing; when all test indices reach the set thresholds, the current model is taken as the final trained remote sensing image segmentation model.
Further, constructing the data set includes: adjusting the resolution of the collected waterway ship remote sensing image data and uniformly scaling it to 512 × 512; most of the data consist of ship images captured on the same waterway within a given time period, and the images correspond one-to-one with the label names.
Further, dividing the data set into a training set, a validation set and a test set according to a set ratio includes: splitting all data 8:1:1 into training, validation and test sets without overlap, i.e. no set may contain slices with similar segmentation target structures from the same time period, which avoids data leakage between the sets.
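A minimal sketch of such an 8:1:1 cross-free split is shown below. It assumes each sample carries a group key (for example a waterway/time-period identifier; this field name is a hypothetical convention, not something defined in the patent) so that slices from the same period never land in different sets.

```python
import random
from collections import defaultdict

def split_dataset(samples, ratios=(0.8, 0.1, 0.1), seed=0):
    """Group samples by their time-period/waterway key, then split whole groups
    8:1:1 so that no group is shared between training, validation and test sets."""
    groups = defaultdict(list)
    for s in samples:                      # each sample: {"image": ..., "label": ..., "group": ...}
        groups[s["group"]].append(s)
    keys = sorted(groups)
    random.Random(seed).shuffle(keys)
    n = len(keys)
    n_train, n_val = int(ratios[0] * n), int(ratios[1] * n)
    pick = lambda ks: [s for k in ks for s in groups[k]]
    return (pick(keys[:n_train]),
            pick(keys[n_train:n_train + n_val]),
            pick(keys[n_train + n_val:]))
```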
Further, the total loss function of the model is formulated as:
$L_{total} = \alpha L_{Dice} + \beta L_{BCE}$;
$L_{Dice} = 1 - \dfrac{2\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\, p_{ic} + \varepsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic} + \sum_{i=1}^{N}\sum_{c=1}^{C} p_{ic} + \varepsilon}$;
$L_{BCE} = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\big[\, y_{ic}\log(p_{ic}) + (1 - y_{ic})\log(1 - p_{ic}) \,\big]$;
wherein $\alpha$ and $\beta$ are the weights of the Dice loss function $L_{Dice}$ and the BCE (Binary Cross-Entropy) loss function $L_{BCE}$, both set to 0.5; N denotes the total number of pixels and C the total number of classes; $y_{ic}$ denotes the label value of the i-th pixel for class c; $p_{ic}$ denotes the predicted probability of the i-th pixel for class c; $\varepsilon$ is a smoothing term that prevents the extreme case of a zero denominator and also smooths the loss and its gradient.
Considering the model's wide range of application, the method must segment both large, rapidly moving ship regions on the waterway and small ships with complex contours in the image. A mixed segmentation loss function $L_{total}$ is therefore proposed, built from the Dice loss function $L_{Dice}$ and the BCE loss function $L_{BCE}$ with weights $\alpha$ and $\beta$ set to 0.5 and 0.5, respectively.
It should be appreciated that the weighted total segmentation loss is used during training and the weights are updated with this loss by the back-propagation algorithm; after each training round the segmentation quality of the model is evaluated on the validation set, the validation losses are compared, and the optimal model weights are saved.
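A minimal PyTorch sketch of the mixed Dice + BCE loss with α = β = 0.5 is given below; the smoothing term ε follows the text, while the exact tensor conventions (softmax over a one-hot target of shape (B, C, H, W)) are an assumption for illustration.

```python
import torch
import torch.nn.functional as F

def mixed_segmentation_loss(logits, target_onehot, alpha=0.5, beta=0.5, eps=1e-6):
    """logits: (B, C, H, W); target_onehot: (B, C, H, W) with y_ic in {0, 1}."""
    probs = torch.softmax(logits, dim=1)
    target = target_onehot.float()
    # Dice loss over all pixels and classes, with smoothing term eps
    inter = (probs * target).sum()
    dice = 1.0 - (2.0 * inter + eps) / (probs.sum() + target.sum() + eps)
    # Binary cross-entropy between the predicted probabilities and the labels
    bce = F.binary_cross_entropy(probs, target)
    return alpha * dice + beta * bce
```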
Further, the test indices include:
$Dice(X, Y) = \dfrac{2\,|X \cap Y|}{|X| + |Y|}$;  $IoU(X, Y) = \dfrac{|X \cap Y|}{|X \cup Y|}$;
$HD(X, Y) = \max\{\, h(X, Y),\ h(Y, X) \,\}$, with $h(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert$;
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$;
wherein TP denotes positive samples predicted as the positive class and TN denotes negative samples predicted as the negative class; FP denotes negative samples predicted as the positive class; FN denotes positive samples predicted as the negative class; X and Y denote the prediction and label pixel sets, respectively; and x and y denote pixels belonging to the X set and the Y set, respectively.
It should be understood that Dice and IoU measure the degree of overlap between the prediction and the label; their values lie between 0 and 1, and larger values indicate a better segmentation effect. $h(X, Y)$ and $h(Y, X)$ are, for each set, the maximum over its pixels of the minimum distance to the other set, so a smaller value means a smaller overall pixel-level discrepancy and a better segmentation effect. Accuracy reflects the model's ability to distinguish positive from negative samples; the larger the value, the better.
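The evaluation indices above can be computed on binary masks as in the sketch below. This is a simple per-image illustration under the assumption of a single foreground class; the Hausdorff distance is the exact pixel-set version, which is adequate for small masks but slow on large ones.

```python
import numpy as np

def dice_iou_accuracy(pred: np.ndarray, label: np.ndarray):
    """pred, label: binary masks of identical shape (1 = ship, 0 = background)."""
    pred, label = pred.astype(bool), label.astype(bool)
    tp = np.logical_and(pred, label).sum()
    tn = np.logical_and(~pred, ~label).sum()
    fp = np.logical_and(pred, ~label).sum()
    fn = np.logical_and(~pred, label).sum()
    dice = 2 * tp / (pred.sum() + label.sum() + 1e-9)
    iou = tp / (np.logical_or(pred, label).sum() + 1e-9)
    acc = (tp + tn) / (tp + tn + fp + fn + 1e-9)
    return dice, iou, acc

def hausdorff_distance(pred: np.ndarray, label: np.ndarray) -> float:
    """HD(X, Y) = max(h(X, Y), h(Y, X)) over the foreground pixel coordinates."""
    x = np.argwhere(pred.astype(bool))
    y = np.argwhere(label.astype(bool))
    if len(x) == 0 or len(y) == 0:
        return float("inf")
    d = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=-1)   # pairwise distances
    return float(max(d.min(axis=1).max(), d.min(axis=0).max()))
```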
Further, as shown in fig. 2, S103: inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result, wherein the trained remote sensing image segmentation model comprises: an encoder, a context adaptive enhancement module, a decoder and a fine-granularity guidance module connected in sequence.
Further, the encoder includes: a first convolution layer, a second convolution layer, a third convolution layer, a feature map serialization unit and a plurality of serially connected Transformer layers, connected in sequence.
Further, the encoder is configured to: input three continuously captured images into the encoder simultaneously; the first convolution layer extracts features from the input image to obtain a first feature map; the second convolution layer extracts features from the first feature map to obtain a second feature map; the third convolution layer extracts features from the second feature map to obtain a third feature map; the feature map serialization unit serializes the third feature map to obtain a feature map patch sequence; and the plurality of serially connected Transformer layers take the feature map patch sequence as input for further feature extraction, with the last Transformer layer outputting a fourth feature map.
Further, the feature map serialization unit is configured to segment the input feature map block by block according to a set size, and obtain a plurality of two-dimensional feature map patch sequences after segmentation.
It should be appreciated that a Transformer layer comprises: a multi-head self-attention module, a multi-layer perceptron module and two layer normalization operations. The input feature map patch sequence first passes through a layer normalization (LN) operation, then through the multi-head self-attention (Multihead Self-Attention, MSA) module for feature screening, and the output is added to the original input feature map. The resulting feature map is then passed through another layer normalization operation followed by a multi-layer perceptron (Multilayer Perceptron, MLP) for feature extraction, and the new output is added to the feature map obtained before these operations to give the final output of the Transformer layer.
The calculation formulas are:
$z'_l = MSA(LN(z_{l-1})) + z_{l-1}$;  $z_l = MLP(LN(z'_l)) + z'_l$;
$Attention(Q, K, V) = softmax\!\left(\dfrac{QK^{T}}{\sqrt{d_k}}\right)V$;
$MSA(z) = Concat(head_1, \ldots, head_h)\,W^{O}$, with $head_i = Attention(zW_i^{Q}, zW_i^{K}, zW_i^{V})$;
wherein $z_{l-1}$ is the input sequence feature matrix; $LN(\cdot)$ is the layer normalization operation; $MSA(\cdot)$ is the multi-head self-attention operation; $MLP(\cdot)$ is the multi-layer perceptron operation; $z'_l$ and $z_l$ are the intermediate output and the final output of one Transformer layer, respectively; $Attention(\cdot)$ is the self-attention module inside the multi-head attention module; Q, K and V are the query matrix, key matrix and value matrix corresponding to each input sequence, and $d_k$ is the dimension of the key matrix; $Concat(\cdot)$ is the channel splicing operation; $head_i$ is one head of the multi-head attention mechanism, and the number of heads h is set to 4; $W^{O}$, $W_i^{Q}$, $W_i^{K}$ and $W_i^{V}$ are the corresponding weight matrices.
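A minimal PyTorch sketch of one such Transformer layer is given below (pre-norm MSA and MLP with residual connections, 4 heads as stated above; the embedding dimension and MLP expansion ratio are illustrative assumptions).

```python
import torch.nn as nn

class TransformerLayer(nn.Module):
    """Pre-norm Transformer block: z' = MSA(LN(z)) + z;  z_out = MLP(LN(z')) + z'."""
    def __init__(self, dim: int = 768, heads: int = 4, mlp_ratio: int = 4):
        super().__init__()
        self.norm1 = nn.LayerNorm(dim)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm2 = nn.LayerNorm(dim)
        self.mlp = nn.Sequential(
            nn.Linear(dim, dim * mlp_ratio), nn.GELU(),
            nn.Linear(dim * mlp_ratio, dim))

    def forward(self, z):                                    # z: (B, num_patches, dim)
        h = self.norm1(z)
        z = z + self.attn(h, h, h, need_weights=False)[0]    # multi-head self-attention + residual
        z = z + self.mlp(self.norm2(z))                      # MLP + residual
        return z
```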
Further, the Transformer layers are configured to: process the features of the high-resolution remote sensing image, embed spatial position information by converting the image feature map into a sequence, and further strengthen the model's ability to capture global context information during feature extraction through self-attention learning.
The plurality of serially connected Transformer layers are twelve sequentially connected Transformer layers. The input of the encoder is a group of three consecutive images after data preprocessing; the data in this group come from similar waterway scenes and contain rich context information.
It should be appreciated that the first, second and third convolution layers of the encoder's CNN are stacked from the first three stages of ResNet-50, after which the extracted feature map is serialized and embedded as patches into the 12 Transformer layers, whose repeated operations capture global information. The encoder can thus acquire low-level, high-resolution local detail information, compensating for the Transformer's lack of low-level boundary, contour and similar information, while global context information is added by the Transformer.
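A sketch of how such a CNN-Transformer encoder can be assembled is shown below, reusing the TransformerLayer sketch above. The exact ResNet-50 stage split, patch size and embedding dimension are assumptions for illustration; torchvision's resnet50 is used as the convolutional front end.

```python
import torch.nn as nn
from torchvision.models import resnet50

class CNNTransformerEncoder(nn.Module):
    """First three ResNet-50 stages as convolution layers 1-3, then a 1x1 patch
    embedding of the deepest map and 12 Transformer layers for global context."""
    def __init__(self, dim: int = 768, depth: int = 12):
        super().__init__()
        r = resnet50(weights=None)
        self.conv1 = nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool, r.layer1)  # 1/4, 256 ch
        self.conv2 = r.layer2                                                    # 1/8, 512 ch
        self.conv3 = r.layer3                                                    # 1/16, 1024 ch
        self.patch_embed = nn.Conv2d(1024, dim, kernel_size=1)
        self.blocks = nn.ModuleList(TransformerLayer(dim) for _ in range(depth))

    def forward(self, x):
        f1 = self.conv1(x)          # first feature map
        f2 = self.conv2(f1)         # second feature map
        f3 = self.conv3(f2)         # third feature map
        b, _, h, w = f3.shape
        z = self.patch_embed(f3).flatten(2).transpose(1, 2)   # serialize into a patch sequence
        for blk in self.blocks:
            z = blk(z)
        f4 = z.transpose(1, 2).reshape(b, -1, h, w)           # fourth feature map
        return f1, f2, f3, f4
```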
Further, as shown in fig. 3, the context adaptive enhancement module includes: first, second and third deformable convolution layers, fourth, fifth and sixth convolution layers, and first, second and third splicing units. The first deformable convolution layer extracts features from the first feature map to obtain a fifth feature map; the second deformable convolution layer extracts features from the second feature map to obtain a sixth feature map; the third deformable convolution layer extracts features from the third feature map to obtain a seventh feature map. The first splicing unit splices the fifth feature map and the sixth feature map to obtain an eighth feature map, from which the fourth convolution layer extracts features to obtain a ninth feature map. The second splicing unit splices the sixth, seventh and ninth feature maps to obtain a tenth feature map, from which the fifth convolution layer extracts features to obtain an eleventh feature map. The third splicing unit splices the seventh feature map and the eleventh feature map to obtain a twelfth feature map, from which the sixth convolution layer extracts features to obtain a thirteenth feature map.
Further, the context adaptive enhancement module is configured as follows. The first, second and third feature maps are processed by the first, second and third deformable convolutions, respectively, to obtain the fifth, sixth and seventh feature maps. The fifth and sixth feature maps are brought to the same resolution and then channel-spliced to obtain the eighth feature map; the eighth feature map is refined by the fourth convolution layer, which changes the channel number to the input size of the original highest-level feature map, giving the ninth feature map as the first output of the context adaptive enhancement module. Next, the ninth, sixth and seventh feature maps are channel-spliced after being scaled to the same resolution, yielding the tenth feature map; the channel number of the tenth feature map is then corrected by the fifth convolution layer to obtain the eleventh feature map, which serves as the second output of the context adaptive enhancement module. Finally, the eleventh and seventh feature maps are channel-spliced to obtain the twelfth feature map, whose channel number is corrected by the sixth convolution layer to obtain the thirteenth feature map, which serves as the last output of the context adaptive enhancement module.
The interaction and extraction of context information are realized through this series of compact channel-splicing and convolution operations on the feature maps. Further, the context adaptive enhancement module can be written as:
$\hat{X}_i = DConv_{3\times 3}(X_i), \quad i = 1, 2, 3$;
$Y_1 = Conv\big(Cat(\hat{X}_1,\, Up(\hat{X}_2))\big)$;
$Y_2 = Conv\big(Cat(Down(Y_1),\, \hat{X}_2,\, Up(\hat{X}_3))\big)$;
$Y_3 = Conv\big(Cat(Down(Y_2),\, \hat{X}_3)\big)$;
wherein $X_1$, $X_2$ and $X_3$ are the three original inputs and $\hat{X}_1$, $\hat{X}_2$ and $\hat{X}_3$ their updates after the deformable processing; $Y_1$, $Y_2$ and $Y_3$ are the three outputs of the module; $DConv_{3\times 3}(\cdot)$ is a 3 × 3 deformable convolution operation; $Cat(\cdot)$ splices feature maps along the channel dimension; $Up(\cdot)$ denotes an upsampling operation and $Down(\cdot)$ a downsampling operation, both implemented by linear interpolation to rescale the resolution.
It should be appreciated that the above scheme provides a context adaptive enhancement module to improve the adaptive extraction of ship features at different scales, considering that ships on a waterway differ in size and that different ship types differ greatly. The context adaptive enhancement module receives three feature maps from three convolution layers of different depths in the encoder and performs adaptive feature extraction with 3 × 3 deformable convolutions (Deformable Convolution Network, DCN).
A deformable convolution augments the traditional convolution with a learned offset: an additional convolution learns the offsets, so that the positions of the sampling points can change adaptively. Through the learned convolution-kernel offsets, the deformable convolution adapts to non-grid features and can follow the shape and size of a ship more closely during sampling, making it better suited than ordinary convolution to remote sensing image features with large variation and complex distribution.
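A sketch of the context adaptive enhancement module under the reconstruction above is shown below. torchvision's DeformConv2d is used for the 3 × 3 deformable convolution, with the offsets predicted by an auxiliary convolution; the channel widths and the resize directions are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class DeformBlock(nn.Module):
    """3x3 deformable convolution whose sampling offsets are predicted from the input."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.offset = nn.Conv2d(in_ch, 18, kernel_size=3, padding=1)   # 2 * 3 * 3 offset channels
        self.dcn = DeformConv2d(in_ch, out_ch, kernel_size=3, padding=1)

    def forward(self, x):
        return self.dcn(x, self.offset(x))

class ContextAdaptiveEnhancement(nn.Module):
    def __init__(self, chs=(256, 512, 1024)):
        super().__init__()
        self.dcn1, self.dcn2, self.dcn3 = (DeformBlock(c, c) for c in chs)
        self.conv4 = nn.Conv2d(chs[0] + chs[1], chs[0], 1)            # fuse levels 1-2 -> output 1
        self.conv5 = nn.Conv2d(chs[0] + chs[1] + chs[2], chs[1], 1)   # fuse levels 1-3 -> output 2
        self.conv6 = nn.Conv2d(chs[1] + chs[2], chs[2], 1)            # fuse levels 2-3 -> output 3

    def forward(self, f1, f2, f3):
        x1, x2, x3 = self.dcn1(f1), self.dcn2(f2), self.dcn3(f3)
        # resize a tensor to the spatial size of a reference map (up- or down-sampling)
        resize = lambda t, ref: F.interpolate(t, size=ref.shape[-2:],
                                              mode="bilinear", align_corners=False)
        y1 = self.conv4(torch.cat([x1, resize(x2, x1)], dim=1))                  # first output
        y2 = self.conv5(torch.cat([resize(y1, x2), x2, resize(x3, x2)], dim=1))  # second output
        y3 = self.conv6(torch.cat([resize(y2, x3), x3], dim=1))                  # third output
        return y1, y2, y3
```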
Further, the decoder includes: the first upsampling layer, the second upsampling layer, the third upsampling layer and the fourth upsampling layer are parallel; the first upsampling layer is configured to upsample the ninth feature map to obtain a fourteenth feature map; the second upsampling layer is configured to upsample the eleventh feature map to obtain a fifteenth feature map; the third upsampling layer is configured to upsample the thirteenth feature map to obtain a sixteenth feature map; and the fourth upsampling layer is used for upsampling the fourth characteristic diagram to obtain a seventeenth characteristic diagram.
It should be understood that the decoder is a U-Net-style decoder: the skip-connection feature maps passed from the context adaptive enhancement module and the feature map passed from the deepest Transformer layer are subjected to four upsampling operations (one per feature map) in the decoder, giving four corresponding feature maps.
Finally, after the four feature maps are scaled to the resolution of the original image, the four fine-granularity guidance branches assign them the weights a, b, c and d, respectively, and integrate them into the final prediction feature map as the output.
The fine-granularity guidance module works as follows:
$P = a\,F_{14} + b\,F_{15} + c\,F_{16} + d\,F_{17}$;
wherein P is the final prediction feature map; a, b, c and d are the weights assigned to the corresponding feature maps, each set to 0.25; $F_{14}$ is the fourteenth feature map, $F_{15}$ the fifteenth feature map, $F_{16}$ the sixteenth feature map and $F_{17}$ the seventeenth feature map.
It should be appreciated that, considering problems such as blurred boundary contours and insufficient imaging of ships in remote sensing images caused by weather, ship speed and other factors, the invention provides a fine-granularity guidance module that further corrects the extracted fine-grained ship features through several fine-granularity guidance branches. The fine-granularity guidance module assigns weights to the four feature maps passed from the deepest layer of the encoder and from the context adaptive enhancement module, after they have been upsampled by the decoder, and computes the final prediction feature map.
To make full use of deep and shallow features, reduce dimensional differences, and improve the model's attention to deep semantic information and shallow fine-grained information (such as the detailed features of small ships and ship contours), a, b, c and d are each taken as 0.25. The final output of the model is therefore the average of four feature maps from different levels, scaled to the same resolution, which mitigates gradient differences.
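A sketch of the decoder-side fusion is given below: the three enhanced maps and the deepest encoder map are upsampled to the input resolution, projected to the class space, and averaged with a = b = c = d = 0.25. The per-branch 1 × 1 projections and channel widths are assumptions; the patent itself only specifies the weighted summation.

```python
import torch.nn as nn
import torch.nn.functional as F

class FineGranularityGuidance(nn.Module):
    """Weighted fusion P = a*F14 + b*F15 + c*F16 + d*F17 of four decoder branches."""
    def __init__(self, in_chs=(256, 512, 1024, 768), num_classes=2,
                 weights=(0.25, 0.25, 0.25, 0.25)):
        super().__init__()
        self.heads = nn.ModuleList(nn.Conv2d(c, num_classes, 1) for c in in_chs)
        self.weights = weights

    def forward(self, feats, out_size):
        """feats: the four decoder feature maps; out_size: (H, W) of the input image."""
        fused = 0
        for w, head, f in zip(self.weights, self.heads, feats):
            p = F.interpolate(head(f), size=out_size, mode="bilinear", align_corners=False)
            fused = fused + w * p            # weighted summation of the branch predictions
        return fused                          # final prediction feature map
```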
The invention provides a Transformer-based remote sensing segmentation method with context adaptive enhancement and fine-granularity guidance. A context adaptive enhancement module is designed after the CNN-Transformer encoder, breaking the receptive-field limitation of conventional segmentation networks and fusing detail information and context information in a better, adaptive feature extraction manner. In the decoder branch, a fine-granularity guidance module is designed to compensate for the detail information lost through the continuous downsampling of the encoder stage and to guide the model toward the segmentation targets. This approach addresses the poor performance of most networks when segmenting small ships with complex structural boundaries among multiple ships. Compared with traditional segmentation methods and common deep learning segmentation networks, it achieves better segmentation performance and performs well on the waterway ship remote sensing segmentation task.
Embodiment two: this embodiment provides a remote sensing image segmentation system based on adaptive enhancement and fine-granularity guidance, comprising: an acquisition module configured to acquire a remote sensing image to be segmented; a preprocessing module configured to preprocess the remote sensing image to be segmented; and a segmentation module configured to input the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result. The model first extracts features from the preprocessed remote sensing image with an encoder to obtain feature maps at different scales; a context adaptive enhancement module then applies context adaptive enhancement to the multi-scale feature maps to obtain enhanced feature maps; a decoder then decodes the enhanced feature maps together with the feature maps extracted by the encoder; and a fine-granularity guidance module finally performs a weighted summation of the decoded features to obtain the final prediction feature map. It should be understood that the details of each step in embodiment two are consistent with embodiment one.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, but various modifications and variations can be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (10)

1. The remote sensing image segmentation method based on self-adaptive enhancement and fine granularity guidance is characterized by comprising the following steps of:
Acquiring a remote sensing image to be segmented;
preprocessing a remote sensing image to be segmented;
Inputting the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result; firstly, performing feature extraction on the preprocessed remote sensing image with an encoder to obtain feature maps of different scales; then performing context adaptive enhancement on the feature maps of different scales with a context adaptive enhancement module to obtain enhanced feature maps; then decoding the enhanced feature maps and the feature maps extracted by the encoder with a decoder; and then performing weighted summation of the decoded features with a fine-granularity guidance module to obtain a final prediction feature map.
2. The remote sensing image segmentation method based on self-adaptive enhancement and fine granularity guidance according to claim 1, wherein the pre-processed remote sensing image is input into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result, and the training process of the trained remote sensing image segmentation model comprises the following steps:
Constructing a data set, wherein the data set is a remote sensing image with known image segmentation results;
Dividing the data set into a training set, a verification set and a test set according to a set proportion;
inputting the training set into a remote sensing image segmentation model, training the model, and stopping training when the total loss function value of the model is not reduced or the iteration number exceeds the set number, so as to obtain a primarily trained remote sensing image segmentation model;
inputting the verification set into the primarily trained remote sensing image segmentation model, and verifying the model to obtain a remote sensing image segmentation model passing verification;
And inputting the test set into the remote sensing image segmentation model which passes verification, testing the model, and when the test indexes all reach a set threshold value, determining the current model as the final trained remote sensing image segmentation model.
3. The adaptive enhancement and fine granularity guided remote sensing image segmentation method of claim 2, wherein the total loss function of the model is formulated as:
$L_{total} = \alpha L_{Dice} + \beta L_{BCE}$;
$L_{Dice} = 1 - \dfrac{2\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic}\, p_{ic} + \varepsilon}{\sum_{i=1}^{N}\sum_{c=1}^{C} y_{ic} + \sum_{i=1}^{N}\sum_{c=1}^{C} p_{ic} + \varepsilon}$;
$L_{BCE} = -\dfrac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C}\big[\, y_{ic}\log(p_{ic}) + (1 - y_{ic})\log(1 - p_{ic}) \,\big]$;
wherein $\alpha$ and $\beta$ are the weights of the Dice loss function $L_{Dice}$ and the BCE loss function $L_{BCE}$, respectively; N denotes the total number of pixels and C the total number of classes; $y_{ic}$ denotes the label value of the i-th pixel for class c; $p_{ic}$ denotes the predicted probability of the i-th pixel for class c; $\varepsilon$ is a smoothing term.
4. The remote sensing image segmentation method based on adaptive enhancement and fine granularity guidance according to claim 2, wherein the test index comprises:
$Dice(X, Y) = \dfrac{2\,|X \cap Y|}{|X| + |Y|}$;  $IoU(X, Y) = \dfrac{|X \cap Y|}{|X \cup Y|}$;
$HD(X, Y) = \max\{\, h(X, Y),\ h(Y, X) \,\}$, with $h(X, Y) = \max_{x \in X} \min_{y \in Y} \lVert x - y \rVert$;
$Accuracy = \dfrac{TP + TN}{TP + TN + FP + FN}$;
wherein TP represents positive samples predicted as the positive class and TN represents negative samples predicted as the negative class; FP represents negative samples predicted as the positive class; FN represents positive samples predicted as the negative class; X and Y represent the prediction and label pixel sets, respectively; and x and y represent pixels belonging to the X set and the Y set, respectively.
5. The remote sensing image segmentation method based on self-adaptive enhancement and fine granularity guidance according to claim 1, wherein the pre-processed remote sensing image is input into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result, and the trained remote sensing image segmentation model comprises:
The system comprises an encoder, a context self-adaptive enhancement module, a decoder and a fine granularity guiding module which are connected in sequence;
The encoder includes: the first convolution layer, the second convolution layer, the third convolution layer, the feature map serialization unit and the plurality of serially connected Transformer layers are sequentially connected;
The encoder is used for: simultaneously inputting three continuously photographed images into an encoder;
The first convolution layer performs feature extraction on the input image to obtain a first feature map;
the second convolution layer performs feature extraction on the first feature map to obtain a second feature map;
the third convolution layer performs feature extraction on the second feature map to obtain a third feature map;
the feature map serialization unit conducts serialization operation on the third feature map to obtain a feature map patch sequence;
And the plurality of serially connected Transformer layers take the feature map patch sequence as input for further feature extraction, with the last Transformer layer outputting a fourth feature map.
6. The remote sensing image segmentation method based on adaptive enhancement and fine granularity guidance according to claim 1 or 5, wherein the context adaptive enhancement module comprises:
The first, second, third, fourth, fifth, sixth convolution layers, the first, second, and third splice units;
the first deformable convolution layer performs feature extraction on the first feature map to obtain a fifth feature map;
The second deformable convolution layer performs feature extraction on the second feature map to obtain a sixth feature map;
The third deformable convolution layer performs feature extraction on the third feature map to obtain a seventh feature map;
the first splicing unit splices the fifth characteristic diagram and the sixth characteristic diagram to obtain an eighth characteristic diagram; the eighth feature map is subjected to feature extraction of the fourth convolution layer to obtain a ninth feature map;
the second splicing unit splices the sixth feature map, the seventh feature map and the ninth feature map to obtain a tenth feature map, and the tenth feature map is subjected to feature extraction of a fifth convolution layer to obtain an eleventh feature map;
And the third splicing unit carries out splicing treatment on the seventh characteristic diagram and the eleventh characteristic diagram to obtain a twelfth characteristic diagram, and the twelfth characteristic diagram is subjected to characteristic extraction of the sixth convolution layer to obtain a thirteenth characteristic diagram.
7. The remote sensing image segmentation method based on adaptive enhancement and fine granularity guidance according to claim 1 or 5, wherein the context adaptive enhancement module is configured to:
The first feature map, the second feature map and the third feature map are calculated through first, second and third deformable convolutions respectively to obtain a fifth feature map, a sixth feature map and a seventh feature map;
The method comprises the steps of up-sampling a fifth feature map and a sixth feature map to the same resolution, and then performing channel splicing on the fifth feature map and the sixth feature map to obtain an eighth feature map; performing feature optimization on the eighth feature map through the fifth convolution layer, changing the channel number into the input size of the original highest-layer feature map to obtain a ninth feature map, and taking the ninth feature map as the first output of the context self-adaptive enhancement module;
Secondly, the ninth feature map, the sixth feature map and the seventh feature map are subjected to channel splicing, and the size of the ninth feature map, the sixth feature map and the seventh feature map are still scaled to the same resolution through upsampling operation before splicing; after the channels are spliced, a tenth characteristic diagram is obtained;
Then, correcting the channel number of the tenth characteristic diagram through a fifth convolution layer to obtain an eleventh characteristic diagram, wherein the eleventh characteristic diagram is used as a second output of the context self-adaptive enhancement module;
Finally, the eleventh characteristic diagram and the seventh characteristic diagram are subjected to channel splicing to obtain a twelfth characteristic diagram; and correcting the channel number through a sixth convolution layer to obtain a thirteenth feature map, wherein the thirteenth feature map is used as the last output of the context self-adaptive enhancement module.
8. The adaptive enhancement and fine granularity guided remote sensing image segmentation method according to claim 1 or 5, wherein the decoder comprises: the first upsampling layer, the second upsampling layer, the third upsampling layer and the fourth upsampling layer are parallel;
The first upsampling layer is configured to upsample the ninth feature map to obtain a fourteenth feature map;
The second upsampling layer is configured to upsample the eleventh feature map to obtain a fifteenth feature map;
The third upsampling layer is configured to upsample the thirteenth feature map to obtain a sixteenth feature map;
And the fourth upsampling layer is used for upsampling the fourth characteristic diagram to obtain a seventeenth characteristic diagram.
9. The remote sensing image segmentation method based on adaptive enhancement and fine granularity guidance according to claim 1 or 5, wherein the fine granularity guidance module comprises:
$P = a\,F_{14} + b\,F_{15} + c\,F_{16} + d\,F_{17}$;
wherein P is the final prediction feature map; a, b, c and d are the weights assigned to the corresponding feature maps; $F_{14}$ is the fourteenth feature map, $F_{15}$ the fifteenth feature map, $F_{16}$ the sixteenth feature map and $F_{17}$ the seventeenth feature map.
10. The remote sensing image segmentation system based on self-adaptive enhancement and fine granularity guidance is characterized by comprising the following components:
an acquisition module configured to: acquiring a remote sensing image to be segmented;
A preprocessing module configured to: preprocessing a remote sensing image to be segmented;
A segmentation module configured to: input the preprocessed remote sensing image into a trained remote sensing image segmentation model to obtain a remote sensing image segmentation result; firstly, performing feature extraction on the preprocessed remote sensing image with an encoder to obtain feature maps of different scales; then performing context adaptive enhancement on the feature maps of different scales with a context adaptive enhancement module to obtain enhanced feature maps; then decoding the enhanced feature maps and the feature maps extracted by the encoder with a decoder; and then performing weighted summation of the decoded features with a fine-granularity guidance module to obtain a final prediction feature map.
CN202410516951.8A 2024-04-28 2024-04-28 Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance Active CN118096784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410516951.8A CN118096784B (en) 2024-04-28 2024-04-28 Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410516951.8A CN118096784B (en) 2024-04-28 2024-04-28 Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance

Publications (2)

Publication Number Publication Date
CN118096784A true CN118096784A (en) 2024-05-28
CN118096784B CN118096784B (en) 2024-07-26

Family

ID=91142556

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410516951.8A Active CN118096784B (en) 2024-04-28 2024-04-28 Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance

Country Status (1)

Country Link
CN (1) CN118096784B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120027076A1 (en) * 2009-04-01 2012-02-02 ShenZhen Temobi Science & Tech Devellopment Co., Ltd. Method for image visual effect improvement of video encoding and decoding
CN113850825A (en) * 2021-09-27 2021-12-28 太原理工大学 Remote sensing image road segmentation method based on context information and multi-scale feature fusion
WO2023077816A1 (en) * 2021-11-03 2023-05-11 中国华能集团清洁能源技术研究院有限公司 Boundary-optimized remote sensing image semantic segmentation method and apparatus, and device and medium
CN117078943A (en) * 2023-10-17 2023-11-17 太原理工大学 Remote sensing image road segmentation method integrating multi-scale features and double-attention mechanism
CN117422878A (en) * 2023-11-09 2024-01-19 浙江大学 Remote sensing image semantic segmentation method based on double-branch dynamic attention
CN117689592A (en) * 2023-12-11 2024-03-12 杭州电子科技大学 Underwater image enhancement method based on cascade self-adaptive network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
JUN WANG; QIMING QIN et al.: "Deep hierarchical representation and segmentation of high resolution remote sensing images", IEEE, 12 November 2015 (2015-11-12) *
崔昊: "Sea-land segmentation method for high-resolution remote sensing images based on deep learning" (in Chinese), 软件导刊 (Software Guide), no. 03, 15 March 2020 (2020-03-15) *
邓云; 彭强; 诸昌钤: "Adaptive error concealment algorithm for SVC enhancement layer transmission errors" (in Chinese), 西南交通大学学报 (Journal of Southwest Jiaotong University), no. 02, 15 April 2009 (2009-04-15) *

Also Published As

Publication number Publication date
CN118096784B (en) 2024-07-26

Similar Documents

Publication Publication Date Title
CN111950453B (en) Random shape text recognition method based on selective attention mechanism
CN111738363B (en) Alzheimer disease classification method based on improved 3D CNN network
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN116824307B (en) Image labeling method and device based on SAM model and related medium
Li et al. Automatic bridge crack identification from concrete surface using ResNeXt with postprocessing
CN112116537B (en) Image reflected light elimination method and image reflected light elimination network construction method
CN116664605B (en) Medical image tumor segmentation method based on diffusion model and multi-mode fusion
CN117809181B (en) High-resolution remote sensing image water body extraction network model and method
CN116523881A (en) Abnormal temperature detection method and device for power equipment
CN111126155B (en) Pedestrian re-identification method for generating countermeasure network based on semantic constraint
CN115439738A (en) Underwater target detection method based on self-supervision cooperative reconstruction
CN117496124A (en) Large-area photovoltaic panel detection and extraction method based on deep convolutional neural network
CN117765482B (en) Garbage identification method and system for garbage enrichment area of coastal zone based on deep learning
CN114677349A (en) Image segmentation method and system for edge information enhancement and attention guidance of encoding and decoding
CN114581789A (en) Hyperspectral image classification method and system
CN117611963A (en) Small target detection method and system based on multi-scale extended residual error network
CN118096784B (en) Remote sensing image segmentation method and system based on self-adaptive enhancement and fine granularity guidance
CN115424275B (en) Fishing boat license plate identification method and system based on deep learning technology
CN116863293A (en) Marine target detection method under visible light based on improved YOLOv7 algorithm
CN116229217A (en) Infrared target detection method applied to complex environment
CN118229712B (en) Liver tumor image segmentation system based on enhanced multidimensional feature perception
CN117809289B (en) Pedestrian detection method for traffic scene
CN118396071B (en) Boundary driving neural network structure for unmanned ship environment understanding
CN117893934B (en) Improved UNet3+ network unmanned aerial vehicle image railway track line detection method and device
CN118657999A (en) Microalgae image classification method and related device based on feature calibration transducer

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant