CN115797633A - Remote sensing image segmentation method, system, storage medium and electronic equipment


Info

Publication number
CN115797633A
Authority
CN
China
Prior art keywords
remote sensing
sensing image
edge
network
image segmentation
Prior art date
Legal status
Granted
Application number
CN202211542414.8A
Other languages
Chinese (zh)
Other versions
CN115797633B (en)
Inventor
许乐乐
李叶
徐金中
郭丽丽
Current Assignee
Technology and Engineering Center for Space Utilization of CAS
Original Assignee
Technology and Engineering Center for Space Utilization of CAS
Priority date
Filing date
Publication date
Application filed by Technology and Engineering Center for Space Utilization of CAS filed Critical Technology and Engineering Center for Space Utilization of CAS
Priority to CN202211542414.8A priority Critical patent/CN115797633B/en
Publication of CN115797633A publication Critical patent/CN115797633A/en
Application granted granted Critical
Publication of CN115797633B publication Critical patent/CN115797633B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Abstract

The invention discloses a remote sensing image segmentation method, system, storage medium and electronic equipment. The method comprises: constructing a first remote sensing image segmentation model comprising a convolutional feature extraction network, an edge semantic auxiliary network, a transformer network combining edge enhancement and Gaussian position encoding, and a segmentation network, wherein the convolutional feature extraction network, the edge semantic auxiliary network and the segmentation network are each connected to the transformer network combining edge enhancement and Gaussian position encoding; training the first remote sensing image segmentation model on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and deleting the edge semantic auxiliary network from the second model to obtain a target remote sensing image segmentation model; and inputting the remote sensing image to be segmented into the target model to obtain its target image segmentation result. The invention improves the capability of fine and accurate segmentation when targets are densely distributed in the image.

Description

Remote sensing image segmentation method, system, storage medium and electronic equipment
Technical Field
The invention relates to the technical field of image processing, and in particular to a remote sensing image segmentation method, system, storage medium and electronic equipment.
Background
High spatial resolution optical remote sensing images contain rich texture detail. Ground features in such images are densely distributed; for example, houses stand closely adjacent, and trees grow densely and occlude buildings, so target edge information is severely degraded. Meanwhile, interference from complex scene factors such as illumination and shadow is significant, which poses a great challenge to fine and accurate image segmentation.
Attention mechanisms have been widely applied to remote sensing image segmentation with remarkable results. Recently, the Transformer model has attracted increasing attention in computer vision owing to its strength in global information extraction. However, for optical remote sensing images of complex scenes such as densely distributed targets, much edge information is still lost, and segmentation accuracy needs to be improved.
Therefore, it is desirable to provide a technical solution to solve the above technical problems.
Disclosure of Invention
In order to solve the technical problems, the invention provides a remote sensing image segmentation method, a remote sensing image segmentation system, a storage medium and electronic equipment.
The technical scheme of the remote sensing image segmentation method is as follows:
constructing a first remote sensing image segmentation model comprising a convolutional feature extraction network, an edge semantic auxiliary network, a transformer network combining edge enhancement and Gaussian position encoding, and a segmentation network; the convolutional feature extraction network, the edge semantic auxiliary network and the segmentation network are each connected to the transformer network combining edge enhancement and Gaussian position encoding;
training the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and deleting the edge semantic auxiliary network in the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model;
and inputting the remote sensing image to be segmented into the target remote sensing image segmentation model to obtain a target image segmentation result of that image.
The remote sensing image segmentation method has the following beneficial effects:
The method of the invention segments the remote sensing image through the convolutional feature extraction network, the transformer network combining edge enhancement and Gaussian position encoding, and the segmentation network, which improves fine and accurate segmentation when targets in the image are densely distributed.
On the basis of the scheme, the remote sensing image segmentation method can be further improved as follows.
Further, the method also comprises the following steps:
and obtaining a plurality of remote sensing image samples, and labeling at least two categories in each remote sensing image sample to obtain the semantic annotation image corresponding to that sample, until a semantic annotation image has been obtained for every remote sensing image sample.
Further, the step of training the first remote sensing image segmentation model based on the plurality of remote sensing image samples to obtain a second remote sensing image segmentation model comprises:
inputting any remote sensing image sample into the convolutional feature extraction network to obtain an initial feature map corresponding to the remote sensing image sample, performing edge extraction on the semantic annotation image corresponding to the remote sensing image sample to obtain a first edge image corresponding to the remote sensing image sample, and inputting the first edge image into the edge semantic auxiliary network to obtain an edge semantic feature map corresponding to the remote sensing image sample;
inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample, and inputting the enhanced feature map into the segmentation network to obtain a first image segmentation result of the remote sensing image sample;
obtaining a loss value for any remote sensing image sample from its first image segmentation result and corresponding semantic annotation image, until a loss value has been obtained for every remote sensing image sample;
and optimizing the first remote sensing image segmentation model based on all loss values to obtain an optimized remote sensing image segmentation model; taking the optimized model as the first remote sensing image segmentation model and returning to the step of inputting any remote sensing image sample into the convolutional feature extraction network, until a preset iterative training condition is met, at which point the optimized remote sensing image segmentation model is determined as the second remote sensing image segmentation model.
Further, the convolutional feature extraction network comprises: at least one first convolutional layer; the step of inputting any remote sensing image sample into the convolutional feature extraction network to obtain the initial feature map corresponding to the remote sensing image sample comprises:
inputting any remote sensing image sample into the convolutional feature extraction network for feature extraction through each first convolutional layer in turn, to obtain the initial feature map corresponding to the remote sensing image sample.
Further, the edge semantic auxiliary network includes: an edge vector, a non-edge vector and an edge semantic layer connected in sequence; the step of inputting the first edge image corresponding to any remote sensing image sample into the edge semantic auxiliary network to obtain the edge semantic feature map corresponding to the remote sensing image sample comprises:
inputting the first edge image, the edge vector and the non-edge vector corresponding to any remote sensing image sample into the edge semantic layer for feature extraction, to obtain the edge semantic feature map corresponding to the remote sensing image sample.
Further, the transformer network combining edge enhancement and Gaussian position encoding comprises: at least one edge position transformer module, each edge position transformer module comprising: a fusion layer, a position encoding layer, a first addition layer, a multi-head attention layer, a second addition layer, a fully connected layer and a third addition layer arranged in sequence; the step of inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample comprises:
inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the fusion layer of the first edge position transformer module for fusion, to obtain a first fused feature map corresponding to the remote sensing image sample, and position-encoding each pixel in the first fused feature map through the position encoding layer of the first edge position transformer module, to obtain two-dimensional position encoding information of the first fused feature map;
inputting the first fused feature map corresponding to any remote sensing image sample and its two-dimensional position encoding information into the first addition layer of the first edge position transformer module for addition, to obtain a first intermediate feature map corresponding to the remote sensing image sample;
and inputting the first intermediate feature map corresponding to any remote sensing image sample into the sequentially connected multi-head attention layer, second addition layer, fully connected layer and third addition layer of the first edge position transformer module for processing, to obtain a second intermediate feature map corresponding to the remote sensing image sample, and taking the second intermediate feature map as the initial feature map of the next edge position transformer module, until the second intermediate feature map has been processed by all edge position transformer modules, yielding the enhanced feature map corresponding to the remote sensing image sample.
The beneficial effect of adopting the further technical scheme is that: the method can further make full use of the enhanced target edge information and the two-dimensional position information in the network, and enhance the training of the remote sensing image segmentation model so as to improve the fine and accurate segmentation capability under the condition of dense target distribution in the image.
Further, the segmentation network includes: at least one second convolutional layer; the step of inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network to obtain a first image segmentation result of the remote sensing image sample comprises:
inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network for feature extraction through each second convolutional layer in turn, to obtain the first image segmentation result of the remote sensing image sample.
The technical scheme of the remote sensing image segmentation system is as follows:
the method comprises the following steps: the system comprises a model construction module, a model training module and an image segmentation module;
the model construction module is configured to: construct a first remote sensing image segmentation model comprising a convolutional feature extraction network, an edge semantic auxiliary network, a transformer network combining edge enhancement and Gaussian position encoding, and a segmentation network; wherein the convolutional feature extraction network, the edge semantic auxiliary network and the segmentation network are each connected to the transformer network combining edge enhancement and Gaussian position encoding;
the model training module is configured to: train the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and delete the edge semantic auxiliary network from the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model;
the image segmentation module is configured to: input the remote sensing image to be segmented into the target remote sensing image segmentation model to obtain a target image segmentation result of that image.
The remote sensing image segmentation system has the following beneficial effects:
the system of the invention segments the remote sensing image through the convolution feature extraction network, the transform network combining edge enhancement and Gaussian position coding and the segmentation network, thereby improving the fine and accurate segmentation capability under the condition of dense target distribution in the image.
The technical scheme of the storage medium of the invention is as follows:
the storage medium has stored therein instructions which, when read by a computer, cause the computer to carry out the steps of a method of remote sensing image segmentation in accordance with the invention.
The technical scheme of the electronic equipment is as follows:
The electronic equipment comprises a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform the steps of a remote sensing image segmentation method according to the invention.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a remote sensing image segmentation method provided by the present invention;
FIG. 2 is a flow chart illustrating step 120 of an embodiment of a method for segmenting a remote sensing image according to the present invention;
FIG. 3 is a first schematic structural diagram of a first remote sensing image segmentation model in an embodiment of a remote sensing image segmentation method provided by the invention;
FIG. 4 is a second schematic structural diagram of the first remote sensing image segmentation model in the embodiment of the remote sensing image segmentation method provided by the invention;
fig. 5 shows a schematic structural diagram of an embodiment of a remote sensing image segmentation system provided by the invention.
Detailed Description
Fig. 1 shows a schematic flow chart of a remote sensing image segmentation method according to a first embodiment of the present invention. As shown in fig. 1, the method comprises the steps of:
step 110: and constructing a first remote sensing image segmentation model comprising a convolution feature extraction network N1, an edge semantic auxiliary network N2, a transform network N3 combining edge enhancement and Gaussian position coding and a segmentation network N4.
Here, (1) the convolutional feature extraction network N1, the edge semantic auxiliary network N2 and the segmentation network N4 are each connected to the transformer network N3 combining edge enhancement and Gaussian position encoding. (2) The convolutional feature extraction network N1 is used to extract an initial feature map carrying local context information. (3) The edge semantic auxiliary network N2 is used to obtain an edge semantic feature map containing rich semantic information of target edges. (4) The transformer network N3 combining edge enhancement and Gaussian position encoding is used to extract, from the initial feature map, the edge semantic feature map and the two-dimensional position encoding vector, an enhanced feature map that is edge-semantically enhanced and contains rich global information. (5) The segmentation network N4 is used to obtain an image segmentation result based on the enhanced feature map. (6) The first remote sensing image segmentation model is the remote sensing image segmentation model to be trained.
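For concreteness, the following is a minimal PyTorch sketch of this four-network layout. All class names, channel sizes and the fusion-by-addition choice are illustrative assumptions, not the patent's implementation; the N2 and N3 stand-ins are deliberately simplified, and their full behavior is detailed in the sketches further below.

```python
import torch
import torch.nn as nn

class EdgeSemanticNet(nn.Module):
    """Stand-in for N2: maps a binary edge image to an edge semantic feature map."""
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Conv2d(1, dim, kernel_size=1)

    def forward(self, edge_img):                      # (B, 1, H, W)
        return self.proj(edge_img)                    # (B, dim, H, W)

class EdgePositionTransformer(nn.Module):
    """Stand-in for N3: fuse features, then one self-attention block."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, feat, edge_feat=None):
        x = feat + edge_feat if edge_feat is not None else feat  # fusion (assumed additive)
        b, c, h, w = x.shape
        t = x.flatten(2).transpose(1, 2)              # (B, HW, C) token sequence
        t = t + self.attn(t, t, t)[0]                 # multi-head attention + addition
        t = t + self.fc(t)                            # fully connected layer + addition
        return t.transpose(1, 2).reshape(b, c, h, w)

class FirstSegmentationModel(nn.Module):
    def __init__(self, in_ch=3, dim=64, num_classes=2):
        super().__init__()
        self.n1 = nn.Sequential(nn.Conv2d(in_ch, dim, 3, padding=1),
                                nn.ReLU(inplace=True))           # N1
        self.n2 = EdgeSemanticNet(dim)                           # N2 (training only)
        self.n3 = EdgePositionTransformer(dim)                   # N3
        self.n4 = nn.Conv2d(dim, num_classes, kernel_size=1)     # N4 score map

    def forward(self, image, edge_img=None):
        feat = self.n1(image)
        edge_feat = self.n2(edge_img) if edge_img is not None else None
        return self.n4(self.n3(feat, edge_feat))

model = FirstSegmentationModel()
img = torch.randn(2, 3, 64, 64)
edge = torch.rand(2, 1, 64, 64).round()
print(model(img, edge).shape)   # training call: torch.Size([2, 2, 64, 64])
print(model(img).shape)         # test call (N2 branch unused): same shape
```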
Step 120: training the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and deleting the edge semantic auxiliary network N2 in the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model.
Here, (1) a remote sensing image sample is a remote sensing image acquired for training the first remote sensing image segmentation model. (2) The second remote sensing image segmentation model is the trained first remote sensing image segmentation model. (3) The target remote sensing image segmentation model is the model obtained by deleting the edge semantic auxiliary network N2 from the trained first remote sensing image segmentation model; it comprises the convolutional feature extraction network N1, the transformer network N3 combining edge enhancement and Gaussian position encoding, and the segmentation network N4, connected in sequence.
Step 130: inputting the remote sensing image to be segmented into the target remote sensing image segmentation model to obtain a target image segmentation result of the remote sensing image to be segmented.
Here, (1) the remote sensing image to be segmented is any remote sensing image selected for processing. (2) The target image segmentation result is the multi-class segmentation result of the remote sensing image to be segmented.
Preferably, the method further comprises the following steps:
and obtaining a plurality of remote sensing image samples, and labeling at least two categories in each remote sensing image sample to obtain the semantic annotation image corresponding to that sample, until a semantic annotation image has been obtained for every remote sensing image sample.
Here, a semantic annotation image is a remote sensing image in which each pixel is labeled with one of at least two categories.
It should be noted that category labeling of a remote sensing image sample proceeds as follows: at least two categories are preset for distinguishing pixels. For example, the object to be recognized and the background may be set as 2 categories; or object A, object B and the background may be set as 3 categories. Taking building recognition as an example, each pixel in the remote sensing image is labeled as either building or background.
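As a small illustration, a semantic annotation image can be stored as a per-pixel integer mask; the specific integer encoding below (0 for background, 1 for building) is an assumption, since the patent only requires at least two categories.

```python
import numpy as np

# Toy semantic annotation image for one 256x256 sample:
# every pixel carries a category index (0 = background, 1 = building).
label = np.zeros((256, 256), dtype=np.int64)
label[100:150, 80:160] = 1            # pixels inside a building footprint
print(np.unique(label))               # -> [0 1], i.e. at least two categories
```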
Preferably, as shown in fig. 2, step 120 includes:
step 121: inputting any remote sensing image sample into the convolution feature extraction network N1 to obtain an initial feature map corresponding to the remote sensing image sample, performing edge extraction on a semantic annotation image corresponding to the remote sensing image sample to obtain a first edge image corresponding to the remote sensing image sample, and inputting the first edge image into the edge semantic auxiliary network N2 to obtain an edge semantic feature map corresponding to the remote sensing image sample.
Here, (1) the convolutional feature extraction network N1 may include first convolutional layers {C_i^1} (i ∈ {1, …, n_c^1}, n_c^1 ≥ 1), where i is an index and n_c^1 denotes the number of first convolutional layers. (2) The initial feature map is the feature map obtained after the remote sensing image sample passes through the first convolutional layers {C_i^1} for feature extraction. (3) The process of extracting edges from the semantic annotation image to obtain the first edge image is prior art and is not described in detail here. (4) The edge semantic auxiliary network N2 may include an edge vector v_e, a non-edge vector v_ne, edge semantic layers {E_i^2} (i ∈ {1, …, n_e^2}, n_e^2 ≥ 1) and gating layers {G_i^2} (i ∈ {1, …, n_g^2}, n_g^2 ≥ 0), where n_e^2 denotes the number of edge semantic layers and n_g^2 the number of gating layers; the edge vector v_e and the non-edge vector v_ne are learnable vectors. (5) The edge semantic feature map is the feature map obtained after the first edge image is processed by the edge semantic auxiliary network N2.
It should be noted that: (1) the input of an edge semantic layer E_i^2 in the edge semantic auxiliary network N2 is the first edge image, the edge vector v_e and the non-edge vector v_ne, and its output is an edge semantic feature map E_f^i. First, edge extraction is performed on the semantic annotation image corresponding to the input image to obtain the first edge image; then, a dilation operation is applied to the first edge image to obtain an edge-dilated image; then, for each pixel (i, j) of the edge-dilated image, if its value is 0 the corresponding pixel of the initial edge semantic feature map E_f^i is assigned the non-edge vector v_ne, and if its value is not 0 it is assigned the edge vector v_e. Once every pixel of the initial edge semantic feature map has been assigned, the required edge semantic feature map E_f^i is obtained. (2) A gating layer G_i^2 in the edge semantic auxiliary network N2 is used to update the edge semantic feature map; its input is the edge-dilated image, the edge semantic feature map E_f^i and the feature map EGTB_f^i output by an edge position transformer module of the transformer network N3 combining edge enhancement and Gaussian position encoding, and its output is the updated edge semantic feature map E_f^(i+1). For each pixel (i, j) of the edge-dilated image, if its value is 0 the value of pixel (i, j) in the edge semantic feature map is not updated; if it is not 0, the values of pixel (i, j) in E_f^i and EGTB_f^i are added to give the value of pixel (i, j) in the updated edge semantic feature map E_f^(i+1). In this way the updated edge semantic feature map E_f^(i+1) is obtained.
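A minimal PyTorch sketch of the edge semantic layer and the gating update described above. The dilation-by-max-pooling trick, the kernel size and all tensor shapes are assumptions; only the assignment and gating logic follows the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EdgeSemanticLayer(nn.Module):
    def __init__(self, dim, dilate_ks=3):
        super().__init__()
        self.v_e = nn.Parameter(torch.randn(dim))     # learnable edge vector
        self.v_ne = nn.Parameter(torch.randn(dim))    # learnable non-edge vector
        self.dilate_ks = dilate_ks

    def forward(self, edge_img):                      # (B, 1, H, W), values in {0, 1}
        # Dilation via max-pooling: a pixel becomes 1 if any neighbor is an edge.
        dilated = F.max_pool2d(edge_img, self.dilate_ks, stride=1,
                               padding=self.dilate_ks // 2)
        mask = (dilated > 0).float()                  # (B, 1, H, W)
        v_e = self.v_e.view(1, -1, 1, 1)
        v_ne = self.v_ne.view(1, -1, 1, 1)
        # Assign v_e at (dilated) edge pixels and v_ne everywhere else.
        e_f = mask * v_e + (1.0 - mask) * v_ne        # (B, dim, H, W)
        return e_f, mask

def gated_update(e_f, egtb_feat, mask):
    """Gating layer: only (dilated) edge pixels absorb the EGTB module output."""
    return e_f + mask * egtb_feat

layer = EdgeSemanticLayer(dim=64)
edge = torch.zeros(1, 1, 8, 8)
edge[:, :, 4, :] = 1.0                                # one horizontal edge line
e_f, mask = layer(edge)
e_f = gated_update(e_f, torch.randn(1, 64, 8, 8), mask)
```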
Step 122: inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network N3 combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample, and inputting the enhanced feature map corresponding to the remote sensing image sample into the segmentation network N4 to obtain a first image segmentation result of the remote sensing image sample.
Here, (1) the transformer network N3 combining edge enhancement and Gaussian position encoding may include edge position transformer modules {EGTB_i^3} (i ∈ {1, …, n_egtb^3}, n_egtb^3 ≥ 1) and downsampling layers {D_i^3} (i ∈ {1, …, n_d^3}, n_d^3 ≥ 0), where n_egtb^3 denotes the number of edge position transformer modules and n_d^3 the number of downsampling layers. Each edge position transformer module comprises a fusion layer M, a position encoding layer P, a multi-head attention layer MA, a fully connected layer FC, and addition layers {a first addition layer A_1, a second addition layer A_2, a third addition layer A_3}. (2) The enhanced feature map is the feature map obtained after the initial feature map and the edge semantic feature map are processed by the transformer network N3 combining edge enhancement and Gaussian position encoding. (3) The segmentation network N4 may include second convolutional layers {C_i^4 or S_i^4} (i ∈ {1, …, n_c^4}, n_c^4 ≥ 1) and upsampling layers {U_i^4} (i ∈ {1, …, n_u^4}, n_u^4 ≥ 0), where n_c^4 denotes the number of convolutional layers and n_u^4 the number of upsampling layers. (4) The first image segmentation result is the multi-class score map corresponding to the remote sensing image sample.
It should be noted that the position encoding layer P in the transformer network N3 combining edge enhancement and Gaussian position encoding computes the position code of each pixel (i, j) in the feature map using K two-dimensional Gaussian functions, as follows:
$$P(i,j)=\sum_{k=1}^{K}\omega_k(i,j)\,p_k,\qquad \omega_k(i,j)=\exp\!\left(-\frac{1}{2(1-\rho_k^2)}\left[\frac{(i-\mu_{1,k})^2}{\sigma_{1,k}^2}-\frac{2\rho_k\,(i-\mu_{1,k})(j-\mu_{2,k})}{\sigma_{1,k}\sigma_{2,k}}+\frac{(j-\mu_{2,k})^2}{\sigma_{2,k}^2}\right]\right)$$

wherein p ∈ R^(K×d) is a learnable coding matrix, d is the dimension of the position code, μ_1 ∈ R^K and μ_2 ∈ R^K are learnable mean vectors, σ_1 ∈ R^K and σ_2 ∈ R^K are learnable standard deviation vectors, ρ ∈ R^K is a learnable correlation ("closeness") parameter vector, ω denotes the weights of the K two-dimensional Gaussian functions, and P is the resulting two-dimensional position code. By computing the position codes from multiple two-dimensional Gaussian distributions, the target distribution at different positions in the image can be captured adaptively, providing effective position-distribution information for fine and accurate segmentation.
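A sketch of this position encoding layer in PyTorch. The coordinate normalization to [0, 1] and the tanh squashing that keeps |ρ_k| < 1 are implementation assumptions on top of the formula above.

```python
import torch
import torch.nn as nn

class GaussianPositionEncoding(nn.Module):
    def __init__(self, K, d):
        super().__init__()
        self.p = nn.Parameter(torch.randn(K, d))      # coding matrix p
        self.mu1 = nn.Parameter(torch.rand(K))        # row means
        self.mu2 = nn.Parameter(torch.rand(K))        # column means
        self.sigma1 = nn.Parameter(torch.ones(K))     # row standard deviations
        self.sigma2 = nn.Parameter(torch.ones(K))     # column standard deviations
        self.rho = nn.Parameter(torch.zeros(K))       # correlation parameters

    def forward(self, H, W):
        i = torch.linspace(0, 1, H).view(H, 1, 1)     # normalized row coordinates
        j = torch.linspace(0, 1, W).view(1, W, 1)     # normalized column coordinates
        z1 = (i - self.mu1) / self.sigma1             # broadcasts to (H, 1, K)
        z2 = (j - self.mu2) / self.sigma2             # broadcasts to (1, W, K)
        rho = torch.tanh(self.rho)                    # keep |rho| < 1
        q = (z1**2 - 2 * rho * z1 * z2 + z2**2) / (2 * (1 - rho**2))
        omega = torch.exp(-q)                         # (H, W, K) Gaussian weights
        return omega @ self.p                         # (H, W, d) position codes

pe = GaussianPositionEncoding(K=8, d=64)
print(pe(32, 32).shape)                               # torch.Size([32, 32, 64])
```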
Step 123: obtaining the loss value of any remote sensing image sample from its first image segmentation result and corresponding semantic annotation image, until the loss value of every remote sensing image sample has been obtained.
Specifically, the first image segmentation result corresponding to any remote sensing image sample is compared with its semantic annotation image, and the loss value of the remote sensing image sample is computed using the loss function of the first remote sensing image segmentation model; this is repeated until the loss value of every remote sensing image sample has been obtained.
Step 124: optimizing the first remote sensing image segmentation model based on all loss values to obtain an optimized remote sensing image segmentation model; taking the optimized model as the first remote sensing image segmentation model and returning to step 121, until a preset iterative training condition is met, at which point the optimized remote sensing image segmentation model is determined as the second remote sensing image segmentation model.
Here, (1) the preset iterative training condition is, for example, reaching a maximum number of training iterations or convergence of the loss function. (2) The optimized remote sensing image segmentation model is the remote sensing image segmentation model obtained after one round of iterative training.
Specifically, the parameters of the first remote sensing image segmentation model are optimized according to all loss values to obtain an optimized remote sensing image segmentation model, and it is judged whether the optimized model meets the preset iterative training condition; if so, the optimized remote sensing image segmentation model is determined as the second remote sensing image segmentation model; if not, the optimized model is taken as the first remote sensing image segmentation model and step 121 is executed again, until the preset iterative training condition is met and the optimized remote sensing image segmentation model is determined as the second remote sensing image segmentation model.
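A condensed sketch of this training loop. The dataset interface, the Adam optimizer and the cross-entropy loss are assumptions; the patent does not name a specific loss function or optimizer.

```python
import torch
import torch.nn.functional as F

def train(model, loader, epochs=100, lr=1e-4):
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for epoch in range(epochs):                   # preset iteration condition
        for image, label, edge in loader:         # sample, annotation, first edge image
            logits = model(image, edge)           # first image segmentation result
            loss = F.cross_entropy(logits, label) # compare with semantic annotation
            opt.zero_grad()
            loss.backward()
            opt.step()
    # After training, the edge semantic auxiliary network (N2) is deleted to
    # obtain the target model; inference then runs without the edge input.
    return model
```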
Preferably, the convolutional feature extraction network N1 includes: at least one first convolutional layer; the step of inputting any remote sensing image sample into the convolutional feature extraction network N1 to obtain the initial feature map corresponding to the remote sensing image sample comprises:
inputting any remote sensing image sample into the convolutional feature extraction network N1 for feature extraction through each first convolutional layer in turn, to obtain the initial feature map corresponding to the remote sensing image sample.
Fig. 3 shows a first structural diagram of the first remote sensing image segmentation model. As shown in fig. 3, the convolutional feature extraction network N1 includes at least one convolutional layer C_1^1.
Preferably, the edge semantic auxiliary network N2 includes: an edge vector, a non-edge vector and an edge semantic layer connected in sequence.
As shown in fig. 3, the edge semantic auxiliary network N2 includes, arranged in sequence, an edge vector v_e, a non-edge vector v_ne and an edge semantic layer E_1^2.
The step of inputting the first edge image corresponding to any remote sensing image sample into the edge semantic auxiliary network N2 to obtain the edge semantic feature map corresponding to the remote sensing image sample comprises:
inputting the first edge image, the edge vector and the non-edge vector corresponding to any remote sensing image sample into the edge semantic layer for feature extraction, to obtain the edge semantic feature map corresponding to the remote sensing image sample.
Preferably, the transformer network N3 combining edge enhancement and Gaussian position encoding comprises: at least one edge position transformer module, each edge position transformer module comprising: a fusion layer, a position encoding layer, a first addition layer, a multi-head attention layer, a second addition layer, a fully connected layer and a third addition layer arranged in sequence.
As shown in fig. 3, the transformer network N3 combining edge enhancement and Gaussian position encoding comprises one edge position transformer module, which includes, arranged in sequence: a fusion layer M_1^3, a position encoding layer P_1^3, a first addition layer A_11^3, a multi-head attention layer MA_1^3, a second addition layer A_12^3, a fully connected layer FC_1^3 and a third addition layer A_13^3.
The step of inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network N3 combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample comprises:
inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the fusion layer of the first edge position transformer module for fusion, to obtain a first fused feature map corresponding to the remote sensing image sample, and position-encoding each pixel in the first fused feature map through the position encoding layer of the first edge position transformer module, to obtain two-dimensional position encoding information of the first fused feature map.
Here, (1) the first fused feature map is the feature map obtained after the fusion layer of the edge position transformer module fuses the initial feature map and the edge semantic feature map corresponding to the remote sensing image sample. (2) The two-dimensional position encoding information is obtained by the position encoding layer P_1^3 position-encoding the pixels in the first fused feature map.
Inputting the first fused feature map corresponding to any remote sensing image sample and its two-dimensional position encoding information into the first addition layer of the first edge position transformer module for addition, to obtain a first intermediate feature map corresponding to the remote sensing image sample.
It should be noted that, in the training stage, since the first remote sensing image segmentation model includes the edge semantic auxiliary network N2, the first fused feature map is obtained by fusing the initial feature map and the edge semantic feature map. In the testing stage, the target remote sensing image segmentation model does not contain the edge semantic auxiliary network N2, so the first fused feature map is simply the initial feature map.
Inputting the first intermediate feature map corresponding to any remote sensing image sample into the sequentially connected multi-head attention layer, second addition layer, fully connected layer and third addition layer of the first edge position transformer module for processing, to obtain a second intermediate feature map corresponding to the remote sensing image sample, and taking the second intermediate feature map as the initial feature map of the next edge position transformer module, until the second intermediate feature map has been processed by all edge position transformer modules, yielding the enhanced feature map corresponding to the remote sensing image sample.
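A sketch of one edge position transformer (EGTB) block in exactly the order given above: fusion, position encoding, first addition, multi-head attention, second addition, fully connected layer, third addition. Tensor shapes, fusion by element-wise addition and the 4x feed-forward expansion are assumptions.

```python
import torch
import torch.nn as nn

class EGTBBlock(nn.Module):
    def __init__(self, dim, heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.fc = nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(),
                                nn.Linear(dim * 4, dim))

    def forward(self, feat, edge_feat, pos_code):
        # feat, edge_feat: (B, C, H, W); pos_code: (H, W, C), e.g. from the
        # GaussianPositionEncoding sketch earlier.
        fused = feat + edge_feat                      # fusion layer M
        b, c, h, w = fused.shape
        x = fused.flatten(2).transpose(1, 2)          # (B, HW, C) token sequence
        x = x + pos_code.reshape(h * w, c)            # first addition layer A1
        x = x + self.attn(x, x, x)[0]                 # MA + second addition layer A2
        x = x + self.fc(x)                            # FC + third addition layer A3
        return x.transpose(1, 2).reshape(b, c, h, w)  # second intermediate feature map

block = EGTBBlock(dim=64)
out = block(torch.randn(2, 64, 16, 16), torch.randn(2, 64, 16, 16),
            torch.randn(16, 16, 64))
print(out.shape)                                      # torch.Size([2, 64, 16, 16])
```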
Preferably, the segmentation network N4 comprises: at least one second convolutional layer.
As shown in fig. 3, the segmentation network N4 includes one convolutional layer S_1^4.
The step of inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network N4 to obtain a first image segmentation result of the remote sensing image sample comprises:
inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network N4 for feature extraction through each second convolutional layer in turn, to obtain the first image segmentation result of the remote sensing image sample.
Further, fig. 4 shows a second structural diagram of the first remote sensing image segmentation model. As shown in fig. 4, the convolutional feature extraction network N1 includes, arranged in sequence, a first convolutional layer C_1^1 and a first convolutional layer C_2^1; that is, the convolutional feature extraction network N1 includes n_c^1 = 2 convolutional layers in total.
The edge semantic auxiliary network N2 includes, arranged in sequence, an edge vector v_e, a non-edge vector v_ne, an edge semantic layer E_1^2, a gating layer G_1^2 and a gating layer G_2^2; that is, the edge semantic auxiliary network N2 includes n_e^2 = 1 edge semantic layer and n_g^2 = 2 gating layers in total.
The transformer network N3 combining edge enhancement and Gaussian position encoding includes, arranged in sequence, an edge position transformer module EGTB_1^3, a downsampling layer D_1^3, an edge position transformer module EGTB_2^3, a downsampling layer D_2^3 and an edge position transformer module EGTB_3^3, where each edge position transformer module includes, arranged in sequence, a fusion layer M_1^3, a position encoding layer P_1^3, a first addition layer A_11^3, a multi-head attention layer MA_1^3, a second addition layer A_12^3, a fully connected layer FC_1^3 and a third addition layer A_13^3. That is, the transformer network N3 includes n_1 + n_2 + n_3 fusion layers, n_1 + n_2 + n_3 position encoding layers, n_1 + n_2 + n_3 multi-head attention layers, n_1 + n_2 + n_3 fully connected layers, 3 × (n_1 + n_2 + n_3) addition layers and 2 downsampling layers; in other words, n_egtb^3 = n_1 + n_2 + n_3 EGTB modules and n_d^3 = 2 downsampling layers in total.
In the portion of fig. 4 framed as an edge position transformer module, the fusion layer M_1^3, the position encoding layer P_1^3, the multi-head attention layer MA_1^3, the fully connected layer FC_1^3 and the addition layers {A_11^3, A_12^3, A_13^3} form a processing group for extracting global context information to obtain the enhanced feature map; the module also comprises other groups of the same structure arranged in parallel.
The segmentation network N4 includes, arranged in sequence, a second convolutional layer C_1^4, a second convolutional layer C_2^4, an upsampling layer U_1^4, a second convolutional layer C_3^4, a second convolutional layer C_4^4, an upsampling layer U_2^4, a second convolutional layer C_5^4, a second convolutional layer C_6^4 and a second convolutional layer S_1^4; that is, the segmentation network N4 includes n_c^4 = 7 convolutional layers and n_u^4 = 2 upsampling layers in total.
It should be noted that (1) a downsampling layer can be implemented using a pooling operation, or a convolution operation with stride greater than 1, to reduce the spatial dimensions of the features; an upsampling layer can be implemented using a transposed convolution operation, a bilinear interpolation operation or an unpooling operation to increase them; and a fusion layer can be implemented using an addition, concatenation or averaging operation to fuse the information of several features. (2) C denotes a convolutional layer with a 3 × 3 kernel, and S denotes a convolutional layer with a 1 × 1 kernel. (3) The convolutional layers in the convolutional feature extraction network N1 extract an initial feature map with local context information. (4) The edge semantic layer in the edge semantic auxiliary network N2 extracts an edge semantic feature map containing rich target edge information, and the gating layers update the edge semantic feature map based on the continuously learned feature maps. (5) In the transformer network N3 combining edge enhancement and Gaussian position encoding, the fusion layer fully fuses the initial feature map and the edge semantic feature map; the position encoding layer adaptively captures the target distribution at different positions in the image and provides effective target position-distribution information; and the multi-head attention, addition and fully connected layers extract global context information to obtain the enhanced feature map. (6) In the segmentation network N4, the upsampling layers increase the spatial dimensions of the image features step by step back to the original image size, and the convolutional layers refine the feature map, with the last convolutional layer generating the score map that gives the image segmentation result.
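The note above leaves each of these layers open to several interchangeable implementations; the following is a short sketch of the common PyTorch choices (all parameter values are placeholders).

```python
import torch.nn as nn

def downsample(dim, use_pooling=False):
    # Either pooling or a stride-2 convolution halves the spatial dimensions.
    if use_pooling:
        return nn.MaxPool2d(2)
    return nn.Conv2d(dim, dim, kernel_size=3, stride=2, padding=1)

def upsample(dim, use_interpolation=False):
    # Either bilinear interpolation or a transposed convolution doubles them.
    if use_interpolation:
        return nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
    return nn.ConvTranspose2d(dim, dim, kernel_size=2, stride=2)
```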
The technical solution of this embodiment is suitable for remote sensing image segmentation in complex scenes such as densely distributed targets. By segmenting the remote sensing image through the convolutional feature extraction network N1, the transformer network N3 combining edge enhancement and Gaussian position encoding, and the segmentation network N4, the enhanced target edge information and the two-dimensional position information in the network can be fully exploited, improving fine and accurate segmentation when targets in the remote sensing image are densely distributed.
Fig. 5 shows a schematic structural diagram of an embodiment of a remote sensing image segmentation system provided by the invention. As shown in fig. 5, the system 200 includes: a model construction module 210, a model training module 220 and an image segmentation module 230.
The model construction module 210 is configured to: construct a first remote sensing image segmentation model comprising a convolutional feature extraction network N1, an edge semantic auxiliary network N2, a transformer network N3 combining edge enhancement and Gaussian position encoding, and a segmentation network N4; the convolutional feature extraction network N1, the edge semantic auxiliary network N2 and the segmentation network N4 are each connected to the transformer network N3 combining edge enhancement and Gaussian position encoding;
the model training module 220 is configured to: training the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and deleting the edge semantic auxiliary network N2 in the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model;
the image segmentation module 230 is configured to: and inputting the remote sensing image to be detected into the target remote sensing image segmentation model to obtain a target image segmentation result of the remote sensing image to be detected.
The technical solution of this embodiment is suitable for remote sensing image segmentation in complex scenes such as densely distributed targets. By segmenting the remote sensing image through the convolutional feature extraction network N1, the transformer network N3 combining edge enhancement and Gaussian position encoding, and the segmentation network N4, the enhanced target edge information and the two-dimensional position information in the network can be fully exploited, improving fine and accurate segmentation when targets in the remote sensing image are densely distributed.
The above steps for realizing the corresponding functions of each parameter and each module in the remote sensing image segmentation system 200 of the present embodiment may refer to each parameter and step in the above embodiments of a remote sensing image segmentation method, which are not described herein again.
An embodiment of the present invention provides a storage medium in which instructions are stored; when a computer reads the instructions, it performs the steps of the remote sensing image segmentation method above. Reference may be made to the parameters and steps in the method embodiment above, which are not repeated here.
Computer storage media include, for example, flash drives and portable hard disks.
An electronic device provided by an embodiment of the present invention includes a memory, a processor and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the steps of the remote sensing image segmentation method above are performed. Reference may be made to the parameters and steps in the method embodiment above, which are not repeated here.
Those skilled in the art will appreciate that the present invention may be embodied as methods, systems, storage media and electronic devices.
Thus, the present invention may be embodied entirely in hardware, entirely in software (including firmware, resident software, micro-code, etc.), or in a combination of hardware and software, which may be referred to herein generally as a "circuit", "module" or "system". Furthermore, in some embodiments, the invention may also be embodied as a computer program product in one or more computer-readable media having computer-readable program code embodied therein.

Any combination of one or more computer-readable media may be employed. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and should not be construed as limiting the present invention; variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims (10)

1. A remote sensing image segmentation method is characterized by comprising the following steps:
constructing a first remote sensing image segmentation model comprising a convolutional feature extraction network, an edge semantic auxiliary network, a transformer network combining edge enhancement and Gaussian position encoding and a segmentation network; the convolutional feature extraction network, the edge semantic auxiliary network and the segmentation network are each connected to the transformer network combining edge enhancement and Gaussian position encoding;
training the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and deleting the edge semantic auxiliary network in the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model;
and inputting the remote sensing image to be segmented into the target remote sensing image segmentation model to obtain a target image segmentation result of that image.
2. The remote sensing image segmentation method according to claim 1, further comprising:
and obtaining a plurality of remote sensing image samples, and labeling at least two categories in each remote sensing image sample to obtain the semantic annotation image corresponding to that sample, until a semantic annotation image has been obtained for every remote sensing image sample.
3. The remote sensing image segmentation method according to claim 2, wherein the step of training the first remote sensing image segmentation model based on the plurality of remote sensing image samples to obtain a second remote sensing image segmentation model comprises:
inputting any remote sensing image sample into the convolutional feature extraction network to obtain an initial feature map corresponding to the remote sensing image sample, performing edge extraction on the semantic annotation image corresponding to the remote sensing image sample to obtain a first edge image corresponding to the remote sensing image sample, and inputting the first edge image into the edge semantic auxiliary network to obtain an edge semantic feature map corresponding to the remote sensing image sample;
inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample, and inputting the enhanced feature map into the segmentation network to obtain a first image segmentation result of the remote sensing image sample;
obtaining a loss value for any remote sensing image sample from its first image segmentation result and corresponding semantic annotation image, until a loss value has been obtained for every remote sensing image sample;
and optimizing the first remote sensing image segmentation model based on all loss values to obtain an optimized remote sensing image segmentation model; taking the optimized model as the first remote sensing image segmentation model and returning to the step of inputting any remote sensing image sample into the convolutional feature extraction network, until a preset iterative training condition is met, at which point the optimized remote sensing image segmentation model is determined as the second remote sensing image segmentation model.
4. The remote sensing image segmentation method according to claim 3, wherein the convolutional feature extraction network includes: at least one first convolutional layer; the step of inputting any remote sensing image sample into the convolutional feature extraction network to obtain an initial feature map corresponding to the remote sensing image sample comprises:
inputting any remote sensing image sample into the convolutional feature extraction network for feature extraction through each first convolutional layer in turn, to obtain the initial feature map corresponding to the remote sensing image sample.
5. The remote sensing image segmentation method according to claim 3, wherein the edge semantic auxiliary network includes: an edge vector, a non-edge vector and an edge semantic layer connected in sequence; the step of inputting a first edge image corresponding to any remote sensing image sample into the edge semantic auxiliary network to obtain an edge semantic feature map corresponding to the remote sensing image sample comprises:
inputting a first edge image, an edge vector and a non-edge vector corresponding to any remote sensing image sample into the edge semantic layer for feature extraction to obtain an edge semantic feature map corresponding to the remote sensing image sample.
6. The remote sensing image segmentation method according to claim 3, wherein the transformer network combining edge enhancement and Gaussian position encoding comprises: at least one edge position transformer module, each edge position transformer module comprising: a fusion layer, a position encoding layer, a first addition layer, a multi-head attention layer, a second addition layer, a fully connected layer and a third addition layer arranged in sequence; the step of inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the transformer network combining edge enhancement and Gaussian position encoding to obtain an enhanced feature map corresponding to the remote sensing image sample comprises:
inputting the initial feature map and the edge semantic feature map corresponding to any remote sensing image sample into the fusion layer of the first edge position transformer module for fusion, to obtain a first fused feature map corresponding to the remote sensing image sample, and position-encoding each pixel in the first fused feature map through the position encoding layer of the first edge position transformer module, to obtain two-dimensional position encoding information of the first fused feature map;
inputting the first fused feature map corresponding to any remote sensing image sample and its two-dimensional position encoding information into the first addition layer of the first edge position transformer module for addition, to obtain a first intermediate feature map corresponding to the remote sensing image sample;
and inputting the first intermediate feature map corresponding to any remote sensing image sample into the sequentially connected multi-head attention layer, second addition layer, fully connected layer and third addition layer of the first edge position transformer module for processing, to obtain a second intermediate feature map corresponding to the remote sensing image sample, and taking the second intermediate feature map as the initial feature map of the next edge position transformer module, until the second intermediate feature map has been processed by all edge position transformer modules, yielding the enhanced feature map corresponding to the remote sensing image sample.
7. The remote sensing image segmentation method according to claim 3, wherein the segmentation network comprises: at least one second convolutional layer; the step of inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network to obtain a first image segmentation result of the remote sensing image sample comprises:
inputting the enhanced feature map corresponding to any remote sensing image sample into the segmentation network for feature extraction through each second convolutional layer in turn, to obtain the first image segmentation result of the remote sensing image sample.
8. A remote sensing image segmentation system, comprising: a model construction module, a model training module and an image segmentation module;
the model construction module is configured to: construct a first remote sensing image segmentation model comprising a convolutional feature extraction network, an edge semantic auxiliary network, a transformer network combining edge enhancement and Gaussian position encoding and a segmentation network; wherein the convolutional feature extraction network, the edge semantic auxiliary network and the segmentation network are each connected to the transformer network combining edge enhancement and Gaussian position encoding;
the model training module is configured to: train the first remote sensing image segmentation model based on a plurality of remote sensing image samples to obtain a second remote sensing image segmentation model, and delete the edge semantic auxiliary network from the second remote sensing image segmentation model to obtain a target remote sensing image segmentation model;
the image segmentation module is configured to: input the remote sensing image to be segmented into the target remote sensing image segmentation model to obtain a target image segmentation result of that image.
9. A storage medium, characterized in that instructions are stored therein, which when read by a computer, cause the computer to carry out the remote sensing image segmentation method according to any one of claims 1 to 7.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, causes the computer to perform the method of segmentation of remote sensing images according to any one of claims 1 to 7.
CN202211542414.8A 2022-12-02 2022-12-02 Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment Active CN115797633B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211542414.8A CN115797633B (en) 2022-12-02 2022-12-02 Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211542414.8A CN115797633B (en) 2022-12-02 2022-12-02 Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN115797633A true CN115797633A (en) 2023-03-14
CN115797633B CN115797633B (en) 2023-06-27

Family

ID=85445250

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211542414.8A Active CN115797633B (en) 2022-12-02 2022-12-02 Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN115797633B (en)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107610141A (en) * 2017-09-05 2018-01-19 华南理工大学 A kind of remote sensing images semantic segmentation method based on deep learning
CN110059768A (en) * 2019-04-30 2019-07-26 福州大学 The semantic segmentation method and system of the merging point and provincial characteristics that understand for streetscape
CN110443822A (en) * 2019-07-16 2019-11-12 浙江工业大学 A kind of high score remote sensing target fine extracting method of semanteme edge auxiliary
CN111462126A (en) * 2020-04-08 2020-07-28 武汉大学 Semantic image segmentation method and system based on edge enhancement
CN113554655A (en) * 2021-07-13 2021-10-26 中国科学院空间应用工程与技术中心 Optical remote sensing image segmentation method and device based on multi-feature enhancement
CN114140480A (en) * 2021-12-09 2022-03-04 安徽大学 Thermal infrared electrical equipment image semantic segmentation method based on edge-assisted learning
CN114596520A (en) * 2022-02-09 2022-06-07 天津大学 First visual angle video action identification method and device
CN114677349A (en) * 2022-03-25 2022-06-28 西安交通大学 Image segmentation method and system for edge information enhancement and attention guidance of encoding and decoding

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
CHEN Z. et al.: "A building change detection method for high-resolution remote sensing imagery based on edge guidance and differential enhancement", ISPRS Journal of Photogrammetry and Remote Sensing, pages 203-222 *
LI Y. et al.: "A Y-Net deep learning method for road segmentation using high-resolution visible remote sensing images", Remote Sensing Letters, pages 381-390 *
LUAN Xiaomei et al.: "A weakly supervised semantic segmentation method for remote sensing images based on edge enhancement", Computer Engineering and Applications, vol. 58, no. 20, pages 188-196 *
LIANG Liming et al.: "A skin lesion segmentation algorithm fusing multi-scale Transformer", Journal of Jilin University (Engineering and Technology Edition), pages 1-13 *

Also Published As

Publication number Publication date
CN115797633B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN112560876B (en) Single-stage small sample target detection method for decoupling measurement
US11200424B2 (en) Space-time memory network for locating target object in video content
CN108664981B (en) Salient image extraction method and device
JP2015079505A (en) Noise identification method and noise identification device of parallax depth image
CN113343982B (en) Entity relation extraction method, device and equipment for multi-modal feature fusion
KR20220153667A (en) Feature extraction methods, devices, electronic devices, storage media and computer programs
CN116228792A (en) Medical image segmentation method, system and electronic device
Dong et al. Learning regional purity for instance segmentation on 3d point clouds
CN117078930A (en) Medical image segmentation method based on boundary sensing and attention mechanism
CN111612075A (en) Interest point and descriptor extraction method based on joint feature recombination and feature mixing
CN114550014A (en) Road segmentation method and computer device
CN116563285B (en) Focus characteristic identifying and dividing method and system based on full neural network
CN116778164A (en) Semantic segmentation method for improving deep V < 3+ > network based on multi-scale structure
WO2022257602A1 (en) Video object segmentation method and apparatus, storage medium, and electronic device
CN113554655B (en) Optical remote sensing image segmentation method and device based on multi-feature enhancement
CN115797633B (en) Remote sensing image segmentation method, remote sensing image segmentation system, storage medium and electronic equipment
CN113222016B (en) Change detection method and device based on cross enhancement of high-level and low-level features
CN115810152A (en) Remote sensing image change detection method and device based on graph convolution and computer equipment
CN113780305B (en) Significance target detection method based on interaction of two clues
CN114998630A (en) Ground-to-air image registration method from coarse to fine
CN114792370A (en) Whole lung image segmentation method and device, electronic equipment and storage medium
CN114842066A (en) Image depth recognition model training method, image depth recognition method and device
CN111539922B (en) Monocular depth estimation and surface normal vector estimation method based on multitask network
CN113610856A (en) Method and device for training image segmentation model and image segmentation
CN114066841A (en) Sky detection method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant