CN115049919A

CN115049919A - Attention regulation based remote sensing image semantic segmentation method and system

Info

Publication number: CN115049919A
Application number: CN202210072053.9A
Authority: CN
Inventors: 何自芬; 史本杰; 张印辉; 张朋程; 陈光晨
Original assignee: Kunming University of Science and Technology
Current assignee: Kunming University of Science and Technology
Priority date: 2022-01-21
Filing date: 2022-01-21
Publication date: 2022-09-13

Abstract

The invention discloses a remote sensing image semantic segmentation method and a system based on attention regulation, wherein the method comprises the following steps: obtaining a first remote sensing image set; obtaining a first training set and a first testing set; obtaining a first label set, wherein the first label set comprises a first training label set and a first testing label set; building a first initial model based on the FCMANet coding and decoding semantic segmentation structure; constructing an attention regulation model; training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model; and performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model. The method solves the technical problem that the semantic segmentation accuracy of the remote sensing image is low due to the fact that target areas with uneven scales, similar features and scattered distribution under a complex remote sensing scene cannot be effectively extracted in the prior art.

Description

Attention regulation based remote sensing image semantic segmentation method and system

Technical Field

The invention relates to the technical field of computer application, in particular to a remote sensing image semantic segmentation method and system based on attention regulation.

Background

Plateau lakes are an important component of the earth's ecosystem. Their unique geographical environment gives rise to scarce natural resources and provides valuable data for studying changes in the earth's natural environment and ecosystem science. Monitoring plateau lakes relies not only on on-site research but also on data analysis of a large number of remote sensing images. Compared with field investigation, data analysis based on remote sensing images offers advantages such as high efficiency and accuracy. The results of remote sensing image data analysis make it possible to grasp the global spatio-temporal evolution of a plateau lake, to classify the land in the area surrounding the lake, and ultimately to provide an intuitive and effective reference for optimizing land resources around the lake. However, the prior art cannot extract the contour and category information of dense semantic target regions from complex remote sensing scenes, so remote sensing images cannot be semantically segmented accurately. Using computer technology to improve the precision and efficiency of semantic segmentation of remote sensing images is therefore of great significance.

However, the prior art has the technical problem that target areas with uneven scales, similar features and scattered distribution in a complex remote sensing scene cannot be effectively extracted, so that the semantic segmentation accuracy of the remote sensing image is low.

Disclosure of Invention

The invention aims to provide a remote sensing image semantic segmentation method and system based on attention regulation and control, which are used for solving the technical problem that the semantic segmentation accuracy of a remote sensing image is low due to the fact that target regions with uneven scales, similar features and scattered distribution in a complex remote sensing scene cannot be effectively extracted in the prior art.

In view of the above problems, the invention provides a remote sensing image semantic segmentation method and system based on attention regulation.

In a first aspect, the present invention provides a method for semantic segmentation of remote sensing images based on attention control, where the method is implemented by a system for semantic segmentation of remote sensing images based on attention control, and the method includes: obtaining a first set of remotely sensed images, wherein the first set of remotely sensed images comprises a plurality of remotely sensed images; randomly dividing the first remote sensing image set into two parts, and respectively recording the two parts as a first training set and a first testing set; sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship; building a first initial model based on the FCMANet coding and decoding semantic segmentation structure; constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module; training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model; and performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.

In a second aspect, the invention also provides a remote sensing image semantic segmentation system based on attention regulation, which is used for executing the remote sensing image semantic segmentation method based on attention regulation in the first aspect, wherein the system comprises: a first obtaining unit: the first obtaining unit is used for obtaining a first remote sensing image set, wherein the first remote sensing image set comprises a plurality of remote sensing images; a first setting unit: the first setting unit is used for randomly dividing the first remote sensing image set into two parts which are respectively recorded as a first training set and a first testing set; a second obtaining unit: the second obtaining unit is used for sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship; a first constructing unit: the first constructing unit is used for constructing a first initial model based on the FCMANet coding and decoding semantic segmentation structure; a second constructing unit: the second constructing unit is used for constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module; a third obtaining unit: the third obtaining unit is used for training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model; a first executing unit: the first executing unit is used for carrying out semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.

In a third aspect, the present invention further provides an attention regulation based remote sensing image semantic segmentation system, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the steps of the method according to the first aspect when executing the program.

In a fourth aspect, the present invention further provides an electronic device comprising a processor and a memory;

the memory is used for storing a computer program;

the processor is configured to execute the method according to the first aspect by calling the computer program.

In a fifth aspect, a computer program product comprises a computer program and/or instructions which, when executed by a processor, performs the steps of the method of any of the first aspect described above.

One or more technical schemes provided by the invention at least have the following technical effects or advantages:

1. A first remote sensing image set is acquired and randomly divided into a first training set and a first testing set, and the contours and types in all remote sensing images are labeled with labelme to obtain a first label set; a first initial model is then constructed based on the FCMANet encoding-decoding semantic segmentation structure and trained with the first training set to obtain an encoding-decoding semantic segmentation model; further, an attention regulation module is constructed from the category attention module and the multi-scale attention module and is used to perform attention regulation on the encoding-decoding semantic segmentation model; finally, semantic segmentation is performed on the remote sensing image with the encoding-decoding semantic segmentation model. This achieves the technical effects of effectively extracting target areas with uneven scales, similar features and scattered distribution in a complex remote sensing scene and further improving the semantic segmentation accuracy of the remote sensing image.

2. The segmentation precision of the model is evaluated against the actual condition of the remote sensing image, the segmentation efficiency is evaluated from the model's average semantic segmentation time, and a comprehensive evaluation result of the model is then calculated with the entropy weight method. This achieves the technical effects of evaluating the model objectively on the basis of intuitive data and improving the validity and reliability of the evaluation result.

3. By obtaining the first label set, the technical effects of visually observing the land type, land area and shape condition corresponding to each region in the remote sensing image and providing a reference basis for the accuracy of a subsequent detection model are achieved.

4. Through ablation experiments, the category attention module and the multi-scale attention module in the attention regulation and control model are evaluated respectively, and the technical effect of objectively evaluating the practicability and effectiveness of the two modules based on data basis is achieved.

The foregoing description is only an overview of the technical solutions of the present invention, and the embodiments of the present invention are described below in order to make the technical means of the present invention more clearly understood and to make the above and other objects, features, and advantages of the present invention more clearly understandable.

Drawings

In order to more clearly illustrate the present invention or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only exemplary, and for those skilled in the art, other drawings can be obtained according to the provided drawings without inventive effort.

FIG. 1 is a schematic flow diagram of a remote sensing image semantic segmentation method based on attention regulation and control according to the invention;

FIG. 2 is a schematic flow chart of comprehensive evaluation of the encoding-decoding semantic segmentation model in the attention-based remote sensing image semantic segmentation method of the present invention;

FIG. 3 is a schematic flow chart of attention regulation and control of the encoding-decoding semantic segmentation model in the attention regulation and control-based remote sensing image semantic segmentation method of the present invention;

FIG. 4 is a schematic flow diagram illustrating the evaluation of the category attention module and the multi-scale attention module in the attention regulation-based remote sensing image semantic segmentation method of the present invention;

FIG. 5 is a schematic structural diagram of a remote sensing image semantic segmentation system based on attention regulation and control according to the present invention;

FIG. 6 is a schematic structural diagram of an exemplary electronic device of the present invention.

Description of reference numerals:

a first obtaining unit 11, a first setting unit 12, a second obtaining unit 13, a first constructing unit 14, a second constructing unit 15, a third obtaining unit 16, a first executing unit 17, a bus 300, a receiver 301, a processor 302, a transmitter 303, a memory 304, and a bus interface 305.

Detailed Description

The invention provides a remote sensing image semantic segmentation method and system based on attention regulation, and solves the technical problem that the accuracy of remote sensing image semantic segmentation is low due to the fact that target regions with uneven scales, similar features and scattered distribution under complex remote sensing scenes cannot be effectively extracted in the prior art. The method achieves the technical effects of effectively extracting target areas with uneven scales, similar features and scattered distribution in a complex remote sensing scene and further improving the semantic segmentation accuracy of the remote sensing image.

In the technical scheme of the invention, the data acquisition, storage, use, processing and the like all conform to relevant regulations of national laws and regulations.

In the following, the technical solutions in the present invention will be clearly and completely described with reference to the accompanying drawings, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments of the present invention, and it should be understood that the present invention is not limited by the example embodiments described herein. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention. It should be further noted that, for the convenience of description, only some but not all of the elements associated with the present invention are shown in the drawings.

The invention provides a remote sensing image semantic segmentation method based on attention regulation, which is applied to a remote sensing image semantic segmentation system based on attention regulation, wherein the method comprises the following steps: obtaining a first set of remotely sensed images, wherein the first set of remotely sensed images comprises a plurality of remotely sensed images; randomly dividing the first remote sensing image set into two parts, and respectively recording the two parts as a first training set and a first testing set; sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship; constructing a first initial model based on the FCMANet coding and decoding semantic segmentation structure; constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module; training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model; and performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.

Having described the general principles of the invention, reference will now be made in detail to various non-limiting embodiments of the invention, examples of which are illustrated in the accompanying drawings.

Example one

Referring to the attached figure 1, the invention provides a remote sensing image semantic segmentation method based on attention regulation, wherein the method is applied to a remote sensing image semantic segmentation system based on attention regulation, and the method specifically comprises the following steps:

step S100: obtaining a first set of remotely sensed images, wherein the first set of remotely sensed images comprises a plurality of remotely sensed images;

Specifically, the attention regulation based remote sensing image semantic segmentation method is applied to the attention regulation based remote sensing image semantic segmentation system. For target regions with similar features in a remote sensing scene, a category attention module can be provided to perform attention regulation on the encoding-decoding semantic segmentation model; it extracts category attention weights to guide network pixel classification and spatial information filtering, thereby reducing pixel classification errors. For target regions that are scattered and uneven in scale, a multi-scale attention module is provided to perform attention regulation on the encoding-decoding semantic segmentation model; it extracts multi-scale feature information and fuses associated context information, thereby enhancing the robustness of the network to changes in target scale. Remote sensing images of the plateau lake-surrounding region are collected from Google Earth, a high-resolution satellite resource platform or the like, thereby obtaining the first remote sensing image set. The first remote sensing image set comprises a plurality of remote sensing images, such as remote sensing images of lakes, rivers and buildings. Obtaining the first remote sensing image set provides training data for the subsequent training of an intelligent remote sensing image semantic segmentation model and thus lays the basis for that training.

Step S200: randomly dividing the first remote sensing image set into two parts, and respectively recording the two parts as a first training set and a first testing set;

Specifically, the remote sensing images in the first remote sensing image set are randomly divided into two subsets; one subset is taken as the first training set and the other as the first testing set. The first training set is used to train a model that intelligently performs semantic segmentation on remote sensing images, and the first testing set is used for segmentation precision calculation, segmentation time testing and the like on the model trained with the first training set. This division prepares the remote sensing images needed for the subsequent training and testing of the model.
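
As an illustration only, the random division of step S200 could be sketched as follows. The description does not state a split ratio, so the 80/20 fraction, the fixed seed, the function name `split_dataset` and the file names are all assumptions made for this sketch.

```python
import random


def split_dataset(image_paths, train_fraction=0.8, seed=0):
    """Randomly split image identifiers into a training and a testing subset.

    The 80/20 fraction and the seed are illustrative assumptions; the
    description only requires a random division into two parts.
    """
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must be strictly between 0 and 1")
    shuffled = list(image_paths)           # copy so the input is not modified
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for repeatability
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical file names standing in for the first remote sensing image set.
images = [f"image_{i:03d}.png" for i in range(10)]
train_set, test_set = split_dataset(images)
```

Because every image lands in exactly one subset, the label sets produced in step S300 can be matched one-to-one with the first training set and the first testing set.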

Step S300: sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship;

Specifically, each remote sensing image in the first remote sensing image set is labeled in sequence with labelme, a data annotation tool; the labeling comprises target contour labeling and target type labeling. That is, the remote sensing images in the first training set and the first testing set are each labeled with contours and types, yielding a first training label set corresponding to the first training set and a first testing label set corresponding to the first testing set. The first training label set and the first testing label set together constitute the first label set. Labeling the contour and type of each remote sensing image in advance provides the labeled data needed for subsequent model training and model precision testing.

Step S400: building a first initial model based on the FCMANet coding and decoding semantic segmentation structure;

Specifically, FCMANet is used as the encoding-decoding semantic segmentation structure, with ResNet50 as the backbone network. ResNet, i.e. the Residual Network, is commonly used in fields such as object classification and demonstrated that networks can be made deeper, with more hidden layers. An initial model for intelligent semantic segmentation of remote sensing images, namely the first initial model, is constructed based on the FCMANet encoding-decoding semantic segmentation structure. Constructing the first initial model provides the base network model that is subsequently trained on the remote sensing images in the first training set.

Step S500: constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module;

Specifically, for target regions with similar features in a remote sensing scene, a category attention module is provided to perform attention regulation on the first initial model; it extracts category attention weights to guide network pixel classification and spatial information filtering, thereby reducing pixel classification errors. For target regions that are scattered and uneven in scale, a multi-scale attention module is provided to perform attention regulation on the first initial model; it extracts multi-scale feature information and fuses associated context information, thereby enhancing the robustness of the network to changes in target scale. Finally, the attention regulation model is constructed from the category attention module and the multi-scale attention module. Constructing the attention regulation model provides the regulation basis for the subsequent attention regulation based intelligent semantic segmentation of remote sensing images.

Step S600: training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model;

Specifically, the first training set and the first training label set are imported into the first initial model for training to obtain a trained intelligent network model; the attention regulation model is then added to the trained intelligent network model to form the encoding-decoding semantic segmentation model. The attention regulation model comprises a category attention module, which performs attention regulation for target regions with similar features in the remote sensing image, and a multi-scale attention module, which performs attention regulation for target regions that are scattered and uneven in scale. Training the model on the training data and then adding the regulation modules in a targeted manner yields the encoding-decoding semantic segmentation model, achieving the technical effect of constructing a semantic segmentation model adapted to the technical problems of the prior art.

Step S700: performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.

Specifically, the remote sensing image to be semantically segmented is input into the encoding-decoding semantic segmentation model; through the model's analysis and judgment, the target regions in the image are extracted and labeled accordingly, thereby realizing intelligent semantic segmentation of the remote sensing image. In summary, a first remote sensing image set is acquired and randomly divided into a first training set and a first testing set, and the contours and types in all remote sensing images are labeled with labelme to obtain a first label set; a first initial model is then constructed based on the FCMANet encoding-decoding semantic segmentation structure and trained with the first training set to obtain an encoding-decoding semantic segmentation model; further, an attention regulation module is constructed from the category attention module and the multi-scale attention module and is used to perform attention regulation on the encoding-decoding semantic segmentation model; finally, semantic segmentation is performed on the remote sensing image with the encoding-decoding semantic segmentation model. This achieves the technical effects of effectively extracting target areas with uneven scales, similar features and scattered distribution in a complex remote sensing scene and further improving the semantic segmentation accuracy of the remote sensing image.

Further, as shown in fig. 2, the present invention further includes step S800:

step S810: performing semantic segmentation on each remote sensing image in the first test set based on the encoding-decoding semantic segmentation model to obtain a semantic segmentation result of the first test set;

step S820: comparing the semantic segmentation result of the first test set with the first test label set to obtain a segmentation deviation of the first test set;

step S830: if the segmentation deviation of the first test set meets a preset deviation threshold value, calculating the time of semantic segmentation of each remote sensing image in the first test set by the encoding-decoding semantic segmentation model, and recording the time as the semantic segmentation time of the first test set;

step S840: comprehensively evaluating the coding-decoding semantic segmentation model based on the first test set segmentation deviation and the first test set semantic segmentation time.

Specifically, after the encoding-decoding semantic segmentation model is obtained, the remote sensing images in the first testing set, obtained by dividing the first remote sensing image set, are input into the model in sequence, and the semantic segmentation result that the model produces for each image is recorded as the semantic segmentation result of the first test set. The semantic segmentation result of the first test set is then compared with the first testing label set, and the outcome is recorded as the segmentation deviation of the first test set. The first testing label set is obtained in advance by labeling the contour and type of each remote sensing image in the first testing set one by one with the labelme tool, whereas the semantic segmentation result of the first test set is produced by the recognition and judgment of the model constructed by the system. The comparison makes clear how far the model's semantic segmentation deviates from the actual situation, and yields accuracy data for the model's semantic segmentation with the actual situation as reference.
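
The description does not name the measure used for the segmentation deviation. As a hedged sketch, two common choices are shown below: the pixel error rate and the mean intersection over union (mIoU) between a predicted label map and its reference label map. The function names, the class indices and the tiny 2x3 maps are assumptions introduced for illustration.

```python
def pixel_error_rate(pred, label):
    """Fraction of pixels whose predicted class differs from the reference class."""
    total = 0
    wrong = 0
    for pred_row, label_row in zip(pred, label):
        for p, l in zip(pred_row, label_row):
            total += 1
            wrong += int(p != l)
    return wrong / total


def mean_iou(pred, label, num_classes):
    """Mean intersection over union, averaged over classes that actually occur."""
    ious = []
    for c in range(num_classes):
        inter = 0
        union = 0
        for pred_row, label_row in zip(pred, label):
            for p, l in zip(pred_row, label_row):
                inter += int(p == c and l == c)
                union += int(p == c or l == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


# Hypothetical 2x3 label maps with three classes (0, 1, 2).
reference = [[0, 0, 1], [1, 1, 2]]
predicted = [[0, 1, 1], [1, 1, 2]]

deviation = pixel_error_rate(predicted, reference)
miou = mean_iou(predicted, reference, num_classes=3)
```

Either quantity could then be compared with the preset deviation threshold of step S830; which measure and which threshold to use remain open choices not fixed by the description.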

Further, when the first test set segmentation deviation meets the preset deviation threshold, this indicates that the segmentation precision of the encoding-decoding semantic segmentation model meets the requirement, that is, the model is shown to be applicable. The time the encoding-decoding semantic segmentation model takes to semantically segment each remote sensing image in the first testing set is then recorded, and the average of these times is recorded as the semantic segmentation time of the first test set, which directly reflects the semantic segmentation efficiency of the model. Finally, the segmentation precision and segmentation efficiency of the encoding-decoding semantic segmentation model are evaluated based on the first test set segmentation deviation and the first test set semantic segmentation time, and a comprehensive model evaluation result is calculated with the entropy weight method. This achieves the technical effects of evaluating the model objectively on the basis of intuitive data and improving the validity and reliability of the evaluation result.
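
The description does not give the details of the entropy weight calculation. The sketch below follows the textbook form of the entropy weight method under stated assumptions: several candidate models are compared (the method needs more than one alternative), both indicators are cost-type (smaller is better) and are min-max normalized, and the figures for deviation and time are invented purely for illustration.

```python
import math


def normalize_cost(values):
    """Min-max normalize a cost-type indicator so smaller raw values score higher."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(hi - v) / (hi - lo) for v in values]


def entropy_weights(columns):
    """Return one weight per indicator column using the entropy weight method."""
    m = len(columns[0])
    if m < 2:
        raise ValueError("at least two alternatives are required")
    divergences = []
    for column in columns:
        total = sum(column)
        entropy = 0.0
        for value in column:
            if value > 0:  # the term 0 * ln(0) is taken as 0
                p = value / total
                entropy -= p * math.log(p)
        entropy /= math.log(m)             # scale entropy into [0, 1]
        divergences.append(1.0 - entropy)  # more spread gives more weight
    s = sum(divergences)
    if s == 0:
        return [1.0 / len(columns)] * len(columns)
    return [d / s for d in divergences]


# Hypothetical figures for three candidate models (not taken from the description).
deviation = [0.10, 0.20, 0.15]  # first test set segmentation deviation
seconds = [0.05, 0.03, 0.08]    # first test set semantic segmentation time

columns = [normalize_cost(deviation), normalize_cost(seconds)]
weights = entropy_weights(columns)
scores = [sum(w * col[i] for w, col in zip(weights, columns)) for i in range(3)]
best = max(range(3), key=scores.__getitem__)
```

The weights are derived from the data rather than chosen by hand: an indicator whose values differ more between the candidates receives a larger weight in the comprehensive score.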

Further, step S300 of the present invention further includes:

step S310: sequentially classifying the remote sensing images in the first remote sensing image set according to land categories, wherein the classification results of the remote sensing images form a first land category set, and the first land category set comprises lakes, rivers, buildings, vegetation, agriculture and the like;

step S320: sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set by using labelme, and generating a first label file after labeling;

step S330: converting the first label file into a color label image, and recording the color label image as the first label set.

Specifically, when the contours and types in the remote sensing images of the first remote sensing image set are labeled with labelme, the target regions are labeled differently according to their land use type. That is, land types such as lakes, rivers, buildings, vegetation and agriculture are labeled in different colors, and a semantic segmentation label file in JSON format, namely the first label file, is generated. The first label file is converted into a color mask label image, a category index is established for each remote sensing image in the first remote sensing image set according to the data import mode of the first initial model, and the color mask image is converted into a digital label image. The digital label images constitute the first label set.

By obtaining the first label set, the technical effects of visually observing the land type, land area and shape condition corresponding to each region in the remote sensing image and providing a reference basis for the accuracy of a subsequent detection model are achieved.
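
The conversion from a color mask label image to a digital label image can be sketched as a lookup from RGB color to category index. The palette below (colors and index order) is hypothetical; the description only states that different land types are labeled in different colors and that a category index is established.

```python
# Hypothetical palette: colors and index order are illustrative assumptions.
PALETTE = {
    (0, 0, 0): 0,      # background
    (0, 0, 255): 1,    # lake
    (0, 255, 255): 2,  # river
    (255, 0, 0): 3,    # building
    (0, 255, 0): 4,    # vegetation
    (255, 255, 0): 5,  # agriculture
}


def color_mask_to_index(mask, palette=PALETTE):
    """Map each RGB pixel of a color mask to its integer category index."""
    index_rows = []
    for row in mask:
        index_row = []
        for pixel in row:
            key = tuple(pixel)
            if key not in palette:
                raise ValueError(f"unknown label color {key}")
            index_row.append(palette[key])
        index_rows.append(index_row)
    return index_rows


# A tiny 2x3 color mask standing in for a converted labelme annotation.
color_mask = [
    [(0, 0, 255), (0, 0, 255), (0, 255, 0)],
    [(255, 0, 0), (0, 0, 0), (255, 255, 0)],
]
index_mask = color_mask_to_index(color_mask)
```

Raising an error on an unknown color, rather than silently mapping it to background, makes labeling mistakes visible before training starts.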

Further, step S400 of the present invention further includes:

step S410: constructing a first coding structure by utilizing a channel compression residual error module;

step S420: generating dense feature information based on the first coding structure;

step S430: utilizing bilinear interpolation up-sampling to construct a first decoding structure;

step S440: decoding the dense feature information based on the first decoding structure;

step S450: constructing the first initial model based on the first encoding structure and the first decoding structure.

Specifically, a channel compression residual module with a bottleneck structure is adopted in the encoding part of the network. The number of feature channels is reduced to one fourth of the original number to generate dense feature information, which reduces the number of parameters and filters redundant information in the channels. That is, the first coding structure is constructed using the channel compression residual module. Further, the feature layer is decoded using bilinear interpolation upsampling, that is, the first decoding structure is constructed using bilinear interpolation upsampling. Upsampling restores the extracted dense feature information to the original resolution. Finally, the first initial model is formed from the first encoding structure and the first decoding structure. Constructing the first initial model achieves the technical effect of preparing the network model for subsequent model training.
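As a non-limiting illustration, the parameter saving obtained by compressing the channels to one fourth can be worked out as follows. The sketch assumes the common 1 x 1, 3 x 3, 1 x 1 bottleneck layout and compares it with a residual block made of two full-width 3 x 3 convolutions; this description does not state the exact layer layout, so both layouts are assumptions, and biases and normalization layers are ignored.

```python
def conv_params(c_in, c_out, kernel_size):
    """Number of weights in a kernel_size x kernel_size convolution (no bias)."""
    return c_in * c_out * kernel_size * kernel_size

def bottleneck_params(channels, reduction=4):
    """Weights in a 1x1 (compress) -> 3x3 -> 1x1 (restore) bottleneck."""
    mid = channels // reduction
    return (conv_params(channels, mid, 1)
            + conv_params(mid, mid, 3)
            + conv_params(mid, channels, 1))

def plain_residual_params(channels):
    """Weights in a residual block with two full-width 3x3 convolutions."""
    return 2 * conv_params(channels, channels, 3)

# Example with C = 256 feature channels.
compressed = bottleneck_params(256)      # 16384 + 36864 + 16384
uncompressed = plain_residual_params(256)
ratio = uncompressed / compressed
```

For 256 channels the bottleneck holds 69632 weights against 1179648 for the full-width block, roughly a seventeen-fold reduction under these assumptions.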

Further, step S600 of the present invention further includes:

step S610: training the first initial model based on the first training set and the first training label set;

step S620: obtaining a first real-time training progress of the first initial model by using a cross entropy loss function;

step S630: and if the first real-time training progress accords with a preset threshold value, obtaining the coding-decoding semantic segmentation model.

Specifically, the constructed first initial model is trained using the first training set and the first training label set. A cross entropy loss function is used to monitor the first real-time training progress of the first initial model in real time, and the coding-decoding semantic segmentation model is obtained when the first real-time training progress meets the preset threshold. The preset threshold is a range of training conditions set by the system on the basis of actual requirements, model training conditions and the like. In addition, training the first initial model involves the necessary hyper-parameters, such as the learning rate, the input image size, the batch size, the momentum, the number of iteration steps, the number of data loading threads, the optimizer and the loss function, and the training of the first initial model is completed using the values set for these hyper-parameters. Training the first initial model with the first training set, the first training label set and the necessary hyper-parameters yields the coding-decoding semantic segmentation model, which achieves the technical effect of providing a model foundation for the subsequent intelligent semantic segmentation of remote sensing images.
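As a non-limiting illustration, monitoring the training progress with a cross entropy loss and checking it against a preset threshold can be sketched as follows. The stopping rule (the last few losses all at or below the threshold), the `patience` parameter and the ignore value 255 are assumptions made for the example; this description only states that the progress is compared with a preset threshold.

```python
import math

def pixel_cross_entropy(logits, labels, ignore_index=255):
    """Mean cross entropy over all labeled pixels.

    logits: H x W x K nested lists of raw class scores.
    labels: H x W nested lists of integer class indices.
    """
    total, count = 0.0, 0
    for logit_row, label_row in zip(logits, labels):
        for scores, target in zip(logit_row, label_row):
            if target == ignore_index:
                continue
            peak = max(scores)  # subtract the maximum for numerical stability
            log_sum = peak + math.log(sum(math.exp(s - peak) for s in scores))
            total += log_sum - scores[target]
            count += 1
    return total / count if count else 0.0

def training_finished(loss_history, threshold, patience=3):
    """True once the most recent `patience` losses are all at or below `threshold`."""
    recent = loss_history[-patience:]
    return len(recent) == patience and all(loss <= threshold for loss in recent)

# One pixel, two classes with equal scores: the loss equals ln(2).
example_loss = pixel_cross_entropy([[[0.0, 0.0]]], [[0]])
```

Requiring several consecutive losses below the threshold, rather than a single one, guards against stopping on a single lucky batch.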

Further, as shown in fig. 3, the present invention further includes step S640:

step S641: obtaining a first backbone network Bottleneck structure based on the first coding structure, wherein the first backbone network comprises a Bottleneck1 structure, a Bottleneck2 structure, a Bottleneck3 structure and a Bottleneck4 structure;

step S642: adding the category attention module in sequence after the Bottleneck1 structure, the Bottleneck2 structure and the Bottleneck3 structure;

step S643: adding a multi-scale attention module after the Bottleneck4 structure;

step S644: performing attention regulation on the encoding-decoding semantic segmentation model based on the category attention module and the multi-scale attention module.

Specifically, the first backbone network Bottleneck structure is determined based on the first coding structure, wherein the first backbone network comprises a Bottleneck1 structure, a Bottleneck2 structure, a Bottleneck3 structure and a Bottleneck4 structure. The class attention module is added in sequence after the Bottleneck1 structure, the Bottleneck2 structure and the Bottleneck3 structure, so that class attention weights can be extracted to guide network pixel classification and spatial information filtering, which ultimately reduces pixel classification errors. The class attention module sets a feature information filter, performs class filtering on the feature information and then adds spatial attention, so that it outputs attention weights containing classification information to guide network pixel classification. First, the Bottleneck output feature $F \in \mathbb{R}^{C \times H \times W}$ is extracted, and the channels are compressed from C to K using a 3 × 3 convolution and a 1 × 1 convolution, where K denotes the number of semantic segmentation classes. This yields the feature $F \in \mathbb{R}^{K \times H \times W}$, in which each channel contains dense spatial information representing the coarse segmentation feature of one class. Then $F \in \mathbb{R}^{K \times H \times W}$ is reshaped to $F \in \mathbb{R}^{K \times HW}$, which maps the spatial feature information onto a one-dimensional vector. Finally, the classification information is extracted and filtered along the HW dimension by a Softmax function.
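As a non-limiting illustration, the reshaping of the K x H x W feature to K x HW followed by a Softmax along the HW dimension can be sketched as follows. The sketch starts from a feature that has already been compressed to K channels, so the 3 x 3 and 1 x 1 convolutions are not shown; the function names are hypothetical.

```python
import math

def softmax(values):
    """Numerically stable Softmax over a flat list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]

def class_spatial_weights(features):
    """Softmax over the H*W positions of each class channel.

    features: K x H x W nested lists, one coarse score map per class.
    Returns K x H x W weights; each class map sums to 1 over its positions.
    """
    weights = []
    for class_map in features:
        height, width = len(class_map), len(class_map[0])
        flat = [v for row in class_map for v in row]      # K x H x W -> K x HW
        attended = softmax(flat)                          # Softmax along HW
        weights.append([attended[r * width:(r + 1) * width] for r in range(height)])
    return weights

# Two classes on a 2 x 2 grid: a flat map and a map with one strong response.
features = [
    [[0.0, 0.0], [0.0, 0.0]],
    [[1.0, 0.0], [0.0, 0.0]],
]
attention = class_spatial_weights(features)
```

A flat score map yields uniform weights, while a map with a strong response concentrates its weight at that position, which is how the Softmax filters spatial information per class.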

A multi-scale attention module is added after the Bottleneck4 structure, and the output features of the multi-scale attention module are channel-spliced with the output features of Bottleneck3, so as to fuse deep and shallow feature information and achieve the technical effect of compensating for the loss of detail information in the deep features of the network. The multi-scale attention module performs multi-scale information extraction on the output feature T of Bottleneck4: it obtains features at four different scales, applies TCTA to each scale feature to extract attention weights at the different scales, and finally stacks all TCTA output features by channel, so that the network context information is adaptively regulated over a plurality of scale regions. The multi-scale attention module implements the following formula:

$$T_{out} = \mathrm{Cat}(T_1, T_2, T_3, T_4)$$

where $k$ represents the convolution kernel size, $k \in \{3, 5, 7, 9\}$; $i$ represents the index of the feature layer at each scale, $i \in \{1, 2, 3, 4\}$; $T_i$ represents the TCTA output feature at a single scale; $T \in \mathbb{R}^{C \times H \times W}$ represents the input feature; and $T_{out} \in \mathbb{R}^{C \times H \times W}$ represents the output feature obtained by channel splicing all scale features $T_i$.
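As a non-limiting illustration, the shape bookkeeping behind the channel splicing Cat can be sketched as follows. The sketch assumes stride 1 convolutions with padding (k - 1) / 2, so that every kernel size keeps H and W unchanged, and assumes that each of the four branches outputs C / 4 channels so that the spliced result has C channels; neither assumption is stated in this description, and the TCTA operation itself is not reproduced.

```python
def same_padding(kernel_size):
    """Padding that keeps the spatial size unchanged for an odd kernel at stride 1."""
    return (kernel_size - 1) // 2

def conv_output_size(size, kernel_size, padding, stride=1):
    """Spatial output size of a convolution along one dimension."""
    return (size + 2 * padding - kernel_size) // stride + 1

def cat_channels(features):
    """Concatenate a list of C_i x H x W nested lists along the channel axis."""
    height, width = len(features[0][0]), len(features[0][0][0])
    out = []
    for feature in features:
        for channel in feature:
            if len(channel) != height or len(channel[0]) != width:
                raise ValueError("all scale features must share H and W")
            out.append(channel)
    return out

def shape(feature):
    """Return (C, H, W) of a nested-list feature."""
    return (len(feature), len(feature[0]), len(feature[0][0]))

def zeros(channels, height, width):
    """Placeholder feature standing in for one branch output T_i."""
    return [[[0.0] * width for _ in range(height)] for _ in range(channels)]

# Four branches (k = 3, 5, 7, 9), each with C / 4 = 2 channels on a 3 x 3 grid.
branches = [zeros(2, 3, 3) for _ in (3, 5, 7, 9)]
t_out = cat_channels(branches)
```

Under these assumptions the spliced feature has shape (8, 3, 3), matching the statement that the output keeps the C x H x W size of the input.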

By means of the class attention module and the multi-scale attention module, the technical effect of attention regulation and control on the encoding-decoding semantic segmentation model is achieved.

Further, as shown in fig. 4, the present invention further includes step S850:

step S851: removing the multi-scale attention module in the coding-decoding semantic segmentation model to obtain a first experiment model;

step S852: performing semantic segmentation on each remote sensing image in the first test set based on the first experiment model to obtain a first experiment result;

step S853: removing the class attention module in the coding-decoding semantic segmentation model to obtain a second experiment model;

step S854: performing semantic segmentation on each remote sensing image in the first test set based on the second experiment model to obtain a second experiment result;

step S855: comparing the first experiment result, the second experiment result and the first test set semantic segmentation result in sequence to obtain a first experiment comparison result and a second experiment comparison result respectively;

step S856: and evaluating the category attention module according to the first experiment comparison result, and evaluating the multi-scale attention module according to the second experiment comparison result.

Specifically, after the semantic segmentation precision and the semantic segmentation efficiency of the coding-decoding semantic segmentation model have been calculated and the model has been comprehensively evaluated, an ablation experiment is further designed. The category attention module and the multi-scale attention module in the coding-decoding semantic segmentation model are tested separately, so as to determine the effectiveness and the practicability of the two attention regulation modules.

First, the multi-scale attention module is removed from the coding-decoding semantic segmentation model to obtain a model to which only the category attention module has been added, and this model is recorded as the first experimental model. Each remote sensing image in the first test set is then input in sequence into the first experimental model for semantic segmentation to obtain the corresponding first experiment result. The category attention module is then evaluated on the same principle used to obtain the first test set segmentation deviation and to evaluate the semantic segmentation precision of the encoding-decoding semantic segmentation model. The multi-scale attention module is evaluated in the same way, using the second experimental model.
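As a non-limiting illustration, the comparison of the ablated models with the full model can be sketched as follows. The sketch assumes mean intersection-over-union (mIoU) against the test labels as the accuracy measure and reports the drop relative to the full model; this description does not fix the metric, and the function names and the tiny 2 x 2 example are hypothetical.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union over the classes present in pred or target."""
    inter = [0] * num_classes
    union = [0] * num_classes
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            if p == t:
                inter[p] += 1
                union[p] += 1
            else:
                union[p] += 1   # pixel counts toward the predicted class region
                union[t] += 1   # and toward the labeled class region
    ious = [i / u for i, u in zip(inter, union) if u > 0]
    return sum(ious) / len(ious) if ious else 0.0

def ablation_report(full_pred, first_pred, second_pred, labels, num_classes):
    """Compare the full model with the first and second experimental models."""
    full = mean_iou(full_pred, labels, num_classes)
    first = mean_iou(first_pred, labels, num_classes)
    second = mean_iou(second_pred, labels, num_classes)
    return {
        "full": full,
        "first_experiment": first,      # multi-scale attention module removed
        "second_experiment": second,    # category attention module removed
        "drop_first": full - first,
        "drop_second": full - second,
    }

labels = [[0, 1], [1, 1]]
report = ablation_report(
    full_pred=[[0, 1], [1, 1]],
    first_pred=[[0, 1], [1, 0]],
    second_pred=[[1, 1], [1, 1]],
    labels=labels,
    num_classes=2,
)
```

In practice the metric would be accumulated over every image in the first test set rather than over a single label grid.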

Through ablation experiments, the category attention module and the multi-scale attention module in the attention regulation and control model are evaluated respectively, and the technical effect of objectively evaluating the practicability and effectiveness of the two modules based on data basis is achieved.

In summary, the attention regulation-based remote sensing image semantic segmentation method provided by the invention has the following technical effects:

Example two

Based on the same inventive concept as the method for semantic segmentation of remote sensing images based on attention regulation in the foregoing embodiment, the present invention further provides a system for semantic segmentation of remote sensing images based on attention regulation, referring to fig. 5, where the system includes:

a first obtaining unit 11, where the first obtaining unit 11 is configured to obtain a first set of remote sensing images, where the first set of remote sensing images includes a plurality of remote sensing images;

the first setting unit 12 is configured to randomly divide the first remote sensing image set into two parts, which are respectively recorded as a first training set and a first testing set;

a second obtaining unit 13, where the second obtaining unit 13 is configured to perform target contour labeling on each remote sensing image in the first remote sensing image set in sequence to obtain a first label set, where the first label set includes a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship;

a first construction unit 14, said first construction unit 14 is used for constructing a first initial model based on the FCMANet encoding and decoding semantic segmentation structure;

a second construction unit 15, configured to construct an attention regulation model, wherein the attention regulation model includes a category attention module and a multi-scale attention module;

a third obtaining unit 16, where the third obtaining unit 16 is configured to train the first initial model based on the first training set, the first training label set, and the attention regulation and control model, and obtain a coding-decoding semantic segmentation model;

a first executing unit 17, where the first executing unit 17 is configured to perform semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.
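As a non-limiting illustration, the random division performed by the first setting unit, with the one-to-one correspondence between images and labels preserved, can be sketched as follows. The 80/20 ratio, the fixed seed and the file names are assumptions made for the example and are not specified in this description.

```python
import random

def split_dataset(image_paths, label_paths, train_fraction=0.8, seed=0):
    """Randomly split (image, label) pairs into a training part and a testing part.

    Shuffling the pairs, rather than the two lists separately, keeps each
    image attached to its own label.
    """
    if len(image_paths) != len(label_paths):
        raise ValueError("every image needs exactly one label")
    pairs = list(zip(image_paths, label_paths))
    random.Random(seed).shuffle(pairs)
    cut = int(len(pairs) * train_fraction)
    return pairs[:cut], pairs[cut:]

# Hypothetical file names.
images = [f"img_{i}.png" for i in range(10)]
labels = [f"img_{i}.json" for i in range(10)]
train_pairs, test_pairs = split_dataset(images, labels)
```

Using a fixed seed makes the split reproducible, so that later experiments are evaluated on the same first test set.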

Further, the system further comprises:

a fourth obtaining unit, configured to perform semantic segmentation on each remote sensing image in the first test set based on the encoding-decoding semantic segmentation model to obtain a first test set semantic segmentation result;

a fifth obtaining unit, configured to compare the semantic segmentation result of the first test set with the first test tag set, and obtain a first test set segmentation deviation;

the second setting unit is used for calculating, if the first test set segmentation deviation meets a preset deviation threshold value, the time of semantic segmentation of each remote sensing image in the first test set by the coding-decoding semantic segmentation model, and recording the time as the semantic segmentation time of the first test set;

a second execution unit to perform a comprehensive evaluation of the encoding-decoding semantic segmentation model based on the first test set segmentation bias and the first test set semantic segmentation time.
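As a non-limiting illustration, recording the average semantic segmentation time only when the segmentation deviation meets the preset deviation threshold can be sketched as follows. The sketch assumes that "meets" means the deviation does not exceed the threshold and that `segment` is any callable standing in for the trained model; both are assumptions made for the example.

```python
import time

def mean_segmentation_time(segment, images):
    """Average wall-clock seconds that `segment` spends per image."""
    if not images:
        raise ValueError("the test set is empty")
    elapsed = []
    for image in images:
        start = time.perf_counter()
        segment(image)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)

def time_if_accurate(segment, images, deviation, deviation_threshold):
    """Return the mean segmentation time, or None if the model is not accurate enough."""
    if deviation > deviation_threshold:
        return None
    return mean_segmentation_time(segment, images)

def dummy_model(image):
    """Stand-in for the encoding-decoding semantic segmentation model."""
    return image

skipped = time_if_accurate(dummy_model, [1, 2, 3], deviation=0.5, deviation_threshold=0.1)
measured = time_if_accurate(dummy_model, [1, 2, 3], deviation=0.05, deviation_threshold=0.1)
```

Gating the timing on accuracy avoids reporting an efficiency figure for a model that would be rejected anyway.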

Further, the system further comprises:

the first composition unit is used for sequentially classifying the remote sensing images in the first remote sensing image set according to land categories, and classification results of the remote sensing images form a first land category set, wherein the first land category set comprises lakes, rivers, buildings, vegetation, agriculture and the like;

the first generating unit is used for sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set by using labelme and generating a first label file after labeling;

and the second setting unit is used for converting the first label file into a color label image and recording the color label image as the first label set.

Further, the system further comprises:

a third constructing unit, configured to construct the first coding structure by using the channel compression residual module;

a second generating unit configured to generate dense feature information based on the first encoding structure;

a fourth construction unit for constructing a first decoding structure using bilinear interpolation upsampling;

a third execution unit configured to decode the dense feature information based on the first decoding structure;

a fifth construction unit for constructing the first initial model based on the first coding structure and the first decoding structure.
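As a non-limiting illustration, bilinear interpolation upsampling of a single feature channel can be sketched as follows. The sketch assumes that the corner pixels of the input and output grids are aligned; this description does not state which alignment convention is used, and the function name is hypothetical.

```python
def bilinear_upsample(grid, out_h, out_w):
    """Resize an H x W grid of floats to out_h x out_w by bilinear interpolation."""
    in_h, in_w = len(grid), len(grid[0])

    def source(index, out_n, in_n):
        # Map an output index to a fractional input coordinate (corners aligned).
        return index * (in_n - 1) / (out_n - 1) if out_n > 1 else 0.0

    out = []
    for y in range(out_h):
        sy = source(y, out_h, in_h)
        y0 = int(sy)
        y1 = min(y0 + 1, in_h - 1)
        fy = sy - y0
        row = []
        for x in range(out_w):
            sx = source(x, out_w, in_w)
            x0 = int(sx)
            x1 = min(x0 + 1, in_w - 1)
            fx = sx - x0
            # Interpolate along x on the two neighboring rows, then along y.
            top = grid[y0][x0] * (1 - fx) + grid[y0][x1] * fx
            bottom = grid[y1][x0] * (1 - fx) + grid[y1][x1] * fx
            row.append(top * (1 - fy) + bottom * fy)
        out.append(row)
    return out

upsampled = bilinear_upsample([[0.0, 1.0], [2.0, 3.0]], 3, 3)
```

In the decoder the same operation would be applied to every channel of the dense feature information to restore the original resolution.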

Further, the system further comprises:

a fourth execution unit, configured to train the first initial model based on the first training set and the first training label set;

a sixth obtaining unit, configured to obtain a first real-time training progress of the first initial model by using a cross entropy loss function;

a seventh obtaining unit, configured to obtain the coding-decoding semantic segmentation model if the first real-time training progress meets a preset threshold.

Further, the system further comprises:

an eighth obtaining unit, configured to obtain a first backbone network Bottleneck structure based on the first coding structure, where the first backbone network includes a Bottleneck1 structure, a Bottleneck2 structure, a Bottleneck3 structure, and a Bottleneck4 structure;

a first adding unit, configured to add the category attention module sequentially after the Bottleneck1 structure, the Bottleneck2 structure, and the Bottleneck3 structure;

a second adding unit, configured to add a multi-scale attention module after the Bottleneck4 structure;

a fifth execution unit, configured to perform attention regulation on the coding-decoding semantic segmentation model based on the class attention module and the multi-scale attention module.

Further, the system further comprises:

a ninth obtaining unit, configured to remove the multi-scale attention module in the encoding-decoding semantic segmentation model, and obtain a first experimental model;

a tenth obtaining unit, configured to perform semantic segmentation on each remote sensing image in the first test set based on the first experiment model to obtain a first experiment result;

an eleventh obtaining unit, configured to remove the class attention module from the encoding-decoding semantic segmentation model, and obtain a second experimental model;

a twelfth obtaining unit, configured to perform semantic segmentation on each remote sensing image in the first test set based on the second experiment model to obtain a second experiment result;

a thirteenth obtaining unit, configured to compare the first experiment result and the second experiment result with the semantic segmentation result of the first test set in sequence, and obtain a first experiment comparison result and a second experiment comparison result respectively;

a sixth execution unit, configured to evaluate the category attention module according to the first experiment comparison result, and evaluate the multi-scale attention module according to the second experiment comparison result.

In this description, the embodiments are described in a progressive manner, and each embodiment focuses on its differences from the other embodiments. The attention regulation based remote sensing image semantic segmentation method of the first embodiment in fig. 1, together with its specific examples, also applies to the attention regulation based remote sensing image semantic segmentation system of this embodiment. Because the device disclosed in this embodiment corresponds to the method disclosed in the first embodiment, its description is brief, and the relevant details can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Exemplary electronic device

The electronic device of the present invention is described below with reference to fig. 6.

Fig. 6 illustrates a schematic structural diagram of an electronic device according to the present invention.

Based on the inventive concept of the attention regulation based remote sensing image semantic segmentation method in the embodiment, the invention also provides an attention regulation based remote sensing image semantic segmentation system, wherein a computer program is stored on the attention regulation based remote sensing image semantic segmentation system, and when the computer program is executed by a processor, the computer program realizes the steps of any one of the attention regulation based remote sensing image semantic segmentation methods.

In fig. 6, a bus architecture is represented by bus 300. Bus 300 may include any number of interconnected buses and bridges, and links together various circuits including one or more processors, represented by processor 302, and memory, represented by memory 304. The bus 300 may also link together various other circuits such as peripherals, voltage regulators and power management circuits, which are well known in the art and therefore are not described further herein. A bus interface 305 provides an interface between the bus 300 and the receiver 301 and transmitter 303. The receiver 301 and the transmitter 303 may be one and the same element, i.e. a transceiver, providing a means for communicating with various other apparatus over a transmission medium.

The processor 302 is responsible for managing the bus 300 and general processing, and the memory 304 may be used for storing data used by the processor 302 in performing operations.

The invention provides a remote sensing image semantic segmentation method based on attention regulation, which is applied to a remote sensing image semantic segmentation system based on attention regulation, wherein the method comprises the following steps: obtaining a first set of remotely sensed images, wherein the first set of remotely sensed images comprises a plurality of remotely sensed images; randomly dividing the first remote sensing image set into two parts, and respectively recording the two parts as a first training set and a first testing set; sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship; constructing a first initial model based on the FCMANet coding and decoding semantic segmentation structure; constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module; training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model; and performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model. The method solves the technical problem that the semantic segmentation accuracy of the remote sensing image is low due to the fact that target areas with uneven scales, similar features and scattered distribution under a complex remote sensing scene cannot be effectively extracted in the prior art. 
The method achieves the technical effects of effectively extracting target areas with uneven scales, similar features and scattered distribution in a complex remote sensing scene and further improving the semantic segmentation accuracy of the remote sensing image.

The invention also provides an electronic device, which comprises a processor and a memory;

the memory is used for storing;

the processor is configured to execute the method according to any one of the first embodiment through calling.

The invention also provides a computer program product comprising a computer program and/or instructions which, when executed by a processor, performs the steps of the method of any of the above embodiments.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, the present invention may take the form of an entirely software embodiment, an entirely hardware embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media having computer-usable program code embodied therein. Such computer-usable storage media include, but are not limited to, various media capable of storing program code, such as a USB flash drive, a portable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk memory, a Compact Disc Read-Only Memory (CD-ROM), and an optical memory.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create a system for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction system which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks. While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the present invention and its equivalent technology, it is intended that the present invention also encompass such modifications and variations.

Claims

1. A remote sensing image semantic segmentation method based on attention regulation is characterized by comprising the following steps:

obtaining a first set of remotely sensed images, wherein the first set of remotely sensed images comprises a plurality of remotely sensed images;

randomly dividing the first remote sensing image set into two parts, and respectively recording the two parts as a first training set and a first testing set;

sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship;

building a first initial model based on the FCMANet coding and decoding semantic segmentation structure;

constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module;

training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain a coding-decoding semantic segmentation model;

and performing semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.

2. The method of claim 1, wherein the method further comprises:

performing semantic segmentation on each remote sensing image in the first test set based on the coding-decoding semantic segmentation model to obtain a semantic segmentation result of the first test set;

comparing the semantic segmentation result of the first test set with the first test label set to obtain a segmentation deviation of the first test set;

if the segmentation deviation of the first test set meets a preset deviation threshold value, calculating the time of semantic segmentation of each remote sensing image in the first test set by the encoding-decoding semantic segmentation model, and recording the time as the semantic segmentation time of the first test set;

and comprehensively evaluating the coding-decoding semantic segmentation model based on the first test set segmentation deviation and the first test set semantic segmentation time.

3. The method of claim 1, wherein said sequentially performing target contour labeling on each remote sensing image in the first set of remote sensing images to obtain a first set of labels comprises:

sequentially classifying the remote sensing images in the first remote sensing image set according to land categories, wherein the classification results of the remote sensing images form a first land category set, and the first land category set comprises lakes, rivers, buildings, vegetation, agriculture and the like;

sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set by using labelme, and generating a first label file after labeling;

and converting the first label file into a color label image, and recording the color label image as the first label set.

4. The method according to claim 1, wherein said building a first initial model based on the FCMANet coding and decoding semantic segmentation structure comprises:

constructing a first coding structure by utilizing a channel compression residual error module;

generating dense feature information based on the first coding structure;

utilizing bilinear interpolation up-sampling to construct a first decoding structure;

decoding the dense feature information based on the first decoding structure;

constructing the first initial model based on the first encoding structure and the first decoding structure.

5. The method of claim 4, wherein after said building said first initial model, further comprising:

training the first initial model based on the first training set and the first training label set;

obtaining a first real-time training progress of the first initial model by using a cross entropy loss function;

and if the first real-time training progress accords with a preset threshold value, obtaining the coding-decoding semantic segmentation model.

6. The method of claim 5, wherein after obtaining the encoding-decoding semantic segmentation model, comprising:

obtaining a first backbone network Bottleneck structure based on the first coding structure, wherein the first backbone network comprises a Bottleneck1 structure, a Bottleneck2 structure, a Bottleneck3 structure and a Bottleneck4 structure;

adding the category attention module in sequence after the Bottleneck1 structure, the Bottleneck2 structure and the Bottleneck3 structure;

adding a multi-scale attention module after the Bottleneck4 structure;

performing attention regulation on the encoding-decoding semantic segmentation model based on the category attention module and the multi-scale attention module.

7. The method of claim 2, wherein after the comprehensively evaluating the encoding-decoding semantic segmentation model, comprising:

removing the multi-scale attention module in the coding-decoding semantic segmentation model to obtain a first experiment model;

performing semantic segmentation on each remote sensing image in the first test set based on the first experiment model to obtain a first experiment result;

removing the category attention module in the coding-decoding semantic segmentation model to obtain a second experiment model;

performing semantic segmentation on each remote sensing image in the first test set based on the second experiment model to obtain a second experiment result;

comparing the first experiment result and the second experiment result, in sequence, with the first test set semantic segmentation result to obtain a first experiment comparison result and a second experiment comparison result respectively;

and evaluating the category attention module according to the first experiment comparison result, and evaluating the multi-scale attention module according to the second experiment comparison result.
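The comparison in claim 7 can be sketched as follows. The claim does not name the metric used for comparison, so the sketch assumes mean intersection over union (mIoU) computed against the test labels. The function `mean_iou` and the flattened toy label lists are hypothetical and are not results from the invention.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection over union across classes present in pred or target."""
    ious = []
    for cls in range(num_classes):
        inter = sum(1 for p, t in zip(pred, target) if p == cls and t == cls)
        union = sum(1 for p, t in zip(pred, target) if p == cls or t == cls)
        if union > 0:  # skip classes absent from both pred and target
            ious.append(inter / union)
    return sum(ious) / len(ious)

# Toy flattened label maps with three classes (illustrative values only).
target = [0, 0, 1, 1, 2, 2]
full_model_pred = [0, 0, 1, 1, 2, 2]
experiment_pred = [0, 0, 1, 2, 2, 2]  # output of a model with one module removed

full_score = mean_iou(full_model_pred, target, num_classes=3)
experiment_score = mean_iou(experiment_pred, target, num_classes=3)
# The drop in score is one way to express an experiment comparison result.
score_drop = full_score - experiment_score
```

A larger drop would suggest that the removed module contributes more to segmentation accuracy under this assumed metric.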

8. A remote sensing image semantic segmentation system based on attention regulation, characterized by comprising:

a first obtaining unit: the first obtaining unit is used for obtaining a first remote sensing image set, wherein the first remote sensing image set comprises a plurality of remote sensing images;

a first setting unit: the first setting unit is used for randomly dividing the first remote sensing image set into two parts which are respectively recorded as a first training set and a first testing set;

a second obtaining unit: the second obtaining unit is used for sequentially carrying out target contour labeling on each remote sensing image in the first remote sensing image set to obtain a first label set, wherein the first label set comprises a first training label set and a first testing label set, the first training label set and the first training set have a one-to-one correspondence relationship, and the first testing label set and the first testing set have a one-to-one correspondence relationship;

a first construction unit: the first construction unit is used for constructing a first initial model based on the FCMANet coding and decoding semantic segmentation structure;

a second construction unit: the second construction unit is used for constructing an attention regulation model, wherein the attention regulation model comprises a category attention module and a multi-scale attention module;

a third obtaining unit: the third obtaining unit is used for training the first initial model based on the first training set, the first training label set and the attention regulation model to obtain an encoding-decoding semantic segmentation model;

a first execution unit: the first execution unit is used for carrying out semantic segmentation on the remote sensing image based on the encoding-decoding semantic segmentation model.
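The random division performed by the first setting unit, together with the one-to-one correspondence between images and labels required of the second obtaining unit, can be sketched as below. The claim does not specify a split ratio, so the 0.8 training fraction, the fixed seed, the function name `split_dataset` and the file names are all hypothetical.

```python
import random

def split_dataset(images, labels, train_fraction=0.8, seed=0):
    """Randomly split paired images and labels into training and test sets."""
    if len(images) != len(labels):
        raise ValueError("each image needs exactly one label")
    indices = list(range(len(images)))
    random.Random(seed).shuffle(indices)  # reproducible random order
    cut = int(len(indices) * train_fraction)
    # Keeping (image, label) pairs together preserves the one-to-one mapping.
    train = [(images[i], labels[i]) for i in indices[:cut]]
    test = [(images[i], labels[i]) for i in indices[cut:]]
    return train, test

# Hypothetical file names standing in for remote sensing images and label masks.
images = [f"img_{i}.tif" for i in range(10)]
labels = [f"mask_{i}.png" for i in range(10)]
train, test = split_dataset(images, labels)
```

Splitting by shuffled index rather than shuffling the two lists separately is what guarantees that each image stays attached to its own label.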

9. An electronic device comprising a processor and a memory;

the memory is used for storing a computer program;

the processor is used for executing the method of any one of claims 1-7 by calling the computer program.

10. A computer program product comprising a computer program and/or instructions for implementing the steps of the method according to any one of claims 1 to 7 when executed by a processor.