CN115410081A - Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium

Info

Publication number: CN115410081A
Application number: CN202210959194.2A
Authority: CN (China)
Prior art keywords: cloud, convolution, feature, attention, scale
Legal status: Pending
Other languages: Chinese (zh)
Inventors: 夏旻, 陈凯, 翁理国
Current and original assignee: Nanjing University of Information Science and Technology
Application filed by Nanjing University of Information Science and Technology
Priority to: CN202210959194.2A

Classifications

    • G06V 20/10 - Scenes; scene-specific elements; terrestrial scenes
    • G06N 3/02 - Computing arrangements based on biological models; neural networks
    • G06N 3/08 - Learning methods
    • G06V 10/40 - Extraction of image or video features
    • G06V 10/806 - Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82 - Image or video recognition or understanding using neural networks


Abstract

The invention discloses a multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium, belonging to the technical field of image processing. The method comprises: acquiring a picture to be detected; and inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, thereby completing the identification of the cloud and the cloud shadow. The mask image of the cloud and cloud shadow is output after the trained weights are used to extract features and perform encoding and decoding. The method effectively improves cloud and cloud shadow identification accuracy, reduces the interference of complex backgrounds and noise in the image, captures scattered small-scale cloud and cloud shadow targets, enhances the detection of thin clouds, refines the segmentation of irregular cloud and cloud shadow junctions, and improves the segmentation accuracy of complex cloud and cloud shadow edge details. It also achieves good results in segmentation experiments on other targets and shows excellent generalization capability and robustness.

Description

Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
Technical Field
The invention relates to a multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium, and belongs to the technical field of image processing.
Background
With the development of remote sensing technology, remote sensing images have been widely applied in fields such as agriculture, meteorology and the military. Since 67% of the earth's surface is covered by cloud layers, many areas in remote sensing images are frequently obscured by cloud, so that the acquired ground information is attenuated or even lost entirely. Accurate identification of cloud and cloud shadow is therefore of great significance for the application of optical remote sensing images. Traditional deep learning networks are easily affected by factors such as ground object interference and noise and lack generalization capability; if they are applied directly to cloud detection, details and spatial information are easily lost, leading to coarse segmentation of cloud and cloud shadow boundaries as well as missed and false detections.
Disclosure of Invention
The invention aims to provide a multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium, which solve the problems of rough segmentation of cloud and cloud shadow boundaries, missed detection and false detection of images and the like and improve the identification accuracy of clouds and cloud shadows.
To achieve the above purpose, the invention adopts the following technical scheme:
in a first aspect, the present invention provides a method for identifying a multi-scale aggregated cloud and cloud shadow, including:
acquiring a picture to be detected;
and inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, and completing the identification of the cloud and the cloud shadow.
Further in combination with the first aspect, the multi-scale attention feature aggregation network is trained by:
acquiring training data;
performing data enhancement processing on the image in the training data, and then converting the image and the corresponding label into a tensor;
and inputting the tensor into a multi-scale attention feature aggregation network for training to obtain the trained multi-scale attention feature aggregation network.
With reference to the first aspect, further, the multi-scale attention feature aggregation network includes a multi-scale strip-shaped pooling attention module, which is composed of 4 parallel strip-shaped average pooling branches and an adaptive average pooling branch, two parallel strip-shaped convolution branches, a spatial attention module, and a channel attention module, and is configured to extract multi-scale context information and deep space and channel information;
the system comprises 4 average pooling branches and a self-adaptive average pooling branch, wherein the average pooling branches and the self-adaptive average pooling branch are used for extracting and adding a picture to be detected in parallel, recovering the size of the picture to be detected after obtaining a multi-scale characteristic diagram, and connecting the multi-scale characteristic diagram and the picture to be detected in a height dimension to obtain a weight vector;
respectively inputting the weight vectors into two parallel strip convolution branches, wherein the first branch consists of a convolution kernel 1 multiplied by 7 and a convolution kernel 7 multiplied by 1, the second branch consists of a convolution kernel 7 multiplied by 1 and a convolution kernel 1 multiplied by 7, the first branch extracts a feature map and then inputs the feature map into a space attention module to extract a first feature map containing space information, the second branch extracts a feature map and then inputs the feature map into a channel attention module to extract a second feature map containing channel information, and then the final feature maps are output after connection and interaction;
the calculation process of the channel attention module is as follows:
features were extracted using global average pooling and global maximum pooling, respectively:
W_max = C2D_{1×1}(G_max(x))

W_avg = C2D_{1×1}(G_avg(x))

wherein x represents the input feature map, W_max and W_avg represent the second and third weight vectors output by the global max-pooling branch and the global average-pooling branch respectively, G_max and G_avg represent global max pooling and global average pooling respectively, and C2D_{1×1} represents a two-dimensional convolution with a 1×1 convolution kernel;

splicing the features extracted by the global average pooling and the global maximum pooling:

X_cat = CAT_3(W_max, W_avg)

wherein CAT_3 represents splicing in the width dimension and X_cat is the image spliced in the width dimension;

size recovery, feature selection, re-weighting:

CA(x) = x · σ(DWC2D_{1×1}(DWC2D_{1×2}(X_cat)))

wherein CA(x) represents the first feature map output by the channel attention module, DWC2D_{1×2} represents a two-dimensional depth separable convolution with a 1×2 convolution kernel, DWC2D_{1×1} represents a two-dimensional depth separable convolution with a 1×1 convolution kernel, and σ represents the nonlinear activation function Sigmoid;
the spatial attention module is calculated as follows:
respectively extracting features by using global average pooling and global maximum pooling, then connecting along channel dimensions, executing convolution operation, and generating a second feature map by using a nonlinear activation function:
SA(x) = σ(C2D_{7×7}(CAT_1(MP(x), AP(x))))

wherein SA(x) represents the second feature map output by the spatial attention module, C2D_{7×7} represents a two-dimensional convolution with a 7×7 convolution kernel, CAT_1 represents splicing in the channel dimension, and MP and AP represent maximum pooling and average pooling respectively.
With reference to the first aspect, further, the multi-scale attention feature aggregation network includes a deep multi-head feedforward transfer attention module, configured to promote two adjacent layers of a backbone network in the multi-scale attention feature aggregation network to guide each other for feature mining, and to merge feature map information of the two adjacent layers extracted from the backbone network;
firstly, carrying out layer normalization operation on the feature maps output by two adjacent layers to generate a first layer normalization tensor and a second layer normalization tensor, generating a query vector by the first layer normalization tensor, generating a key vector and a value vector by the second layer normalization tensor, and calculating as follows:
Q = DWC2D^Q_{3×3}(C2D^Q_{1×1}(X))

K = DWC2D^K_{3×3}(C2D^K_{1×1}(Y))

V = DWC2D^V_{3×3}(C2D^V_{1×1}(Y))

wherein X is the first-layer normalization tensor, Y is the second-layer normalization tensor, Q is the query vector, K is the key vector, V is the value vector, C2D^Q_{1×1} represents the 1×1 two-dimensional convolution used to compute the query vector, DWC2D^Q_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the query vector, C2D^K_{1×1} represents the 1×1 two-dimensional convolution used to compute the key vector, DWC2D^K_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the key vector, C2D^V_{1×1} represents the 1×1 two-dimensional convolution used to compute the value vector, and DWC2D^V_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the value vector;

reshaping the query vector and the key vector so that their dot-product interaction generates a transposed attention map:

Attention(Q′, K′, V′) = V′ · Softmax(K′ · V′ / β)

P′ = C2D_{1×1}(Attention(Q′, K′, V′)) + x + y

wherein x and y represent the shallow and deep input feature maps respectively, P′ is the output transposed attention feature map, Q′, K′ and V′ are the three matrices obtained by reshaping the tensors from their original sizes, β is a scaling parameter, and Attention(Q′, K′, V′) is the attention function;

inputting the transposed attention feature map into a feedforward network to obtain a feedforward feature map:

Z_1 = DWC2D_{3×3}(C2D_{1×1}(Z))

Z_2 = δ(Z_1) ⊙ Z_1

Z′ = C2D_{1×1}(Z_2) + Z

wherein Z is the third-layer normalization tensor obtained by performing layer normalization on the transposed attention feature map, C2D_{1×1} is a two-dimensional convolution with a 1×1 convolution kernel, DWC2D_{3×3} is a two-dimensional depth separable convolution with a 3×3 convolution kernel, Z_1 is the first feedforward intermediate map, ⊙ denotes the dot product, δ is the Gelu nonlinear activation function, Z_2 is the second feedforward intermediate map, and Z′ is the output feedforward feature map.
With reference to the first aspect, further, the multi-scale attention feature aggregation network includes a bilateral feature fusion module including a detail branch and a context branch;
and simultaneously inputting the feature diagram output by the detail branch into two branches to obtain two feature-mapped detail output values:
Y_1 = R(BN(DDWC2D_{3×3}(x_d)))

Y_2 = AP(R(BN(DDWC2D_{3×3}(x_d))))

wherein x_d is the feature map output by the detail branch, DDWC2D_{3×3} represents a two-dimensional depth separable dilated convolution with a 3×3 convolution kernel and a dilation factor of 2, BN represents batch normalization, R represents the Relu activation function, AP represents average pooling, and Y_1 and Y_2 respectively represent the detail output values after the two feature mappings;

simultaneously inputting the feature map output by the context branch into two branches to obtain two feature-mapped context output values:

y_1 = γ_1 · σ(x_c_up)

y_2 = γ_2 · σ(DWC2D_{3×3}(DC2D_{3×3}(x_c_up)))

wherein x_c_up is the upsampled feature map output by the context branch, σ represents the nonlinear activation function Sigmoid, DC2D_{3×3} represents a two-dimensional dilated convolution with a 3×3 convolution kernel and a dilation factor of 2, DWC2D_{3×3} represents a two-dimensional depth separable convolution with a 3×3 convolution kernel, γ_1 and γ_2 are the re-weighting values obtained from the detail-branch feature mappings, and y_1 and y_2 respectively represent the context output values after the two feature mappings;

adding the two feature-mapped context output values and then performing a two-dimensional depth separable convolution, batch normalization and activation to obtain the bilateral feature fusion feature map:

y_out = R(BN(DWC2D_{3×3}(y_1 + y_2)))

wherein y_out is the bilateral feature fusion feature map.
With reference to the first aspect, further, the multiscale attention feature aggregation network includes a boundary refinement boosting module, configured to enhance detection of cloud and cloud shadow complex edge information, where a computation process of the boundary refinement boosting module is as follows:
x′=C2D 3×3 (C2D 3×3 (x))+x
y=drop(C2D 3×3 (x′))
y′=Up(C2D 1×1 (y))
wherein x and y' respectively represent the input value and the output value of the boundary refinement boosting module, and C2D 1×1 Representing a two-dimensional convolution with a convolution kernel of 1 × 1, up representing 2 times upsampling, C2D 3×3 Representing a two-dimensional convolution with a convolution kernel of 3 x 3, drop representing the dropout algorithm, x' representing a first refinement intermediate quantity, and y representing a second refinement intermediate quantity.
In a second aspect, the present invention further provides a multi-scale aggregated cloud and cloud shadow identification system, including:
the picture acquisition module: the method comprises the steps of obtaining a picture to be detected;
cloud and cloud shadow identification module: the method is used for inputting the picture to be detected into the pre-trained multi-scale attention feature aggregation network to obtain the mask image of the cloud and the cloud shadow, and the identification of the cloud and the cloud shadow is completed.
In a third aspect, the present invention further provides a multi-scale aggregated cloud and cloud shadow recognition apparatus, including a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of the first aspect.
In a fourth aspect, the invention also provides a computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, performs the steps of the method of any one of the first aspects.
Compared with the prior art, the invention has the following beneficial effects:
according to the cloud and cloud shadow identification method, system, equipment and storage medium for multi-scale aggregation, the picture to be detected is input into a pre-trained multi-scale attention feature aggregation network, the trained weight extraction features are used for carrying out coding and decoding operations, and then the mask image of the cloud and cloud shadow is output, so that the cloud and cloud shadow identification accuracy can be effectively improved;
the multi-scale strip pooling attention module is adopted to further extract multi-scale context information and deep space and channel information, the cloud and cloud shadows are classified through the context information, edge detail processing and segmentation are carried out between the cloud and cloud shadows, strip pooling can reduce interference of other irrelevant areas in the image, interference of complex backgrounds and noise in the image is effectively reduced, and scattered small-scale cloud and cloud shadow targets can be effectively captured;
the deep multi-head feedforward attention transfer module is adopted to enhance the communication capacity of two channels, promote the mutual guiding of two adjacent layers of a backbone network to carry out feature mining, fuse the feature map information of the two adjacent layers extracted from the backbone network, enhance the detection capacity of thin clouds and refine the segmentation of irregular combination parts of the clouds and cloud shadows;
a bilateral feature fusion module is adopted to fuse low-level semantic information and high-level detail information, and context branch semantic information is used to guide the feature response of detail branches, so that efficient information exchange is realized, and the influence of interference objects on identification is reduced;
the feature representation is enhanced in the training phase by adopting a boundary thinning boosting module, so that the segmentation precision of the cloud and the complex edge details of the cloud shadow is improved.
Drawings
FIG. 1 is a schematic structural diagram of a multi-scale attention feature aggregation network provided by an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a channel attention module provided in an embodiment of the present invention;
FIG. 3 is a schematic diagram of a spatial attention module according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a deep multi-head feedforward transfer attention module according to an embodiment of the invention;
fig. 5 is a schematic structural diagram of a bilateral feature fusion module according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a boundary refinement boosting module provided by an embodiment of the present invention;
fig. 7 is a flowchart of a method for identifying a cloud and cloud shadow for multi-scale aggregation according to an embodiment of the present invention.
Detailed Description
The present invention is further described with reference to the accompanying drawings, and the following examples are only for clearly illustrating the technical solutions of the present invention, and should not be taken as limiting the scope of the present invention.
Example 1
As shown in fig. 7, the method for identifying a cloud and a cloud shadow in a multi-scale aggregation according to an embodiment of the present invention includes:
s1, obtaining a picture to be detected.
An original color picture is collected as a picture to be detected.
S2, inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, and completing identification of the cloud and the cloud shadow.
The multi-scale attention feature aggregation network shown in fig. 1 is constructed. The whole network adopts an encoder-decoder structure and an end-to-end training mode, and is mainly composed of the multi-scale strip pooling attention module, the deep multi-head feedforward transfer attention module, the bilateral feature fusion module and the boundary refinement boosting module.
In the process of identifying cloud and cloud shadow in the remote sensing image, the extraction of the feature information in the image is very important, and the detection efficiency of the network can be greatly improved by selecting a proper backbone network.
The multi-scale strip pooling attention module (MSPA) is used to further extract multi-scale context information and deep spatial and channel information; it consists of 4 parallel strip average pooling branches, an adaptive average pooling branch, two parallel strip convolution branches, a spatial attention module and a channel attention module.
The pooling kernel of the 4 average pooling branches is N×1 (N = 1, 3, 5, 6) and the pooling kernel of the adaptive average pooling branch is 1×N. These branches extract and add the input features in parallel to obtain a multi-scale feature map, which is restored to the size of the input picture to be detected and then concatenated with the input feature map in the height dimension to obtain a weight vector, completing the extraction of the multi-scale features.
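For illustration, the following is a minimal PyTorch-style sketch of this pooling step (not the patented implementation; the interpolation mode and the width of the adaptive 1×N branch are assumptions):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiScaleStripPooling(nn.Module):
    """Sketch: four N x 1 strip average-pooling branches plus one adaptive
    1 x N branch, restored to the input size, added, and concatenated with
    the input along the height dimension to form the weight vector."""
    def __init__(self, strip_sizes=(1, 3, 5, 6)):
        super().__init__()
        self.strip_sizes = strip_sizes

    def forward(self, x):                                  # x: (B, C, H, W)
        h, w = x.shape[-2:]
        branches = []
        for n in self.strip_sizes:                         # N x 1 strip pooling
            pooled = F.adaptive_avg_pool2d(x, (n, 1))
            branches.append(F.interpolate(pooled, size=(h, w),
                                          mode="bilinear", align_corners=False))
        pooled = F.adaptive_avg_pool2d(x, (1, self.strip_sizes[-1]))  # adaptive 1 x N branch (width assumed)
        branches.append(F.interpolate(pooled, size=(h, w),
                                      mode="bilinear", align_corners=False))
        multi_scale = sum(branches)                        # parallel extraction, then addition
        return torch.cat([multi_scale, x], dim=2)          # concatenate along the height dimension
```

The resulting weight vector would then be fed to the two strip convolution branches and the attention modules described next.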
The weight vectors are respectively input into two parallel strip convolution branches: the first branch consists of a 1×7 convolution kernel followed by a 7×1 convolution kernel, and the second branch consists of a 7×1 convolution kernel followed by a 1×7 convolution kernel. The feature map extracted by the first branch is input into the spatial attention module to extract a first feature map containing spatial information, and the feature map extracted by the second branch is input into the channel attention module to extract a second feature map containing channel information; finally, the information of the two branches is connected and interacts to output the final feature map.
The horizontal strip convolution tends to learn horizontal details in cloud and cloud shadow images, while the vertical strip convolution tends to learn vertical details. Because the edges of clouds and their shadows are often closely connected, this operation captures the edge feature information at the junction of cloud and cloud shadow well and improves the edge segmentation effect.
The structure of the channel attention module is shown in fig. 2. Its core idea is to extract high-level features with two types of global pooling, global average pooling and global maximum pooling; using different global pooling operations makes the extracted high-level features richer. Two-dimensional depth separable convolutions are then used as extractors of inter-channel information, with the focus on the importance of features in different channels. The calculation process of the channel attention module is as follows:
features were extracted using global average pooling and global maximum pooling, respectively:
W_max = C2D_{1×1}(G_max(x))

W_avg = C2D_{1×1}(G_avg(x))

wherein x represents the input feature map, W_max and W_avg represent the second and third weight vectors output by the global max-pooling branch and the global average-pooling branch respectively, G_max and G_avg represent global max pooling and global average pooling respectively, and C2D_{1×1} represents a two-dimensional convolution with a 1×1 convolution kernel;

then, splicing the features extracted by the global average pooling and the global maximum pooling:

X_cat = CAT_3(W_max, W_avg)

wherein CAT_3 represents splicing in the width dimension and X_cat is the image spliced in the width dimension;
then, performing size recovery on the spliced image by using a two-dimensional depth separable convolution with a convolution kernel of 1 × 2, and paying attention to detail characteristic information of the image; then, a two-dimensional depth separable convolution with a convolution kernel of 1 x 1 is used as a selector, and the feature representation of the global average pooling branch and the global maximum pooling branch is focused in a self-adaptive mode; finally, after the output of the selector, the original feature map is re-weighted using the nonlinear activation function Sigmoid.
The calculation processes of the size recovery, the feature selection and the re-weighting are as follows:
CA(x) = x · σ(DWC2D_{1×1}(DWC2D_{1×2}(X_cat)))

wherein CA(x) represents the first feature map output by the channel attention module, DWC2D_{1×2} represents a two-dimensional depth separable convolution with a 1×2 convolution kernel, DWC2D_{1×1} represents a two-dimensional depth separable convolution with a 1×1 convolution kernel, and σ represents the nonlinear activation function Sigmoid.
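A minimal PyTorch-style sketch of this channel attention computation follows; channel counts and layer names are assumptions, not the patented implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ChannelAttention(nn.Module):
    """Sketch of the channel attention formulas above: global max / average
    pooling, 1x1 convolutions producing two weight vectors, width-wise
    concatenation, 1x2 and 1x1 depthwise convolutions, and Sigmoid
    re-weighting of the input feature map."""
    def __init__(self, channels):
        super().__init__()
        self.conv_max = nn.Conv2d(channels, channels, kernel_size=1)   # C2D_1x1 (max branch)
        self.conv_avg = nn.Conv2d(channels, channels, kernel_size=1)   # C2D_1x1 (avg branch)
        self.dw_recover = nn.Conv2d(channels, channels, kernel_size=(1, 2),
                                    groups=channels)                   # DWC2D_1x2 (size recovery)
        self.dw_select = nn.Conv2d(channels, channels, kernel_size=1,
                                   groups=channels)                    # DWC2D_1x1 (feature selection)

    def forward(self, x):                                   # x: (B, C, H, W)
        w_max = self.conv_max(F.adaptive_max_pool2d(x, 1))  # Gmax then 1x1 conv -> (B, C, 1, 1)
        w_avg = self.conv_avg(F.adaptive_avg_pool2d(x, 1))  # Gavg then 1x1 conv -> (B, C, 1, 1)
        cat = torch.cat([w_max, w_avg], dim=3)              # CAT_3: width-wise concat -> (B, C, 1, 2)
        weight = torch.sigmoid(self.dw_select(self.dw_recover(cat)))   # (B, C, 1, 1)
        return x * weight                                   # CA(x): re-weight the input feature map
```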
The structure of the spatial attention module is shown in fig. 3. It also extracts feature information using average pooling and maximum pooling but, unlike the channel attention module, both pooling operations are performed along the channel dimension. After the feature maps generated by average pooling and maximum pooling are concatenated in the channel dimension, a convolution with a 7 × 7 kernel is applied to reduce the number of channels from 2 to 1; this large 7 × 7 kernel provides a relatively large receptive field. The final feature map is then generated through the nonlinear activation function Sigmoid.
The calculation process of the above spatial attention module is as follows:
extracting features by using global average pooling and global maximum pooling respectively, then connecting along channel dimensions, executing convolution operation, and generating a second feature map by using a nonlinear activation function:
SA(x) = σ(C2D_{7×7}(CAT_1(MP(x), AP(x))))

wherein SA(x) represents the second feature map output by the spatial attention module, C2D_{7×7} represents a two-dimensional convolution with a 7×7 convolution kernel, CAT_1 represents splicing in the channel dimension, and MP and AP represent maximum pooling and average pooling respectively.
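The spatial attention formula can be sketched in the same way; the returned map is assumed to be applied to the branch features by element-wise multiplication in the surrounding module:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Sketch of the spatial attention formula above: channel-wise max and mean
    maps are concatenated, passed through a 7x7 convolution and a Sigmoid."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)   # C2D_7x7

    def forward(self, x):                                  # x: (B, C, H, W)
        max_map, _ = torch.max(x, dim=1, keepdim=True)     # MP(x): (B, 1, H, W)
        avg_map = torch.mean(x, dim=1, keepdim=True)       # AP(x): (B, 1, H, W)
        return torch.sigmoid(self.conv(torch.cat([max_map, avg_map], dim=1)))  # SA(x)
```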
The structure of the deep multi-head feedforward attention transfer module (DMFA) is shown in fig. 4, and is used for promoting two adjacent layers of the backbone network in the multi-scale attention feature aggregation network to mutually guide feature mining, and fusing feature map information of the two adjacent layers extracted from the backbone network.
The deep multi-head feedforward attention transfer module first performs layer normalization on the feature maps output by two adjacent layers to generate a first-layer normalization tensor X ∈ R^{H×W×C} and a second-layer normalization tensor Y ∈ R^{H×W×C}. The query vector Q is generated from the first-layer normalization tensor, and the key vector K and the value vector V are generated from the second-layer normalization tensor. Local context information is used for enrichment: pixel-level cross-channel context information is aggregated with a 1×1 convolution, and channel-level spatial context information is then encoded with a 3×3 depthwise convolution. The calculation process is as follows:

Q = DWC2D^Q_{3×3}(C2D^Q_{1×1}(X))

K = DWC2D^K_{3×3}(C2D^K_{1×1}(Y))

V = DWC2D^V_{3×3}(C2D^V_{1×1}(Y))

wherein X is the first-layer normalization tensor, Y is the second-layer normalization tensor, Q is the query vector, K is the key vector, V is the value vector, C2D^Q_{1×1} represents the 1×1 two-dimensional convolution used to compute the query vector, DWC2D^Q_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the query vector, C2D^K_{1×1} represents the 1×1 two-dimensional convolution used to compute the key vector, DWC2D^K_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the key vector, C2D^V_{1×1} represents the 1×1 two-dimensional convolution used to compute the value vector, and DWC2D^V_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the value vector.
The query vector and the key vector are then reshaped so that their dot-product interaction generates a transposed attention map of size R^{C×C}. The calculation process is as follows:

Attention(Q′, K′, V′) = V′ · Softmax(K′ · V′ / β)

P′ = C2D_{1×1}(Attention(Q′, K′, V′)) + x + y

wherein x and y represent the shallow and deep input feature maps respectively, P′ is the output transposed attention feature map, Q′, K′ and V′ are the three matrices obtained by reshaping the tensors from their original sizes, Q′ ∈ R^{HW×C}, K′ ∈ R^{C×HW}, V′ ∈ R^{HW×C}, β is a scaling parameter used to control the magnitude of the dot product of K′ and V′ before the Softmax function is applied, and Attention(Q′, K′, V′) is the attention function.
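As a hedged illustration, the sketch below follows the common single-head, channel-wise (transposed) attention formulation, with the query taken from one layer and the key and value from the adjacent layer; the head count, the learnable scaling parameter and the exact matrix arrangement are assumptions rather than the patented implementation:

```python
import torch
import torch.nn as nn

class CrossLayerTransposedAttention(nn.Module):
    """Sketch of transposed (channel-wise) attention between two adjacent
    layers: Q comes from the shallow layer, K and V from the deep layer, each
    produced by a 1x1 convolution followed by a 3x3 depthwise convolution;
    attention is computed across channels, so its cost scales with C."""
    def __init__(self, channels):
        super().__init__()
        self.norm_x = nn.LayerNorm(channels)
        self.norm_y = nn.LayerNorm(channels)
        def qkv_proj():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 1),                              # C2D_1x1
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels))  # DWC2D_3x3
        self.to_q, self.to_k, self.to_v = qkv_proj(), qkv_proj(), qkv_proj()
        self.proj = nn.Conv2d(channels, channels, 1)
        self.beta = nn.Parameter(torch.ones(1))              # learnable scaling parameter (assumed)

    def _layer_norm(self, t, norm):
        b, c, h, w = t.shape
        return norm(t.flatten(2).transpose(1, 2)).transpose(1, 2).reshape(b, c, h, w)

    def forward(self, x, y):                 # x: shallow layer, y: deep layer, both (B, C, H, W)
        b, c, h, w = x.shape
        q = self.to_q(self._layer_norm(x, self.norm_x)).flatten(2)   # (B, C, HW)
        k = self.to_k(self._layer_norm(y, self.norm_y)).flatten(2)   # (B, C, HW)
        v = self.to_v(self._layer_norm(y, self.norm_y)).flatten(2)   # (B, C, HW)
        attn = torch.softmax(q @ k.transpose(1, 2) / self.beta, dim=-1)   # (B, C, C) attention map
        out = (attn @ v).reshape(b, c, h, w)                              # apply attention to V
        return self.proj(out) + x + y        # P' = C2D_1x1(Attention) + x + y
```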
After the feature map information has been processed as above, the transposed attention feature map is input into a feedforward network, which performs the same operations at every pixel position of the input. First, a third-layer normalization tensor Z ∈ R^{H×W×C} is obtained through layer normalization; the feature channels are then expanded by a two-dimensional convolution with a 1×1 kernel, and a two-dimensional depth separable convolution with a 3×3 kernel is used to encode information from spatially neighboring pixel positions. A gating mechanism follows: after splitting along the channel dimension, the feature map produced by the depth separable convolution passes through two parallel branches, one of which goes through a Gelu nonlinear activation function; the outputs of the two parallel branches are multiplied element-wise, and a 1×1 convolution then reduces the channels back to the original input dimension to obtain the feedforward feature map. The calculation process is as follows:

Z_1 = DWC2D_{3×3}(C2D_{1×1}(Z))

Z_2 = δ(Z_1) ⊙ Z_1

Z′ = C2D_{1×1}(Z_2) + Z

wherein Z is the third-layer normalization tensor obtained by performing layer normalization on the transposed attention feature map, C2D_{1×1} is a two-dimensional convolution with a 1×1 convolution kernel, DWC2D_{3×3} is a two-dimensional depth separable convolution with a 3×3 convolution kernel, Z_1 is the first feedforward intermediate map, ⊙ denotes the dot product, δ is the Gelu nonlinear activation function, Z_2 is the second feedforward intermediate map, and Z′ is the output feedforward feature map.
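A minimal sketch of these feedforward formulas is given below; the channel expansion and the channel split described in the prose are omitted for brevity and would be straightforward to add:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GatedFeedForward(nn.Module):
    """Sketch following the feedforward formulas above: layer normalization, a
    1x1 convolution, a 3x3 depthwise convolution, Gelu self-gating and a 1x1
    projection with a residual connection."""
    def __init__(self, channels):
        super().__init__()
        self.norm = nn.LayerNorm(channels)
        self.pw_in = nn.Conv2d(channels, channels, kernel_size=1)
        self.dw = nn.Conv2d(channels, channels, kernel_size=3, padding=1,
                            groups=channels)
        self.pw_out = nn.Conv2d(channels, channels, kernel_size=1)

    def forward(self, p):                 # p: transposed attention feature map, (B, C, H, W)
        b, c, h, w = p.shape
        z = self.norm(p.flatten(2).transpose(1, 2)).transpose(1, 2).reshape(b, c, h, w)
        z1 = self.dw(self.pw_in(z))       # Z1 = DWC2D_3x3(C2D_1x1(Z))
        z2 = F.gelu(z1) * z1              # Z2 = Gelu(Z1) ⊙ Z1
        return self.pw_out(z2) + p        # Z' = C2D_1x1(Z2) + Z (residual)
```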
The structure of the bilateral feature fusion module (BFF) is shown in fig. 5, and includes a detail branch and a context branch; the method is used for fusing low-level semantic information and high-level detail information, improving the overall anti-interference performance of the model, and reducing the influence of interferents and noise in the image on cloud and cloud shadow prediction; the module uses semantic information of the context branch to guide the feature response of the detail branch, and through guidance of different scales, we can extract feature representations of different scales.
The feature map output by the detail branch is simultaneously input into two branches to obtain two feature-mapped detail output values. One branch enters a depth separable dilated convolution with a 3×3 kernel and a dilation factor of 2; the dilated convolution greatly enlarges the receptive field without adding extra parameters, and enlarging the receptive field enhances the context information and better improves the accuracy of the segmentation boundary. The other branch enters a depth separable dilated convolution with a 3×3 kernel and a dilation factor of 2 followed by an average pooling layer. Batch normalization and Relu activation functions are added to both branches so that the network converges faster and more stably and overfitting is prevented. The calculation process is as follows:
Y_1 = R(BN(DDWC2D_{3×3}(x_d)))

Y_2 = AP(R(BN(DDWC2D_{3×3}(x_d))))

wherein x_d is the feature map output by the detail branch, DDWC2D_{3×3} represents a two-dimensional depth separable dilated convolution with a 3×3 convolution kernel and a dilation factor of 2, BN represents batch normalization, R represents the Relu activation function, AP represents average pooling, and Y_1 and Y_2 respectively represent the detail output values after the two feature mappings.
Because the pixel sizes of the feature maps output by the detail branch and the context branch are different, with the detail-branch feature map being twice the size of the context-branch feature map, the feature map output by the context branch is simultaneously input into two branches and upsampled in each. In one branch, the upsampled feature map is directly passed through Sigmoid activation; in the other branch, the feature map first passes through a dilated convolution and then a depth separable convolution before the Sigmoid activation. The Sigmoid-activated values are then re-weighted by the detail-branch feature mappings to obtain the two feature-mapped context output values:
y_1 = γ_1 · σ(x_c_up)

y_2 = γ_2 · σ(DWC2D_{3×3}(DC2D_{3×3}(x_c_up)))

wherein x_c_up is the upsampled feature map output by the context branch, σ represents the nonlinear activation function Sigmoid, DC2D_{3×3} represents a two-dimensional dilated convolution with a 3×3 convolution kernel and a dilation factor of 2, DWC2D_{3×3} represents a two-dimensional depth separable convolution with a 3×3 convolution kernel, γ_1 and γ_2 are the re-weighting values obtained from the detail-branch feature mappings, and y_1 and y_2 respectively represent the context output values after the two feature mappings.
The results at the different scales are then summarized and further feature information is extracted: the two feature-mapped context output values y_1 and y_2 are added, and a two-dimensional depth separable convolution, batch normalization and activation are applied to obtain the bilateral feature fusion feature map:

y_out = R(BN(DWC2D_{3×3}(y_1 + y_2)))

wherein y_out is the bilateral feature fusion feature map.
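A minimal PyTorch-style sketch of this bilateral feature fusion is given below; it assumes the detail and context feature maps have already been brought to the same spatial size (so the average pooling and upsampling steps are omitted), and channel counts and the placement of BN/Relu are assumptions:

```python
import torch
import torch.nn as nn

class BilateralFeatureFusion(nn.Module):
    """Sketch: two dilated depthwise-separable mappings of the detail branch
    re-weight Sigmoid-activated versions of the context branch (direct, and
    after a dilated conv + depthwise conv); the two results are summed and
    refined by a depthwise-separable conv, BN and Relu."""
    def __init__(self, channels):
        super().__init__()
        def dw_dilated_block():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=2, dilation=2,
                          groups=channels),           # depthwise dilated 3x3
                nn.Conv2d(channels, channels, 1),     # pointwise
                nn.BatchNorm2d(channels),
                nn.ReLU(inplace=True))
        self.detail_map1 = dw_dilated_block()
        self.detail_map2 = dw_dilated_block()
        self.ctx_path = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2),        # dilated conv
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels))   # depthwise conv
        self.fuse = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True))

    def forward(self, x_d, x_c_up):          # detail map and upsampled context map, same size
        g1 = self.detail_map1(x_d)           # first detail feature mapping
        g2 = self.detail_map2(x_d)           # second detail feature mapping
        y1 = g1 * torch.sigmoid(x_c_up)                    # y1
        y2 = g2 * torch.sigmoid(self.ctx_path(x_c_up))     # y2
        return self.fuse(y1 + y2)                          # y_out
```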
The structure of the boundary refinement boosting module (BRB) is shown in fig. 6; it is used to enhance the detection of complex cloud and cloud shadow edge information. The boundary details of the cloud and cloud shadow are re-predicted through end-to-end training. To address the problem that the segmentation prediction map is unsatisfactory when the segmentation accuracy is low, the BRB module provides a training enhancement strategy: feature representation is enhanced in the training stage, a dropout step is added in the intermediate training process, and neurons in the network are discarded with a probability of 0.1, which improves the segmentation accuracy to a certain extent, yields a prediction map with a good segmentation effect, and prevents overfitting of the network. The calculation process of the boundary refinement boosting module is as follows:
x′ = C2D_{3×3}(C2D_{3×3}(x)) + x

y = drop(C2D_{3×3}(x′))

y′ = Up(C2D_{1×1}(y))

wherein x and y′ respectively represent the input value and the output value of the boundary refinement boosting module, C2D_{1×1} represents a two-dimensional convolution with a 1×1 convolution kernel, Up represents 2× upsampling, C2D_{3×3} represents a two-dimensional convolution with a 3×3 convolution kernel, drop represents the dropout algorithm, x′ represents the first refinement intermediate quantity, and y represents the second refinement intermediate quantity.
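These formulas can be sketched as follows; the channel count, the number of output classes and the use of a 1×1 classification convolution before upsampling are assumptions:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BoundaryRefinementBoost(nn.Module):
    """Sketch of the boundary refinement boosting formulas above: two 3x3 convs
    with a residual connection, a 3x3 conv followed by dropout (p = 0.1), then
    a 1x1 conv and 2x upsampling."""
    def __init__(self, channels, num_classes=3):           # num_classes is an assumption
        super().__init__()
        self.res = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.Conv2d(channels, channels, 3, padding=1))
        self.conv = nn.Conv2d(channels, channels, 3, padding=1)
        self.drop = nn.Dropout2d(p=0.1)                    # discard neurons with probability 0.1
        self.classify = nn.Conv2d(channels, num_classes, 1)

    def forward(self, x):
        x = self.res(x) + x                   # x' = C2D_3x3(C2D_3x3(x)) + x
        y = self.drop(self.conv(x))           # y  = drop(C2D_3x3(x'))
        y = self.classify(y)                  # C2D_1x1(y)
        return F.interpolate(y, scale_factor=2, mode="bilinear",
                             align_corners=False)          # Up: 2x upsampling
```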
For a deep neural network, capturing long-distance correlation is crucial; however, convolution operation is used for processing local areas, and the receptive field is limited, so that the correlation of long-distance feature information is difficult to capture; the method has good effect when detecting large-scale cloud layers, but has poor effect on scattered small-scale cloud clusters, and because the large square kernels extract too much information from irrelevant areas, the final prediction of the model is interfered, and the segmentation precision is reduced; in order to solve the problems, the invention provides a multi-scale strip-shaped pooling attention Module (MSPA) for further extracting multi-scale context information and deep space and channel information.
On one hand, the cloud and the cloud shadow have similar shapes, so that the cloud and the cloud shadow can be classified through context information, the edge between the cloud and the cloud shadow is subjected to detailed processing and segmentation, and the strip pooling can reduce the interference of other irrelevant areas in an image and more effectively identify scattered small-sized cloud and cloud shadow, so that the probability of missed detection and false detection of a detected target is reduced, and the segmentation effect is improved; on the other hand, after the strip pooling operation is performed, attention mechanism operation is performed next to extract multi-scale deep space information and multi-scale deep channel information in parallel, and the category information and the position information of cloud and cloud shadow are better concerned, so that the model can focus on important information in the image, and the segmentation effect is further improved.
In order to meet the requirements of thin cloud layers and of segmenting irregular cloud and cloud shadow junctions in the cloud and cloud shadow segmentation task, the scheme of the invention chooses to let two adjacent layers of the backbone network guide each other for feature mining and fuses the feature map information of the two adjacent layers extracted from the backbone network. However, simply combining two feature maps of different scales causes a loss of the diversity of the two kinds of information, so the scheme of the invention designs the deep multi-head feedforward transfer attention module (DMFA) to enhance the communication capability of the two channels, so that the two adjacent layers of the backbone network guide each other for feature mining and the fusion of image feature information is promoted, providing more useful feature information for the upsampling process.
In order to minimize the influence of interfering objects in the image on cloud and cloud shadow prediction during segmentation, improve the overall anti-interference capability of the model, reduce the probability of false and missed detections, and address the difficulty of accurately predicting the irregular shapes of cloud and cloud shadow, the scheme of the invention provides a bilateral feature fusion module (BFF) in the decoding stage for fusing low-level semantic information and high-level detail information. The feature representations of the detail branch and the context branch are complementary, and neither is aware of the information of the other. There are several ways to combine the two feature responses, such as element-wise summation; however, the outputs of the two branches lie at different levels of feature representation, the detail branch at a lower level and the semantic branch at a higher level, so a simple combination ignores the diversity of the two kinds of information, resulting in poor performance and difficult optimization. The bilateral feature fusion module greatly improves the fusion of the feature maps output by the two branches and uses the semantic information of the context branch to guide the feature response of the detail branch. By guiding at different scales, feature representations of different scales can be extracted, and compared with a simple combination this guidance achieves efficient information exchange between the two branches.
Because the size and shape of the cloud and cloud shadows are arbitrary and irregular, it is difficult to detect boundary information; the scheme of the invention provides a new module (BRB) to re-predict the boundary details of cloud and cloud shadow through end-to-end training, and provides a training-enhancing strategy for solving the problem that the effect of segmenting a prediction graph is not ideal when the segmentation precision is not high, wherein a dropout link is added in the middle training process in the training stage to enhance feature representation, and the neuron in a 0.1 probability network can be discarded in the prediction stage, so that the segmentation precision can be improved to a certain extent, the prediction graph with good segmentation effect can be obtained, and the overfitting of the network can be prevented.
And pre-training the multi-scale attention feature aggregation network after the multi-scale attention feature aggregation network is constructed.
Acquisition of a training data set:
the cloud and cloud shadow data set used in the embodiment of the invention is from Google Earth (Google Earth) which is virtual Earth software developed by Google; it places satellite photographs, aerial photographs and geographic information systems on a three-dimensional model of the earth, google earth with an effective resolution of at least 100 meters, usually 30 meters, and an observation height (Eyealt) of 15 kilometers.
The data set consists of high-definition remote sensing images randomly acquired by professional meteorological experts over Qinghai, the Yunnan Plateau, the Qinghai-Tibet Plateau and the Yangtze River Delta. To better reflect the performance of the model, several groups of high-resolution cloud images taken at different angles and heights were selected. Owing to the limited video memory of the GPU, the high-resolution cloud remote sensing images with an original resolution of 4800 × 2692 were cut into patches of size 224 × 224; after screening, 12280 images were obtained, of which 9824 are used as the training set and 2456 as the validation set, a training/validation ratio of 8:2.
Deep neural networks require a large amount of training data, but such learning samples are difficult to obtain; when training samples are few, data enhancement is therefore essential to avoid overfitting. The embodiment of the invention performs data enhancement through translation, flipping and rotation, as sketched below. The high-resolution cloud and cloud shadow images obtained from Google Earth are divided into 5 types with different backgrounds, namely water, forest, field, town and desert. The labels are annotated manually and contain three classes: cloud (red), cloud shadow (green) and background (black).
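A minimal sketch of such data enhancement with torchvision follows; the parameter values are assumptions, and in practice the same geometric transform must be applied to an image and its label mask (for example via joint functional transforms):

```python
import torchvision.transforms as T

# Translation, flipping and rotation; the ranges and probabilities are assumed values.
augment = T.Compose([
    T.RandomAffine(degrees=0, translate=(0.1, 0.1)),   # random translation
    T.RandomHorizontalFlip(p=0.5),                     # random horizontal flip
    T.RandomVerticalFlip(p=0.5),                       # random vertical flip
    T.RandomRotation(degrees=90),                      # random rotation
])
```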
The embodiment of the invention adopts a supervised training mode: data enhancement is first applied to the images in the data set, and the original images and the corresponding labels are then converted into tensors and fed into the model for training. The batch size of each training step is set to 16; an equal-interval learning-rate adjustment (StepLR) strategy is adopted so that the learning rate decreases as training proceeds, with an initial learning rate of 0.001, a decay coefficient of 0.98 and the learning rate updated every 3 epochs, for a total of 200 training epochs; the Adam algorithm is used as the optimizer, as in the configuration sketch below.
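A minimal sketch of this training configuration follows; `model` and `train_loader` (batch size 16) are assumed to be defined elsewhere, and the loss function is assumed to be cross-entropy over the three classes:

```python
import torch
from torch.optim import Adam
from torch.optim.lr_scheduler import StepLR

criterion = torch.nn.CrossEntropyLoss()                  # cloud / cloud shadow / background
optimizer = Adam(model.parameters(), lr=0.001)           # Adam optimizer, initial lr 0.001
scheduler = StepLR(optimizer, step_size=3, gamma=0.98)   # decay by 0.98 every 3 epochs

for epoch in range(200):                                 # 200 training epochs in total
    for images, labels in train_loader:                  # batch size 16
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
    scheduler.step()                                     # equal-interval (StepLR) learning-rate decay
```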
After training is finished, the model weights are obtained and the prediction stage begins: the collected picture to be detected (an original color picture) is input into the pre-trained multi-scale attention feature aggregation network to obtain the mask image of cloud and cloud shadow, completing the identification of the cloud and the cloud shadow; a minimal inference sketch follows.
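In the sketch below, the network class name `MSAFNet`, the weight file name and the class-index ordering are hypothetical placeholders used only for illustration:

```python
import torch
from PIL import Image
import torchvision.transforms.functional as TF

# Load the trained weights, run the picture to be detected through the network,
# and take the per-pixel argmax as the cloud / cloud shadow mask.
model = MSAFNet(num_classes=3)                               # hypothetical network class
model.load_state_dict(torch.load("msaf_weights.pth", map_location="cpu"))
model.eval()

image = Image.open("test_image.png").convert("RGB")          # picture to be detected
x = TF.to_tensor(image).unsqueeze(0)                         # (1, 3, H, W)
with torch.no_grad():
    logits = model(x)                                        # (1, 3, H, W) class scores
mask = logits.argmax(dim=1).squeeze(0)                       # assumed: 0 = background, 1 = cloud, 2 = cloud shadow
```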
Example 2
The embodiment of the invention provides a multi-scale aggregated cloud and cloud shadow identification system, which comprises:
the picture acquisition module: the method comprises the steps of obtaining a picture to be detected;
cloud and cloud shadow identification module: the method is used for inputting the picture to be detected into the pre-trained multi-scale attention feature aggregation network to obtain the mask image of the cloud and the cloud shadow, and the identification of the cloud and the cloud shadow is completed.
Example 3
The embodiment of the invention provides a multi-scale aggregated cloud and cloud shadow identification device, which comprises a processor and a storage medium, wherein the processor is used for processing cloud shadow and cloud shadow;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method of:
acquiring a picture to be detected;
and inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, and completing the identification of the cloud and the cloud shadow.
Example 4
An embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the following method steps:
acquiring a picture to be detected;
and inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of the cloud and the cloud shadow, and completing the identification of the cloud and the cloud shadow.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only a preferred embodiment of the present invention, and it should be noted that, for those skilled in the art, several modifications and variations can be made without departing from the technical principle of the present invention, and these modifications and variations should also be regarded as the protection scope of the present invention.

Claims (9)

1. A multi-scale aggregated cloud and cloud shadow identification method is characterized by comprising the following steps:
acquiring a picture to be detected;
and inputting the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, and completing the identification of the cloud and the cloud shadow.
2. The method of claim 1, wherein the multi-scale attention feature aggregation network is trained by:
acquiring training data;
performing data enhancement processing on images in the training data, and then converting the images and corresponding labels into tensors;
and inputting the tensor into a multi-scale attention feature aggregation network for training to obtain the trained multi-scale attention feature aggregation network.
3. The method of claim 1, wherein the multi-scale attention feature aggregation network comprises a multi-scale strip pooling attention module, which is composed of parallel 4 strip average pooling branches and an adaptive average pooling branch, two parallel strip convolution branches, a spatial attention module, and a channel attention module, and is used for extracting multi-scale context information and deep space and channel information;
the system comprises 4 average pooling branches and a self-adaptive average pooling branch, wherein the average pooling branches and the self-adaptive average pooling branch are used for extracting and adding a picture to be detected in parallel, recovering the size of the picture to be detected after obtaining a multi-scale characteristic diagram, and connecting the multi-scale characteristic diagram and the picture to be detected in a height dimension to obtain a weight vector;
respectively inputting the weight vectors into two parallel strip convolution branches, wherein the first branch consists of a convolution kernel 1 multiplied by 7 and a convolution kernel 7 multiplied by 1, the second branch consists of a convolution kernel 7 multiplied by 1 and a convolution kernel 1 multiplied by 7, the first branch extracts a feature map and then inputs the feature map into a space attention module to extract a first feature map containing space information, the second branch extracts a feature map and then inputs the feature map into a channel attention module to extract a second feature map containing channel information, and then the final feature maps are output after connection and interaction;
the calculation process of the channel attention module is as follows:
features were extracted using global average pooling and global maximum pooling, respectively:
W_max = C2D_{1×1}(G_max(x))

W_avg = C2D_{1×1}(G_avg(x))

wherein x represents the input feature map, W_max and W_avg represent the second and third weight vectors output by the global max-pooling branch and the global average-pooling branch respectively, G_max and G_avg represent global max pooling and global average pooling respectively, and C2D_{1×1} represents a two-dimensional convolution with a 1×1 convolution kernel;

splicing the features extracted by the global average pooling and the global maximum pooling:

X_cat = CAT_3(W_max, W_avg)

wherein CAT_3 represents splicing in the width dimension and X_cat is the image spliced in the width dimension;

size recovery, feature selection, re-weighting:

CA(x) = x · σ(DWC2D_{1×1}(DWC2D_{1×2}(X_cat)))

wherein CA(x) represents the first feature map output by the channel attention module, DWC2D_{1×2} represents a two-dimensional depth separable convolution with a 1×2 convolution kernel, DWC2D_{1×1} represents a two-dimensional depth separable convolution with a 1×1 convolution kernel, and σ represents the nonlinear activation function Sigmoid;
the spatial attention module is calculated as follows:
extracting features by using global average pooling and global maximum pooling respectively, then connecting along channel dimensions, executing convolution operation, and generating a second feature map by using a nonlinear activation function:
SA(x) = σ(C2D_{7×7}(CAT_1(MP(x), AP(x))))

wherein SA(x) represents the second feature map output by the spatial attention module, C2D_{7×7} represents a two-dimensional convolution with a 7×7 convolution kernel, CAT_1 represents splicing in the channel dimension, and MP and AP represent maximum pooling and average pooling respectively.
4. The multi-scale aggregated cloud and cloud shadow identification method of claim 1, wherein the multi-scale attention feature aggregation network comprises a deep multi-head feedforward transfer attention module, which is used for enabling two adjacent layers of the backbone network in the multi-scale attention feature aggregation network to guide each other in feature mining, and for fusing the feature map information of the two adjacent layers extracted by the backbone network;
firstly, carrying out layer normalization operation on the feature maps output by two adjacent layers to generate a first layer normalization tensor and a second layer normalization tensor, generating a query vector by the first layer normalization tensor, generating a key vector and a value vector by the second layer normalization tensor, and calculating as follows:
Q = DWC2D^Q_{3×3}(C2D^Q_{1×1}(X))
K = DWC2D^K_{3×3}(C2D^K_{1×1}(Y))
V = DWC2D^V_{3×3}(C2D^V_{1×1}(Y))
wherein X is the first layer-normalized tensor, Y is the second layer-normalized tensor, Q is the query vector, K is the key vector, V is the value vector, C2D^Q_{1×1} represents the 1×1 two-dimensional convolution used to compute the query vector, DWC2D^Q_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the query vector, C2D^K_{1×1} represents the 1×1 two-dimensional convolution used to compute the key vector, DWC2D^K_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the key vector, C2D^V_{1×1} represents the 1×1 two-dimensional convolution used to compute the value vector, and DWC2D^V_{3×3} represents the 3×3 two-dimensional depth separable convolution used to compute the value vector;
reshaping the query vector and the key vector so that their dot-product interaction generates a transposed attention map:
Attention(Q′, K′, V′) = V′ · Softmax(K′ · Q′ / β)
P′ = C2D_{1×1}(Attention(Q′, K′, V′)) + x + y
wherein x and y represent the shallow and deep input feature maps, respectively, P′ is the output transposed attention feature map, Q′, K′ and V′ are the three matrices obtained by reshaping the tensors from their original sizes, β is a scaling parameter, and Attention(Q′, K′, V′) is the attention function;
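A hedged PyTorch sketch of this cross-layer transposed (channel-wise) attention is given below; the head count, the standard Softmax(Q′·K′ᵀ/β)·V′ ordering, and the LayerNorm handling are assumptions made only to obtain runnable code.

import torch
import torch.nn as nn

class TransposedAttention(nn.Module):
    # Sketch of the cross-layer transposed attention: Q from the shallow feature map,
    # K and V from the deep one, attention computed across channels rather than pixels.
    def __init__(self, channels, heads=4):
        super().__init__()
        assert channels % heads == 0               # channels must divide evenly into heads
        self.heads = heads
        self.beta = nn.Parameter(torch.ones(1))    # learnable scaling parameter
        self.norm_x = nn.LayerNorm(channels)
        self.norm_y = nn.LayerNorm(channels)
        def qkv_proj():
            return nn.Sequential(
                nn.Conv2d(channels, channels, 1),                              # C2D_{1x1}
                nn.Conv2d(channels, channels, 3, padding=1, groups=channels))  # DWC2D_{3x3}
        self.q, self.k, self.v = qkv_proj(), qkv_proj(), qkv_proj()
        self.proj = nn.Conv2d(channels, channels, 1)

    def _ln(self, t, norm):
        n, c, h, w = t.shape
        return norm(t.flatten(2).transpose(1, 2)).transpose(1, 2).reshape(n, c, h, w)

    def forward(self, x, y):                       # x: shallow map, y: deep map
        n, c, h, w = x.shape
        xn, yn = self._ln(x, self.norm_x), self._ln(y, self.norm_y)
        q = self.q(xn).reshape(n, self.heads, c // self.heads, h * w)
        k = self.k(yn).reshape(n, self.heads, c // self.heads, h * w)
        v = self.v(yn).reshape(n, self.heads, c // self.heads, h * w)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.beta, dim=-1)  # channel attention map
        out = (attn @ v).reshape(n, c, h, w)
        return self.proj(out) + x + y              # P' = C2D_{1x1}(attention) + x + y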
inputting the transposed attention feature map into a feedforward network to obtain a feedforward feature map:
Z_1 = DWC2D_{3×3}(C2D_{1×1}(Z))
Z_2 = δ(Z_1) ⊙ Z_1
Z′ = C2D_{1×1}(Z_2) + Z
wherein Z is the third layer-normalized tensor obtained by performing layer normalization on the transposed attention feature map, C2D_{1×1} is a two-dimensional convolution with a 1×1 convolution kernel, DWC2D_{3×3} is a two-dimensional depth separable convolution with a 3×3 convolution kernel, Z_1 is the first feed-forward intermediate map, ⊙ denotes the element-wise product, δ is the GELU nonlinear activation function, Z_2 is the second feed-forward intermediate map, and Z′ is the output feed-forward feature map.
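A compact sketch of this gated feed-forward step is shown below; it assumes Z has already been layer-normalized and keeps the hidden width equal to the input width, which the claim does not specify.

import torch.nn as nn

class GatedFeedForward(nn.Module):
    # Sketch of Z1 = DWC2D_{3x3}(C2D_{1x1}(Z)), Z2 = GELU(Z1) ⊙ Z1, Z' = C2D_{1x1}(Z2) + Z.
    def __init__(self, channels):
        super().__init__()
        self.expand = nn.Conv2d(channels, channels, kernel_size=1)
        self.dwconv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, groups=channels)
        self.project = nn.Conv2d(channels, channels, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, z):
        z1 = self.dwconv(self.expand(z))   # 1x1 convolution then 3x3 depthwise convolution
        z2 = self.act(z1) * z1             # GELU gating via element-wise product
        return self.project(z2) + z        # 1x1 convolution plus residual connection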
5. The multi-scale aggregated cloud and cloud shadow identification method of claim 1, wherein the multi-scale attention feature aggregation network comprises a bilateral feature fusion module, the bilateral feature fusion module comprising a detail branch and a context branch;
simultaneously inputting the feature map output by the detail branch into two branches to obtain the two detail output values after feature mapping:
γ_1 = R(BN(DWC2D^{d=2}_{3×3}(x_d)))
γ_2 = AP(R(BN(DWC2D^{d=2}_{3×3}(x_d))))
wherein x_d is the feature map output by the detail branch, DWC2D^{d=2}_{3×3} represents a two-dimensional depth separable convolution with a 3×3 convolution kernel and a dilation factor of 2, BN represents batch normalization, R represents the ReLU activation function, AP represents average pooling, and γ_1 and γ_2 respectively represent the detail output values after the two feature mappings;
simultaneously inputting the feature map output by the context branch into two branches to obtain the two context output values after feature mapping:
y_1 = γ_1 · σ(x_c_up)
y_2 = γ_2 · C2D^{d=2}_{3×3}(DWC2D_{3×3}(x_c_up))
wherein x_c_up is the feature map output by the context branch, σ represents the nonlinear activation function Sigmoid, C2D^{d=2}_{3×3} represents a two-dimensional convolution with a 3×3 convolution kernel and a dilation factor of 2, DWC2D_{3×3} represents a two-dimensional depth separable convolution with a 3×3 convolution kernel, and y_1 and y_2 respectively represent the context output values after the two feature mappings;
adding the context output values after the two feature mappings, and then performing two-dimensional depth separable convolution, batch normalization processing and activation processing to obtain a bilateral feature fusion feature map:
y_out = R(BN(DWC2D_{3×3}(y_1 + y_2)))
wherein y_out is the bilateral feature fusion feature map.
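Under the reading of the formulas given above, the bilateral fusion could be sketched in PyTorch as follows; the stride-1 3×3 average pooling and the exact wiring of the two detail and two context mappings are assumptions rather than the patent's literal design.

import torch
import torch.nn as nn

class BilateralFusion(nn.Module):
    # Sketch of the bilateral feature fusion: detail-branch mappings gamma_1/gamma_2
    # weight two mappings of the upsampled context branch, then DWC2D_{3x3}, BN, ReLU.
    def __init__(self, channels):
        super().__init__()
        def dw_dilated():                           # DWC2D_{3x3}, dilation 2, BN, ReLU
            return nn.Sequential(
                nn.Conv2d(channels, channels, 3, padding=2, dilation=2, groups=channels),
                nn.Conv2d(channels, channels, 1),
                nn.BatchNorm2d(channels), nn.ReLU(inplace=True))
        self.detail_a = dw_dilated()                                          # -> gamma_1
        self.detail_b = nn.Sequential(dw_dilated(),
                                      nn.AvgPool2d(3, stride=1, padding=1))   # -> gamma_2
        self.context_b = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),     # DWC2D_{3x3}
            nn.Conv2d(channels, channels, 1),
            nn.Conv2d(channels, channels, 3, padding=2, dilation=2))          # dilated C2D_{3x3}
        self.out = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1, groups=channels),     # DWC2D_{3x3}
            nn.Conv2d(channels, channels, 1),
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True))

    def forward(self, x_d, x_c_up):
        y1 = self.detail_a(x_d) * torch.sigmoid(x_c_up)   # y_1 = gamma_1 * sigmoid(x_c_up)
        y2 = self.detail_b(x_d) * self.context_b(x_c_up)  # y_2 = gamma_2 * mapped context
        return self.out(y1 + y2)                          # y_out = R(BN(DWC2D_{3x3}(y_1 + y_2)))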
6. The multi-scale aggregated cloud and cloud shadow identification method of claim 1, wherein the multi-scale attention feature aggregation network comprises a boundary refinement boosting module for enhancing the detection of complex cloud and cloud shadow edge information, and the calculation process of the boundary refinement boosting module is as follows:
x′ = C2D_{3×3}(C2D_{3×3}(x)) + x
y = drop(C2D_{3×3}(x′))
y′ = Up(C2D_{1×1}(y))
wherein x and y′ respectively represent the input value and the output value of the boundary refinement boosting module, C2D_{1×1} represents a two-dimensional convolution with a 1×1 convolution kernel, Up represents 2× upsampling, C2D_{3×3} represents a two-dimensional convolution with a 3×3 convolution kernel, drop represents the dropout algorithm, x′ represents the first refinement intermediate quantity, and y represents the second refinement intermediate quantity.
7. A multi-scale aggregated cloud and cloud shadow identification system, comprising:
a picture acquisition module, configured to acquire a picture to be detected; and
a cloud and cloud shadow identification module, configured to input the picture to be detected into a pre-trained multi-scale attention feature aggregation network to obtain a mask image of cloud and cloud shadow, completing the identification of cloud and cloud shadow.
8. A multi-scale aggregated cloud and cloud shadow identification device, comprising a processor and a storage medium;
the storage medium is used for storing instructions;
the processor is configured to operate in accordance with the instructions to perform the steps of the method according to any one of claims 1 to 6.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 6.
CN202210959194.2A 2022-08-10 2022-08-10 Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium Pending CN115410081A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210959194.2A CN115410081A (en) 2022-08-10 2022-08-10 Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210959194.2A CN115410081A (en) 2022-08-10 2022-08-10 Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115410081A true CN115410081A (en) 2022-11-29

Family

ID=84158523

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210959194.2A Pending CN115410081A (en) 2022-08-10 2022-08-10 Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115410081A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117173103A (en) * 2023-08-04 2023-12-05 山东大学 Image shadow detection method and system
CN117173103B (en) * 2023-08-04 2024-04-12 山东大学 Image shadow detection method and system
CN116703928A (en) * 2023-08-08 2023-09-05 宁德市天铭新能源汽车配件有限公司 Automobile part production detection method and system based on machine learning
CN116703928B (en) * 2023-08-08 2023-10-27 宁德市天铭新能源汽车配件有限公司 Automobile part production detection method and system based on machine learning
CN117612029A (en) * 2023-12-21 2024-02-27 石家庄铁道大学 Remote sensing image target detection method based on progressive feature smoothing and scale adaptive expansion convolution
CN117612029B (en) * 2023-12-21 2024-05-24 石家庄铁道大学 Remote sensing image target detection method based on progressive feature smoothing and scale adaptive expansion convolution

Similar Documents

Publication Publication Date Title
CN109934200B (en) RGB color remote sensing image cloud detection method and system based on improved M-Net
CN109543606B (en) Human face recognition method with attention mechanism
Ren et al. Unsupervised change detection in satellite images with generative adversarial network
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN115410081A (en) Multi-scale aggregated cloud and cloud shadow identification method, system, equipment and storage medium
CN111079739B (en) Multi-scale attention feature detection method
CN113591968A (en) Infrared weak and small target detection method based on asymmetric attention feature fusion
CN113609896A (en) Object-level remote sensing change detection method and system based on dual-correlation attention
CN113298815A (en) Semi-supervised remote sensing image semantic segmentation method and device and computer equipment
CN111310766A (en) License plate identification method based on coding and decoding and two-dimensional attention mechanism
CN105243154A (en) Remote sensing image retrieval method and system based on significant point characteristics and spare self-encodings
CN112329771B (en) Deep learning-based building material sample identification method
CN114724155A (en) Scene text detection method, system and equipment based on deep convolutional neural network
CN110827312A (en) Learning method based on cooperative visual attention neural network
Chen et al. ASF-Net: Adaptive screening feature network for building footprint extraction from remote-sensing images
Chong et al. Context union edge network for semantic segmentation of small-scale objects in very high resolution remote sensing images
CN113111716A (en) Remote sensing image semi-automatic labeling method and device based on deep learning
Liu et al. Two-stage underwater object detection network using swin transformer
Song et al. PSTNet: Progressive sampling transformer network for remote sensing image change detection
Sun et al. IRDCLNet: Instance segmentation of ship images based on interference reduction and dynamic contour learning in foggy scenes
CN111126155A (en) Pedestrian re-identification method for generating confrontation network based on semantic constraint
Wang et al. Detection of SAR image multiscale ship targets in complex inshore scenes based on improved YOLOv5
Wu et al. AMR-Net: Arbitrary-oriented ship detection using attention module, multi-scale feature fusion and rotation pseudo-label
Gao et al. PE-Transformer: Path enhanced transformer for improving underwater object detection
CN117079095A (en) Deep learning-based high-altitude parabolic detection method, system, medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination