CN117197462A - Lightweight ground-based cloud segmentation method and system based on multi-scale feature fusion and alignment - Google Patents
- Publication number: CN117197462A (application number CN202311187455.4A)
- Authority: CN (China)
- Prior art keywords: cloud, segmentation, ground-based, alignment, feature
- Prior art date: 2023-09-14
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses a lightweight ground-based cloud segmentation method and system based on multi-scale feature fusion and alignment, relating to the technical field of image processing and comprising the following steps: receiving a ground-based cloud image data set, preprocessing it to obtain a preprocessed data set, and dividing the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes; inputting the training set into a pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model; and inputting the test set and the verification set into the trained cloud segmentation model to test and verify its segmentation effect, so that ground-based clouds in complex environments are segmented.
Description
Technical Field
The invention relates to the technical field of image processing, and in particular to a lightweight ground-based cloud segmentation method and system based on multi-scale feature fusion and alignment.
Background
Clouds are an important component of the global thermodynamic equilibrium and the water-vapor cycle; their quantity and changes in form reflect the stability of atmospheric motion and the energy balance of surface radiation. Clouds are widely studied and applied in nowcasting, short-term precipitation forecasting, artificial precipitation, cloud-cover forecasting, irradiance forecasting, aviation control, optimization of satellite ground-to-air communication, and other fields. To fully analyze cloud image data and obtain the important parameters of cloud images, effective and accurate cloud image segmentation research is of great significance.
Ground-based cloud observation offers high flexibility and accessibility and makes it convenient to monitor the low-level characteristics of local clouds; the high temporal and spatial resolution of ground-based cloud images allows local cloud conditions to be captured accurately. At the same time, accurate cloud image segmentation is a key prerequisite for cloud image analysis by ground-based all-sky imaging equipment: it improves the accuracy of the acquired cloud information and facilitates further understanding of climate conditions.
Because clouds have fuzzy boundaries and complex textures, and the shape characteristics in a cloud image change with time and illumination, classical shape-prior-based image segmentation algorithms cannot be applied to ground-based cloud image segmentation. Machine learning algorithms are sensitive to parameter selection, lack robustness, and require manual feature engineering when the foreground and background of a cloud image are complex. Existing deep learning algorithms achieve relatively good segmentation, but strong illumination effects and fuzzy thin-cloud boundaries still cause poor results, with both missed and incorrect segmentation. Existing models also extract cloud features insufficiently: they cannot obtain the deeper semantic features in a cloud image, nor fuse multi-scale features.
Disclosure of Invention
In order to overcome the above-mentioned shortcomings of the background art, the present invention aims to provide a lightweight ground-based cloud segmentation method and system based on multi-scale feature fusion and alignment.
The aim of the invention is achieved by the following technical scheme: a lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment comprises the following steps:
receiving a ground-based cloud image data set, preprocessing it to obtain a preprocessed data set, and dividing the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes;
inputting the training set into a pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model;
inputting the test set and the verification set into the trained cloud segmentation model, and testing and verifying its segmentation effect, so that ground-based clouds in complex environments are segmented.
Preferably, the ground-based cloud image data set is acquired with an all-sky imager, which captures multiple categories of cloud images at different times and in different scenes, including night cloud images and cloud images containing interference factors; the captured images are assembled into a ground-based cloud image data set with three RGB channels in jpg format.
Preferably, preprocessing the ground-based cloud image data set comprises the following steps:
screening the cloud images in the ground-based cloud image data set to remove samples with large interference factors; labeling the elements in each cloud image to obtain real labels; cropping the images, then performing normalization and data enhancement, obtaining the expanded preprocessed data set, and dividing it proportionally into a training set, a test set and a verification set.
Preferably, the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment uses an improved lightweight EfficientNetV2-S as the backbone extraction network: EfficientNetV2-S is simplified, the architecture in which Fused-MBConv is used is adjusted, and the combination ratio of Fused-MBConv to MBConv is adjusted.
Preferably, a hetero-receptive-field spliced ASPP module is designed in the Encoder of the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment: three convolution-layer branches with different dilation rates of 6, 12 and 18 are interactively connected, so that the convolution layer of the next branch can use the feature information of the previous branch to extract and fuse multi-scale feature information; a 1×1 convolution is added before each branch to reduce the number of parameters. The receptive field of the hetero-receptive-field spliced ASPP structure is the superposition of the output receptive fields of the cascaded convolution layers, calculated as:
RF = RF₁ + RF₂ + … + RFₙ − (n − 1)
wherein RF represents the size of the receptive field output by the hetero-receptive-field ASPP structure, n represents the number of cascaded convolutions, and RFᵢ represents the output receptive field size of the i-th convolution.
Preferably, the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment designs a hybrid strip pooling module to replace global average pooling; it uses horizontal pooling and vertical pooling respectively to construct long-range dependencies and capture long-range context information in different dimensions.
The hybrid strip pooling module first performs vertical strip pooling with a pooling kernel of H×1 on the input feature map x ∈ R^(C×H×W), i.e., averages each column of feature values in x to obtain a C×1×W row vector y_v:
y_v(j) = (1/H) Σ_{i=1..H} x(i, j)
Horizontal strip pooling with a pooling kernel of 1×W is then performed to obtain a C×H×1 column vector y_h:
y_h(i) = (1/W) Σ_{j=1..W} x(i, j)
wherein y_v represents the row vector, y_h represents the column vector, C represents the number of channels, H and W represent the height and width of the feature map, and i, j index the i-th row and j-th column of the feature map;
The results of vertical strip pooling and horizontal strip pooling are each expanded to obtain y₁ and y₂, which are combined with the input feature map respectively and finally added to obtain z:
y₁ = Scale(x, σ(f(y_h)))
y₂ = Scale(x, σ(f(y_v)))
z = y₁ + y₂
wherein y₁ and y₂ respectively represent the outputs obtained from the column vector and the row vector after the expansion operation, Scale(·) represents element-wise multiplication, σ represents the Sigmoid activation function, and f(·) represents a 1×1 convolution.
Preferably, in the Decoder of the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment, two layers of shallow features are extracted from the backbone extraction network EfficientNetV2-S and fed into the attention-based feature alignment module A-FAM, with 1×1 convolution used for dimension reduction; the deep feature map extracted by the hetero-receptive-field spliced ASPP is upsampled 4×, fed into the second-stage feature alignment module A-FAM together, reduced in dimension with a 3×3 convolution, and restored to the original image size by a further 4× upsampling to obtain the prediction map;
The attention-based feature alignment module A-FAM is designed on the basis of the squeeze-and-excitation network SENet; its attention mechanism with two parallel branch channels comprises a squeeze part and an excitation part. The squeeze part updates the weights with global average pooling and global max pooling to obtain a 1×1×C one-dimensional vector z, and the excitation part calculates the relation among channels with fully connected layers:
s = σ[g(z, ω)] = σ[ω₂ δ(ω₁ z)]
where s represents the weight vector, σ represents the Sigmoid function, δ represents the ReLU activation function, ω₁ and ω₂ represent the weights, and z represents the weight vector updated by the squeeze part.
First, a fully connected layer with weight ω₁ reduces the channel dimension to 1/H of its initial value; after activation by the intermediate ReLU function δ, a second fully connected layer with weight ω₂ restores the initial number of channels. The feature vectors produced by the two branches are then added, and the normalized channel weight vector s ∈ R^(1×1×C) is generated through the Sigmoid function σ. This vector is multiplied channel by channel with the shallow feature F₁ to obtain the weight-corrected feature F_m, which is merged with the deep feature F_h to output a deep-shallow aligned feature with channel attention:
F_m = F₁ · σ{ω₂ δ{ω₁ [Avgpool(F_h)]} + ω₂ δ{ω₁ [Maxpool(F_h)]}}
wherein Avgpool represents global average pooling, Maxpool represents global max pooling, F₁ represents the shallow feature, F_h represents the deep feature, and F_m represents the weight-corrected feature. The channel reduction ratio H is set to 16, and the fully connected layers in the two parallel branch channels at the front end share parameters.
Preferably, inputting the test set and the verification set into the trained cloud segmentation model and testing and verifying its segmentation effect comprises:
visualizing the segmentation results, visually comparing the cloud feature extraction and decoding capabilities of the models, and displaying the segmentation performance of the model;
comparing the actual segmentation effects of the EfficientNetV2-S backbone feature extraction network, the hetero-receptive-field spliced ASPP, and the attention-based feature alignment module A-FAM in the trained cloud segmentation model;
comparing and analyzing the segmentation effect of the cloud segmentation network against other mainstream semantic segmentation networks on the data set; comparing the parameter counts of the models to verify the lightweight level of the trained cloud segmentation model;
and verifying the generalization capability of the trained cloud segmentation model on the public data set CCSN, which has no real labels.
In a second aspect, to achieve the above object, the invention discloses a lightweight ground-based cloud segmentation system based on multi-scale feature fusion and alignment, comprising:
a data processing module, configured to receive a ground-based cloud image data set, preprocess it to obtain a preprocessed data set, and divide the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes;
a training module, configured to input the training set into a pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model;
and a test verification module, configured to input the test set and the verification set into the trained cloud segmentation model and to test and verify its segmentation effect, so that ground-based clouds in complex environments are segmented.
In another aspect of the present invention, in order to achieve the above object, there is disclosed an apparatus comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment as described above.
The invention has the beneficial effects that:
First, the invention designs a lightweight ground-based cloud segmentation network model based on multi-scale feature fusion and alignment, built on EfficientNetV2-S and DeepLabV3+; it has low computational complexity, a small number of parameters and high segmentation precision, and effectively alleviates missed and incorrect segmentation of cloud bodies and ambiguous segmentation of thin-cloud boundaries.
Second, the method outperforms other segmentation models on the mPA, F1-score and mIoU evaluation indexes and has a clear advantage in segmentation effect.
Third, the method has excellent generalization capability and robustness, and is suitable for ground-based cloud segmentation in complex environments.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly introduced below; it will be obvious to those skilled in the art that other drawings can be obtained from these drawings without inventive effort.
FIG. 1 is a schematic flow chart of the method of the present invention;
FIG. 2 is a flowchart of the overall operation of the present invention;
FIG. 3 is a diagram of the original DeepLabV3+ network structure in the present invention;
FIG. 4 is a diagram of the lightweight DeepLabV3+ network architecture improved with multi-scale feature fusion and alignment in the present invention;
FIG. 5 is a block diagram of a hybrid stripe pooling module in accordance with the present invention;
FIG. 6 is a diagram of a feature alignment module A-FAM based on an attention mechanism in accordance with the present invention;
FIG. 7 is a schematic diagram of the system of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
As shown in fig. 1, the lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment comprises the following steps:
receiving a ground-based cloud image data set, preprocessing it to obtain a preprocessed data set, and dividing the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes.
In this embodiment, the ground-based cloud image data set is acquired with an all-sky imager, which captures multiple categories of cloud images at different times and in different scenes, including night cloud images and cloud images containing interference factors; the images are assembled into a ground-based cloud image data set with three RGB channels in jpg format.
It should be further noted that preprocessing the ground-based cloud image data set includes:
screening the cloud images, selecting those with higher resolution and smaller interference factors as suitable samples for the data set, and marking the cloud, sky and background in each selected image with the image annotation tool Labelme to obtain the real label corresponding to each cloud image.
The cloud images are then cropped to 600×600 pixels and standard-normalized, normalizing the gray levels of each cloud image to a fixed range, which improves the calculation speed and accelerates the convergence of the cloud segmentation model. Data enhancement is then performed: the cloud images are randomly rotated, horizontally flipped, brightness-adjusted and lightly noised, expanding the original data set to 10 times its sample size; the result is divided into a training set, a test set and a verification set in the ratio 8:1:1.
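A minimal sketch of this preprocessing pipeline is given below in Python. The 600×600 crop, the augmentation types, the 10× expansion and the 8:1:1 split follow the description above; the directory layout, helper names and the noise/brightness magnitudes are assumptions, and the label maps (not shown) would need the same geometric transforms.

```python
import random
from pathlib import Path

import numpy as np
from PIL import Image, ImageEnhance, ImageOps

def augment(img: Image.Image) -> Image.Image:
    img = img.rotate(random.choice([0, 90, 180, 270]))   # random rotation
    if random.random() < 0.5:
        img = ImageOps.mirror(img)                       # horizontal flip
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.8, 1.2))  # brightness change
    arr = np.asarray(img, dtype=np.float32)
    arr += np.random.normal(0.0, 5.0, arr.shape)         # mild additive noise (assumed magnitude)
    return Image.fromarray(np.clip(arr, 0, 255).astype(np.uint8))

def preprocess(img: Image.Image) -> np.ndarray:
    img = img.crop((0, 0, 600, 600))                     # crop to 600x600 pixels
    return np.asarray(img, dtype=np.float32) / 255.0     # normalize gray levels to [0, 1]

paths = sorted(Path("ground_cloud_dataset").glob("*.jpg"))   # hypothetical directory
samples = [preprocess(augment(Image.open(p).convert("RGB")))
           for p in paths for _ in range(10)]            # expand to 10x the sample size
random.shuffle(samples)
n = len(samples)
train = samples[: int(0.8 * n)]                          # 8 : 1 : 1 split
test = samples[int(0.8 * n): int(0.9 * n)]
val = samples[int(0.9 * n):]
```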
The training set is input into the pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model.
It should be further noted that in this embodiment, a lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment is constructed. The improved lightweight EfficientNetV2-S is used as the backbone extraction network: EfficientNetV2-S is simplified, the architecture in which Fused-MBConv is used is adjusted, and the combination ratio of Fused-MBConv to MBConv is adjusted.
In the Encoder, a hetero-receptive-field spliced ASPP module is designed: three convolution-layer branches with different dilation rates of 6, 12 and 18 are interactively connected, so that the convolution layer of the lower branch can use the feature information of the upper branch, extracting and fusing multi-scale feature information to address inaccurate cloud edge detection; each branch is preceded by a 1×1 convolution to reduce parameters. Ordinary convolutions are replaced by depthwise separable atrous convolutions, offsetting the increase in computation caused by the hetero-receptive-field splicing.
The receptive field of the hetero-receptive-field spliced ASPP structure is the superposition of the output receptive fields of the cascaded convolution layers, calculated as:
RF = RF₁ + RF₂ + … + RFₙ − (n − 1)
wherein RF represents the size of the receptive field output by the hetero-receptive-field ASPP structure, n represents the number of cascaded convolutions, and RFᵢ represents the output receptive field size of the i-th convolution.
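The following PyTorch sketch illustrates this cascaded ASPP design under the description above; the 256 intermediate channels, the addition used for the branch-to-branch connection, and the final concatenation-plus-1×1 fusion are assumptions rather than the patent's exact wiring.

```python
import torch
import torch.nn as nn

class SeparableAtrousConv(nn.Module):
    """Depthwise separable 3x3 atrous convolution (replaces the ordinary conv)."""
    def __init__(self, ch: int, dilation: int):
        super().__init__()
        self.depthwise = nn.Conv2d(ch, ch, 3, padding=dilation,
                                   dilation=dilation, groups=ch, bias=False)
        self.pointwise = nn.Conv2d(ch, ch, 1, bias=False)
        self.bn = nn.BatchNorm2d(ch)
        self.act = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))

class CascadedASPP(nn.Module):
    """Three branches with dilation rates 6/12/18; each branch reuses the
    previous branch's features (the interactive connection)."""
    def __init__(self, in_ch: int, mid_ch: int = 256):  # mid_ch is an assumption
        super().__init__()
        # 1x1 convolutions before each branch to cut the parameter count
        self.reduce = nn.ModuleList([nn.Conv2d(in_ch, mid_ch, 1, bias=False)
                                     for _ in range(3)])
        self.branches = nn.ModuleList([SeparableAtrousConv(mid_ch, d)
                                       for d in (6, 12, 18)])
        self.project = nn.Conv2d(3 * mid_ch, mid_ch, 1, bias=False)

    def forward(self, x):
        outs, prev = [], None
        for reduce, branch in zip(self.reduce, self.branches):
            y = reduce(x)
            if prev is not None:   # reuse the feature information of the previous branch
                y = y + prev
            prev = branch(y)
            outs.append(prev)
        return self.project(torch.cat(outs, dim=1))  # fuse the multi-scale features
```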
A hybrid strip pooling module is designed to replace global average pooling; it uses horizontal pooling and vertical pooling respectively to construct long-range dependencies and capture long-range context information in different dimensions.
The module first performs vertical strip pooling with a pooling kernel of H×1 on the input feature map x ∈ R^(C×H×W), i.e., averages each column of feature values in x to obtain a C×1×W row vector y_v:
y_v(j) = (1/H) Σ_{i=1..H} x(i, j)
Horizontal strip pooling with a pooling kernel of 1×W is then performed to obtain a C×H×1 column vector y_h:
y_h(i) = (1/W) Σ_{j=1..W} x(i, j)
wherein y_v represents the row vector, y_h represents the column vector, C represents the number of channels, H and W represent the height and width of the feature map, and i, j index the i-th row and j-th column of the feature map.
The results of vertical strip pooling and horizontal strip pooling are each expanded to obtain y₁ and y₂, which are combined with the input feature map respectively and finally added to obtain z:
y₁ = Scale(x, σ(f(y_h)))
y₂ = Scale(x, σ(f(y_v)))
z = y₁ + y₂
wherein y₁ and y₂ respectively represent the outputs obtained from the column vector and the row vector after the expansion operation, Scale(·) represents element-wise multiplication, σ represents the Sigmoid activation function, and f(·) represents a 1×1 convolution.
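A sketch of the hybrid strip pooling module implementing the formulas above, assuming PyTorch; the nearest-neighbor expansion and the placement of the 1×1 convolutions f(·) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class HybridStripPooling(nn.Module):
    def __init__(self, ch: int):
        super().__init__()
        self.f_v = nn.Conv2d(ch, ch, 1)    # f(.) applied to y_v
        self.f_h = nn.Conv2d(ch, ch, 1)    # f(.) applied to y_h

    def forward(self, x):                  # x: (N, C, H, W)
        n, c, h, w = x.shape
        y_v = x.mean(dim=2, keepdim=True)  # vertical strip pooling, Hx1 kernel -> (N, C, 1, W)
        y_h = x.mean(dim=3, keepdim=True)  # horizontal strip pooling, 1xW kernel -> (N, C, H, 1)
        # expand back to H x W, then Scale(): element-wise gating of the input
        y1 = x * torch.sigmoid(F.interpolate(self.f_h(y_h), size=(h, w)))
        y2 = x * torch.sigmoid(F.interpolate(self.f_v(y_v), size=(h, w)))
        return y1 + y2                     # z = y1 + y2
```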
In the Decoder, two layers of shallow features are extracted from the backbone extraction network EfficientNetV2-S and fed into the attention-based feature alignment module A-FAM, with 1×1 convolution used for dimension reduction; the deep feature map extracted by the hetero-receptive-field spliced ASPP is upsampled 4×, fed into the second-stage feature alignment module A-FAM together, reduced in dimension with a 3×3 convolution, and restored to the original image size by a further 4× upsampling to obtain the prediction map.
The attention-based feature alignment module A-FAM is designed on the basis of the squeeze-and-excitation network SENet; its attention mechanism with two parallel branch channels comprises a squeeze part and an excitation part. The squeeze part updates the weights using global average pooling and global max pooling respectively, producing a 1×1×C one-dimensional vector z. The excitation part calculates the relationship between channels using fully connected layers:
s = σ[g(z, ω)] = σ[ω₂ δ(ω₁ z)]
where s represents the weight vector, σ represents the Sigmoid function, δ represents the ReLU activation function, ω₁ and ω₂ represent the weights, and z represents the weight vector updated by the squeeze part.
First, a fully connected layer with weight ω₁ reduces the channel dimension to 1/H of its initial value; after activation by the intermediate ReLU function δ, a second fully connected layer with weight ω₂ restores the initial number of channels. The feature vectors produced by the two branches are then added, and the normalized channel weight vector s ∈ R^(1×1×C) is generated through the Sigmoid function σ. This vector is multiplied channel by channel with the shallow feature F₁ to obtain the weight-corrected feature F_m, which is merged with the deep feature F_h to output a deep-shallow aligned feature with channel attention:
F_m = F₁ · σ{ω₂ δ{ω₁ [Avgpool(F_h)]} + ω₂ δ{ω₁ [Maxpool(F_h)]}}
wherein Avgpool represents global average pooling, Maxpool represents global max pooling, F₁ represents the shallow feature, F_h represents the deep feature, and F_m represents the weight-corrected feature. The channel reduction ratio H is set to 16, and the fully connected layers in the two parallel branch channels at the front end share parameters.
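A sketch of A-FAM following the formula for F_m, assuming PyTorch, that the shallow and deep features already have the same channel count after the 1×1/3×3 dimension reductions, and that the final merge is a concatenation; the shared fully connected layers implement the parameter sharing noted above.

```python
import torch
import torch.nn as nn

class AFAM(nn.Module):
    def __init__(self, ch: int, reduction: int = 16):   # H = 16 per the description
        super().__init__()
        # shared FC layers (omega_1, omega_2) used by both pooling branches
        self.fc = nn.Sequential(
            nn.Linear(ch, ch // reduction, bias=False),  # omega_1: reduce to 1/H
            nn.ReLU(inplace=True),                       # delta
            nn.Linear(ch // reduction, ch, bias=False),  # omega_2: restore channels
        )

    def forward(self, f_low, f_high):      # shallow F1 and deep F_h: (N, C, H, W)
        n, c, _, _ = f_high.shape
        avg = self.fc(f_high.mean(dim=(2, 3)))        # squeeze: global average pooling
        mx = self.fc(f_high.amax(dim=(2, 3)))         # squeeze: global max pooling
        s = torch.sigmoid(avg + mx).view(n, c, 1, 1)  # normalized channel weights
        f_m = f_low * s                               # channel-by-channel correction
        return torch.cat([f_m, f_high], dim=1)        # merge with the deep feature (assumed)
```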
The cross-entropy function is selected to calculate the loss, and Dice Loss is introduced to calculate the loss globally:
DiceLoss = 1 − (2|X ∩ Y|) / (|X| + |Y|)
wherein X represents the segmented region of the prediction result, Y represents the ground-truth region, and X ∩ Y represents their overlap.
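A sketch of the combined loss, assuming PyTorch and a three-class task (cloud, sky, background); the smoothing constant is an assumption added to avoid division by zero.

```python
import torch
import torch.nn.functional as F

def segmentation_loss(logits, target, num_classes=3, smooth=1e-6):
    ce = F.cross_entropy(logits, target)              # pixel-level cross entropy
    probs = torch.softmax(logits, dim=1)              # X: predicted (soft) regions
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()  # Y: ground truth
    inter = (probs * onehot).sum(dim=(2, 3))          # |X intersect Y| per class
    dice = (2 * inter + smooth) / (probs.sum(dim=(2, 3))
                                   + onehot.sum(dim=(2, 3)) + smooth)
    return ce + (1 - dice.mean())                     # DiceLoss = 1 - Dice
```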
Mean pixel accuracy mPA, F1-score and mean intersection over union mIoU are selected as evaluation indexes for cloud image segmentation. The mIoU judges the cloud segmentation effect from the overlap between the real labels and the predicted labels, and is calculated as:
mIoU = (1/(q+1)) Σ_{i=0..q} p_ii / (Σ_{j=0..q} p_ij + Σ_{j=0..q} p_ji − p_ii)
where q+1 represents the number of classes including the background sky, and p_ij represents the number of pixels whose true class is i but which are predicted as class j.
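A sketch of the mIoU computation from this definition, assuming numpy, with p[i, j] counting pixels of true class i predicted as class j.

```python
import numpy as np

def mean_iou(pred: np.ndarray, label: np.ndarray, num_classes: int) -> float:
    # p[i, j]: number of pixels with true class i predicted as class j
    p = np.bincount(num_classes * label.reshape(-1) + pred.reshape(-1),
                    minlength=num_classes ** 2).reshape(num_classes, num_classes)
    tp = np.diag(p)                                        # p_ii
    with np.errstate(divide="ignore", invalid="ignore"):
        iou = tp / (p.sum(axis=1) + p.sum(axis=0) - tp)    # per-class IoU
    return float(np.nanmean(iou))                          # mean over the q + 1 classes
```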
The Loss curve and the mIoU curve recorded during cloud segmentation model training are analyzed to determine when the model fits and why.
The test set and the verification set are input into the trained cloud segmentation model, and its segmentation effect is tested and verified, so that ground-based clouds in complex environments are segmented.
The segmentation results are visualized to compare the cloud feature extraction and decoding capabilities of the models intuitively and to display the segmentation performance of the model.
An ablation experiment is designed to compare the actual segmentation effects of the EfficientNetV2-S backbone feature extraction network, the hetero-receptive-field spliced ASPP, and the attention-based feature alignment module A-FAM in the improved DeepLabV3+ cloud segmentation model.
A comparison experiment is used to compare and analyze the segmentation effect of the cloud segmentation network against other mainstream semantic segmentation networks on the data set; the parameter counts of the models are compared to verify the lightweight level of the model.
The generalization capability of the cloud segmentation model is verified on the public data set CCSN, which has no real labels.
As shown in figs. 3 and 4, the invention adopts the improved lightweight network EfficientNetV2-S as the backbone extraction network to reduce the model's parameter count. In the encoder, a hetero-receptive-field spliced ASPP module is proposed on the basis of the original ASPP: several convolution-layer branches with different dilation rates are interactively connected, improving the correlation of features among branches, realizing information sharing and fusion between branches, and fully extracting and fusing multi-scale feature information. The ordinary convolutions in the ASPP module are replaced by depthwise separable atrous convolutions, which offsets the parameter and computation increase caused by the hetero-receptive-field splicing and accelerates training. In the decoder, shallow features from two branches are selected; the shallow feature maps are upsampled and fed into the A-FAM module for splicing, fully mining the channel and spatial characteristics of the shallow input feature maps of the ground-based cloud image. The result, together with the re-weighted deep feature map, is fed into A-FAM again, and cross-layer connections capture the information carried by the shallow features, further enriching the semantic and detail information of the image. After the features are refined by a 3×3 convolution, bilinear interpolation upsampling restores the feature map to the original image size, avoiding the loss of feature information caused by an overly large single upsampling step.
Table 1 below shows the architecture parameters of the improved lightweight network EfficientNetV2-S adopted in the invention. It is divided into 8 stages with 19 layers in total; stages 1 and 2 use Fused-MBConv, stages 3, 4, 5 and 6 use MBConv, the MBConv convolution kernel size is changed to 5×5, and the number of channels is adjusted to 960, improving model precision while reducing the number of parameters.
TABLE 1

| Stage | Operator | Stride | #Channels | #Layers |
|---|---|---|---|---|
| 0 | Conv3×3 | 2 | 16 | 1 |
| 1 | Fused-MBConv1, k3×3 | 1 | 16 | 1 |
| 2 | Fused-MBConv4, k3×3 | 2 | 24 | 2 |
| 3 | MBConv4, k5×5, SE0.25 | 2 | 40 | 2 |
| 4 | MBConv4, k5×5, SE0.25 | 2 | 80 | 3 |
| 5 | MBConv6, k5×5, SE0.25 | 1 | 112 | 3 |
| 6 | MBConv6, k5×5, SE0.25 | 2 | 192 | 4 |
| 7 | Conv1×1 & Pooling & FC | - | 960 | 1 |
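For illustration, the stage layout of Table 1 can be written as a configuration list; the tuple format below is an assumption, and instantiating the blocks is left to the backbone implementation.

```python
# One tuple per stage of the simplified EfficientNetV2-S:
# (block, expansion, kernel, stride, out_channels, layers, se_ratio)
BACKBONE_CONFIG = [
    ("conv",         None, 3, 2,  16, 1, None),   # stage 0
    ("fused_mbconv", 1,    3, 1,  16, 1, None),   # stage 1
    ("fused_mbconv", 4,    3, 2,  24, 2, None),   # stage 2
    ("mbconv",       4,    5, 2,  40, 2, 0.25),   # stage 3
    ("mbconv",       4,    5, 2,  80, 3, 0.25),   # stage 4
    ("mbconv",       6,    5, 1, 112, 3, 0.25),   # stage 5
    ("mbconv",       6,    5, 2, 192, 4, 0.25),   # stage 6
    ("head",         None, 1, 1, 960, 1, None),   # stage 7: Conv1x1 + Pooling + FC
]
```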
As shown in fig. 5, the pooling kernels of the hybrid strip pooling module are strip-shaped; horizontal and vertical pooling are used to construct long-range dependencies and capture long-range context information in different dimensions.
As shown in fig. 6, in the attention-based feature alignment module A-FAM, the squeeze part uses global average pooling and global max pooling to gather channel statistics. The excitation part reduces the number of channels with a fully connected layer of weight ω₁ and, after ReLU activation, restores the number of channels with a fully connected layer of weight ω₂. The two feature vectors are added and normalized, multiplied with the low-level feature, and the corrected feature F_m is finally obtained.
Table 2 below compares the cloud segmentation model with other semantic segmentation models, showing the segmentation performance evaluation indexes of the different models on the data set and their parameter counts.
TABLE 2
In a second aspect, as shown in fig. 7, to achieve the above object, an embodiment of the invention discloses a lightweight ground-based cloud segmentation system based on multi-scale feature fusion and alignment, comprising:
a data processing module, configured to receive a ground-based cloud image data set, preprocess it to obtain a preprocessed data set, and divide the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes;
a training module, configured to input the training set into the pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model;
and a test verification module, configured to input the test set and the verification set into the trained cloud segmentation model and to test and verify its segmentation effect, so that ground-based clouds in complex environments are segmented.
Based on the same inventive concept, the invention also provides a computer apparatus comprising one or more processors and a memory for storing one or more computer programs; a program includes program instructions, and the processor is configured to execute the program instructions stored in the memory. The processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on; it is the computational and control core of the terminal, implementing one or more instructions, in particular loading and executing one or more instructions within a computer storage medium to implement the methods described above.
It should be further noted that, based on the same inventive concept, the invention also provides a computer storage medium having stored thereon a computer program which, when executed by a processor, performs the above method. The storage medium may take the form of any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus or device.
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present disclosure. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The foregoing has shown and described the basic principles, principal features and advantages of the present disclosure. It will be understood by those skilled in the art that the present disclosure is not limited to the embodiments described above; the foregoing embodiments and description merely illustrate the principles of the disclosure, and various changes and modifications may be made without departing from the spirit and scope of the disclosure, which is defined by the appended claims.
Claims (10)
1. A lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment, characterized by comprising the following steps:
receiving a ground-based cloud image data set, preprocessing it to obtain a preprocessed data set, and dividing the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes;
inputting the training set into a pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model;
inputting the test set and the verification set into the trained cloud segmentation model, and testing and verifying its segmentation effect, so that ground-based clouds in complex environments are segmented.
2. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 1, wherein the ground-based cloud image data set is acquired with an all-sky imager that captures cloud images of various categories at different times and in different scenes, including night cloud images and cloud images containing interference factors, and the captured cloud images are assembled into a ground-based cloud image data set with three RGB channels in jpg format.
3. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 1, wherein preprocessing the ground-based cloud image data set comprises the following steps:
screening the cloud images in the ground-based cloud image data set to remove samples with large interference factors; labeling the elements in each cloud image to obtain real labels; cropping the images, then performing normalization and data enhancement, obtaining the expanded preprocessed data set, and dividing it proportionally into a training set, a test set and a verification set.
4. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 1, wherein the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment uses an improved lightweight EfficientNetV2-S as the backbone extraction network: EfficientNetV2-S is simplified, the architecture in which Fused-MBConv is used is adjusted, and the combination ratio of Fused-MBConv to MBConv is adjusted.
5. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 4, wherein a hetero-receptive-field spliced ASPP module is designed in the Encoder of the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+: three convolution-layer branches with different dilation rates of 6, 12 and 18 are interactively connected, so that the convolution layer of the next branch can use the feature information of the previous branch to extract and fuse multi-scale feature information; a 1×1 convolution is added before each branch to reduce the number of parameters; the receptive field of the hetero-receptive-field spliced ASPP structure is the superposition of the output receptive fields of the cascaded convolution layers, calculated as:
RF = RF₁ + RF₂ + … + RFₙ − (n − 1)
wherein RF represents the size of the receptive field output by the hetero-receptive-field ASPP structure, n represents the number of cascaded convolutions, and RFᵢ represents the output receptive field size of the i-th convolution.
6. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 5, wherein the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ designs a hybrid strip pooling module to replace global average pooling, using horizontal pooling and vertical pooling respectively to construct long-range dependencies and capture long-range context information in different dimensions;
the hybrid strip pooling module first performs vertical strip pooling with a pooling kernel of H×1 on the input feature map x ∈ R^(C×H×W), i.e., averages each column of feature values in x to obtain a C×1×W row vector y_v:
y_v(j) = (1/H) Σ_{i=1..H} x(i, j)
horizontal strip pooling with a pooling kernel of 1×W is then performed to obtain a C×H×1 column vector y_h:
y_h(i) = (1/W) Σ_{j=1..W} x(i, j)
wherein y_v represents the row vector, y_h represents the column vector, C represents the number of channels, H and W represent the height and width of the feature map, and i, j index the i-th row and j-th column of the feature map;
the results of vertical strip pooling and horizontal strip pooling are each expanded to obtain y₁ and y₂, which are combined with the input feature map respectively and finally added to obtain z:
y₁ = Scale(x, σ(f(y_h)))
y₂ = Scale(x, σ(f(y_v)))
z = y₁ + y₂
wherein y₁ and y₂ respectively represent the outputs obtained from the column vector and the row vector after the expansion operation, Scale(·) represents element-wise multiplication, σ represents the Sigmoid activation function, and f(·) represents a 1×1 convolution.
7. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 6, wherein in the Decoder of the lightweight ground-based cloud segmentation network model based on improved DeepLabV3+, two layers of shallow features are extracted from the backbone extraction network EfficientNetV2-S and fed into the attention-based feature alignment module A-FAM, with 1×1 convolution used for dimension reduction; the deep feature map extracted by the hetero-receptive-field spliced ASPP is upsampled 4×, fed into the second-stage feature alignment module A-FAM together, reduced in dimension with a 3×3 convolution, and restored to the original image size by a further 4× upsampling to obtain the prediction map;
the attention-based feature alignment module A-FAM is designed on the basis of the squeeze-and-excitation network SENet; its attention mechanism with two parallel branch channels comprises a squeeze part and an excitation part; the squeeze part updates the weights with global average pooling and global max pooling to obtain a 1×1×C one-dimensional vector z, and the excitation part calculates the relation among channels with fully connected layers:
s = σ[g(z, ω)] = σ[ω₂ δ(ω₁ z)]
where s represents the weight vector, σ represents the Sigmoid function, δ represents the ReLU activation function, ω₁ and ω₂ represent the weights, and z represents the weight vector updated by the squeeze part;
first, a fully connected layer with weight ω₁ reduces the channel dimension to 1/H of its initial value; after activation by the intermediate ReLU function δ, a second fully connected layer with weight ω₂ restores the initial number of channels; the feature vectors produced by the two branches are then added, and the normalized channel weight vector s ∈ R^(1×1×C) is generated through the Sigmoid function σ; this vector is multiplied channel by channel with the shallow feature F₁ to obtain the weight-corrected feature F_m, which is merged with the deep feature F_h to output a deep-shallow aligned feature with channel attention:
F_m = F₁ · σ{ω₂ δ{ω₁ [Avgpool(F_h)]} + ω₂ δ{ω₁ [Maxpool(F_h)]}}
wherein Avgpool represents global average pooling, Maxpool represents global max pooling, F₁ represents the shallow feature, F_h represents the deep feature, and F_m represents the weight-corrected feature; the channel reduction ratio H is set to 16, and the fully connected layers in the two parallel branch channels at the front end share parameters.
8. The lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to claim 1, wherein inputting the test set and the verification set into the trained cloud segmentation model and testing and verifying its segmentation effect comprises:
visualizing the segmentation results, visually comparing the cloud feature extraction and decoding capabilities of the models, and displaying the segmentation performance of the model;
comparing the actual segmentation effects of the EfficientNetV2-S backbone feature extraction network, the hetero-receptive-field spliced ASPP, and the attention-based feature alignment module A-FAM in the trained cloud segmentation model;
comparing and analyzing the segmentation effect of the cloud segmentation network against other mainstream semantic segmentation networks on the data set; comparing the parameter counts of the models to verify the lightweight level of the trained cloud segmentation model;
and verifying the generalization capability of the trained cloud segmentation model on the public data set CCSN, which has no real labels.
9. A lightweight ground-based cloud segmentation system based on multi-scale feature fusion and alignment, characterized by comprising:
a data processing module, configured to receive a ground-based cloud image data set, preprocess it to obtain a preprocessed data set, and divide the preprocessed data set into a training set, a test set and a verification set, where the ground-based cloud image data set consists of cloud images of various categories captured in different scenes;
a training module, configured to input the training set into a pre-established lightweight ground-based cloud segmentation network model based on improved DeepLabV3+ with multi-scale feature fusion and feature alignment to obtain a trained cloud segmentation model;
and a test verification module, configured to input the test set and the verification set into the trained cloud segmentation model and to test and verify its segmentation effect, so that ground-based clouds in complex environments are segmented.
10. An apparatus, comprising:
one or more processors;
a memory for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the lightweight ground-based cloud segmentation method based on multi-scale feature fusion and alignment according to any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311187455.4A | 2023-09-14 | 2023-09-14 | Lightweight ground-based cloud segmentation method and system based on multi-scale feature fusion and alignment
Publications (1)
Publication Number | Publication Date |
---|---|
CN117197462A | 2023-12-08
Family
ID=88997552
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117893915A * | 2023-12-11 | 2024-04-16 | Land Satellite Remote Sensing Application Center, Ministry of Natural Resources | Remote sensing image cloud detection method based on improved DeepLabV3+ neural network
CN118262140A * | 2024-01-22 | 2024-06-28 | Nanjing University of Information Science and Technology | Identification method of ground cloud image identification model deployed on FPGA
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |