CN115170801A - FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion - Google Patents

FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion

Info

Publication number
CN115170801A
Authority
CN
China
Prior art keywords
double, model, FDA, attention, feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210852168.XA
Other languages
Chinese (zh)
Inventor
张小国
滕浩
丁立早
杜文俊
王琦
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University
Priority to CN202210852168.XA
Publication of CN115170801A
Legal status: Pending

Classifications

    • G06V 10/26: Segmentation of patterns in the image field; cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; detection of occlusion
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/40: Extraction of image or video features
    • G06V 10/806: Fusion of extracted features at the sensor, preprocessing, feature extraction or classification level
    • G06V 10/82: Image or video recognition or understanding using neural networks


Abstract

The invention provides an FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, which mainly comprises the following steps: building a ResNet-50 feature extraction network according to the DeepLabv3+ model framework and appending an atrous spatial pyramid pooling (ASPP) module after the feature extraction network; designing a dual-attention mechanism feature fusion module; designing a multi-level feature fusion structure based on the dual-attention mechanism feature fusion module, feeding the high-level and low-level feature maps into it, and obtaining the semantic segmentation result through a depthwise separable convolution and upsampling, at which point the FDA-DeepLab model is fully built; initializing the FDA-DeepLab backbone with a pre-trained model, training the model with an improved loss function to optimize training, and using the trained FDA-DeepLab model and the DeepLabv3+ model to segment the test set and compare performance.

Description

FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion
Technical Field
The invention relates to an FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, and belongs to the field of image processing.
Background
Conventional semantic segmentation faces several challenges: the successive downsampling operations in a conventional classification CNN continuously degrade the resolution of the feature map, and the multi-scale detection problem is usually handled by rescaling and aggregating feature maps, which is computationally expensive. The DeepLab model was developed to address these problems, and the DeepLabv3+ model emerged from its continued development. DeepLabv3+ uses DeepLabv3 as the Encoder to extract multi-scale features, adds a Decoder on this basis, and thus forms a new scheme fusing ASPP with an Encoder-Decoder structure, effectively improving the object boundaries of the segmentation output. In practice, however, several problems remain:
1. Similar objects are prone to misjudgment.
2. Small targets are easily missed.
3. The boundary segmentation error is large.
4. The predicted output contains holes.
Disclosure of Invention
The invention aims to: in view of the problems in the prior art, the invention provides an FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, which mainly solves the problem of breaks and holes in objects segmented by the original DeepLabv3+ model; the problem of large image-boundary segmentation error in the original DeepLabv3+ model; the problem that the original DeepLabv3+ model easily misjudges similar objects; and the problems of imbalanced sample classes and imbalanced sample classification difficulty in the dataset during actual training.
The technical scheme is as follows:
An FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, characterized by comprising the following steps:
Step 1: constructing a feature extraction network and an atrous spatial pyramid pooling (ASPP) module according to the DeepLabv3+ model framework;
Step 2: designing a dual-attention mechanism feature fusion module;
Step 3: designing a multi-level feature fusion structure based on the dual-attention mechanism feature fusion module;
Step 4: performing a depthwise separable convolution and upsampling on the output image obtained from the feature fusion structure, completing the model construction;
Step 5: training the model, improving the loss function to optimize training, and comparing the performance of different models.
The step 1 comprises the following steps:
Step 1.1: a feature extraction network is built with a ResNet-50 convolutional neural network model, yielding low-level feature maps at downsampling rates of 4, 8 and 16.
Step 1.2: an atrous spatial pyramid pooling (ASPP) module is appended after the feature extraction network to obtain the high-level feature map, as sketched below.
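For illustration, a minimal PyTorch sketch of step 1 follows; it is a sketch under assumptions rather than the patented implementation: the atrous rates [6, 12, 18], the reuse of torchvision's ASPP block and ResNet-50 (torchvision >= 0.13), and the dropping of layer4 are all assumptions, since the patent fixes none of them.

    import torch
    from torchvision.models import resnet50
    from torchvision.models.segmentation.deeplabv3 import ASPP

    class Encoder(torch.nn.Module):
        """Step 1 sketch: ResNet-50 backbone tapped at strides 4/8/16, plus ASPP."""
        def __init__(self):
            super().__init__()
            r = resnet50(weights="IMAGENET1K_V1")  # ImageNet pre-training, cf. step 5.1
            self.stem = torch.nn.Sequential(r.conv1, r.bn1, r.relu, r.maxpool)
            self.layer1, self.layer2, self.layer3 = r.layer1, r.layer2, r.layer3
            # layer4 is dropped so the total downsampling factor stays 16 (step 5.1);
            # the atrous rates below are an assumption
            self.aspp = ASPP(in_channels=1024, atrous_rates=[6, 12, 18], out_channels=256)

        def forward(self, x):
            c1 = self.layer1(self.stem(x))    # low-level map, downsampling rate 4
            c2 = self.layer2(c1)              # low-level map, downsampling rate 8
            c3 = self.layer3(c2)              # low-level map, downsampling rate 16
            return c1, c2, c3, self.aspp(c3)  # high-level map from ASPP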
The step 2 comprises the following steps:
Step 2.1: for a given dual-attention mechanism fusion module, let the low-resolution feature map input be $U_{LI}$, with feature-map resolution $H' \times W'$, and the high-resolution feature map input be $U_{HI}$, with feature-map resolution $H \times W$;
Step 2.2: upsample $U_{LI}$ to obtain $U_{L'I'}$, so that the resolution of $U_{L'I'}$ matches that of $U_{HI}$, i.e. $H \times W$:

$$U_{L'I'} = f_{up}(U_{LI}), \qquad U_{L'I'} \in \mathbb{R}^{H \times W \times C}$$

where $f_{up}$ denotes the upsampling operation, generally bilinear interpolation;
Step 2.3: apply a channel attention operation to $U_{L'I'}$ to obtain $U_{LI'}$, and apply a spatial attention operation to $U_{HI}$ to obtain the weight map $F_S$; multiplying $F_S$ with $U_{LI'}$ yields $U_{LO'}$:

$$U_{LI'} = f(W_R * z) \cdot U_{L'I'}$$
$$F_S = \left[f(s_{1,1}), f(s_{1,2}), \ldots, f(s_{i,j}), \ldots, f(s_{H,W})\right]$$
$$U_{LO'} = F_S \otimes U_{LI'}$$

where $f(\cdot)$ denotes the Sigmoid function, $s$ is the mapped feature, $W_R$ are the parameters of the corresponding convolution, $z$ is the compressed feature, and $\otimes$ denotes element-wise multiplication;
Step 2.4: add $U_{LO'}$ and $U_{HI}$, and apply a $1 \times 1$ convolution kernel to reduce the dimension:

$$U_O = c(U_{LO'} + U_{HI})$$

where $c$ denotes the $1 \times 1$ convolution operation.
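A minimal PyTorch sketch of this module follows. The SE-style squeeze for the channel gate, the single 1 x 1 convolution for the spatial gate, and the reduction ratio of 16 are assumptions read into the formulas above; the patent does not pin these hyper-parameters down.

    import torch.nn as nn
    import torch.nn.functional as F

    class DualAttentionFusion(nn.Module):
        """Step 2 sketch; both inputs are assumed to already carry `channels` channels."""
        def __init__(self, channels, reduction=16):
            super().__init__()
            # channel attention: z = global average pool, then f(W_R * z), f = Sigmoid
            self.channel_gate = nn.Sequential(
                nn.AdaptiveAvgPool2d(1),
                nn.Conv2d(channels, channels // reduction, 1),  # W_R (first stage)
                nn.ReLU(inplace=True),
                nn.Conv2d(channels // reduction, channels, 1),  # W_R (second stage)
                nn.Sigmoid())
            # spatial attention: per-pixel mapping s followed by Sigmoid -> F_S
            self.spatial_gate = nn.Sequential(nn.Conv2d(channels, 1, 1), nn.Sigmoid())
            self.out_conv = nn.Conv2d(channels, channels, 1)  # the 1x1 convolution c(.)

        def forward(self, u_li, u_hi):
            # step 2.2: f_up, bilinear upsampling to H x W
            u_up = F.interpolate(u_li, size=u_hi.shape[-2:],
                                 mode="bilinear", align_corners=False)
            u_lip = self.channel_gate(u_up) * u_up  # step 2.3: U_LI'
            f_s = self.spatial_gate(u_hi)           # step 2.3: F_S from U_HI
            u_lop = f_s * u_lip                     # step 2.3: U_LO' = F_S (x) U_LI'
            return self.out_conv(u_lop + u_hi)      # step 2.4: U_O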
The step 3 comprises the following steps:
Step 3.1: the low-level feature map with downsampling rate 16 from step 1.1 and the high-level feature map from step 1.2 are passed through the dual-attention mechanism feature fusion module designed in step 2 to obtain output feature map 1;
Step 3.2: the low-level feature map with downsampling rate 8 from step 1.1 and output feature map 1 from step 3.1 are passed through the module to obtain output feature map 2;
Step 3.3: the low-level feature map with downsampling rate 4 from step 1.1 and output feature map 2 from step 3.2 are passed through the module to obtain output feature map 3, as sketched in the code after this list;
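Continuing the sketches above, the cascade of step 3 could be wired as follows. The 1 x 1 alignment convolutions, and the ResNet-50 channel counts 256/512/1024 they assume, are added here only so the sketch composes; the patent does not state how channel widths are matched.

    import torch.nn as nn

    class FDADecoder(nn.Module):
        """Step 3 sketch: three cascaded DualAttentionFusion modules (defined above)."""
        def __init__(self, ch=256):
            super().__init__()
            self.align16 = nn.Conv2d(1024, ch, 1)  # stride-16 low-level map -> ch
            self.align8 = nn.Conv2d(512, ch, 1)    # stride-8 low-level map -> ch
            self.align4 = nn.Conv2d(256, ch, 1)    # stride-4 low-level map -> ch
            self.fuse16 = DualAttentionFusion(ch)
            self.fuse8 = DualAttentionFusion(ch)
            self.fuse4 = DualAttentionFusion(ch)

        def forward(self, c1, c2, c3, aspp_out):
            out1 = self.fuse16(aspp_out, self.align16(c3))  # step 3.1: output map 1
            out2 = self.fuse8(out1, self.align8(c2))        # step 3.2: output map 2
            return self.fuse4(out2, self.align4(c1))        # step 3.3: output map 3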
the step 4 comprises the following steps:
step 4.1: the output signature 3 obtained in step 3.3 is subjected to a depth separable convolution with a convolution kernel of 3 x 3 and a 4-fold upsampling.
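A hedged sketch of this head is shown below; num_classes = 21 assumes the PASCAL VOC 2012 label set used in step 5.1, and the single depthwise/pointwise pair is one plausible reading of the patent's "depthwise separable convolution".

    import torch.nn as nn
    import torch.nn.functional as F

    class SegHead(nn.Module):
        """Step 4 sketch: 3x3 depthwise separable convolution, then 4x upsampling."""
        def __init__(self, channels=256, num_classes=21):
            super().__init__()
            self.depthwise = nn.Conv2d(channels, channels, 3, padding=1, groups=channels)
            self.pointwise = nn.Conv2d(channels, num_classes, 1)

        def forward(self, x):
            x = self.pointwise(self.depthwise(x))  # depthwise separable 3x3 conv
            return F.interpolate(x, scale_factor=4, mode="bilinear",
                                 align_corners=False)  # 4x upsampling to input size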
The step 5 comprises the following steps:
Step 5.1: train the model. The FDA-DeepLab backbone is initialized with a ResNet-50 model pre-trained on the ImageNet dataset. The batch size is set to 10, the number of iteration steps to 40000, the total downsampling factor of the basic feature extraction network to 16, the initial learning rate to 0.007, and the training input size to 513 × 513; a poly learning-rate policy is adopted.
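The poly policy can be sketched as follows. The decay exponent 0.9, the SGD optimizer and its momentum of 0.9, and the placeholder model are all assumptions: the patent specifies the policy name and the initial rate but nothing further.

    import torch

    base_lr, max_iter = 0.007, 40000  # from step 5.1
    model = torch.nn.Conv2d(3, 21, 1)  # placeholder for the assembled FDA-DeepLab network
    optimizer = torch.optim.SGD(model.parameters(), lr=base_lr, momentum=0.9)
    scheduler = torch.optim.lr_scheduler.LambdaLR(
        optimizer, lambda step: (1.0 - step / max_iter) ** 0.9)  # poly decay
    # call scheduler.step() once per training iteration, for max_iter steps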
Step 5.2: optimize training by improving the loss function. A focal loss replaces the conventional cross-entropy loss:

$$L_{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$$

where $\alpha_t$ is an inter-class weighting parameter in $[0, 1]$, $(1 - p_t)^{\gamma}$ is the easy/hard-sample modulating factor, $\gamma$ is the focusing parameter, and $p_t$ is the predicted probability of the true label. In this experiment, $\gamma = 2$ and $\alpha = 0.25$.
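A minimal sketch of this focal loss for per-pixel multi-class logits follows; handling of an ignore index is omitted for brevity.

    import torch.nn as nn
    import torch.nn.functional as F

    class FocalLoss(nn.Module):
        """Step 5.2 sketch: focal loss with gamma=2, alpha=0.25."""
        def __init__(self, gamma=2.0, alpha=0.25):
            super().__init__()
            self.gamma, self.alpha = gamma, alpha

        def forward(self, logits, target):
            # logits: (N, C, H, W); target: (N, H, W) int64 class labels
            log_pt = F.log_softmax(logits, dim=1)
            log_pt = log_pt.gather(1, target.unsqueeze(1)).squeeze(1)  # log p_t
            pt = log_pt.exp()                                          # p_t
            return (-self.alpha * (1.0 - pt) ** self.gamma * log_pt).mean()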
Step 5.3: test the model. MIoU (mean intersection over union) is adopted as the performance evaluation index; it is simple, highly representative, and the most common evaluation standard in the semantic segmentation field.
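MIoU can be computed from the confusion matrix, for example as in the following sketch; pred and gt are integer label maps of equal shape with values in [0, num_classes), and classes absent from both are skipped via the NaN-mean.

    import numpy as np

    def mean_iou(pred, gt, num_classes=21):
        """Step 5.3 sketch: mean intersection-over-union from a confusion matrix."""
        mask = (gt >= 0) & (gt < num_classes)
        conf = np.bincount(num_classes * gt[mask].astype(int) + pred[mask],
                           minlength=num_classes ** 2).reshape(num_classes, num_classes)
        inter = np.diag(conf)
        union = conf.sum(axis=0) + conf.sum(axis=1) - inter
        with np.errstate(divide="ignore", invalid="ignore"):
            iou = inter / union
        return float(np.nanmean(iou))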
Beneficial effects:
1. The problems of breaks and holes in objects segmented by the original DeepLabv3+ model are solved;
2. the problem of large image-boundary segmentation error in the original DeepLabv3+ model is solved;
3. the problem that the original DeepLabv3+ model easily misjudges similar objects is solved;
4. the problems of imbalanced dataset sample classes and imbalanced sample classification difficulty during actual training are solved.
Drawings
FIG. 1 is a network model diagram of the original DeepLabv3+ model;
FIG. 2 is a structural diagram of the dual-attention mechanism fusion module designed by the invention;
FIG. 3 is an overall architecture diagram of the invention;
FIG. 4 shows the results of the attention mechanism comparison experiment;
FIG. 5 shows the results of the multi-feature fusion comparison experiment under the dual-attention mechanism;
FIG. 6 shows the results of the loss function comparison experiment;
FIG. 7 shows the results of the comparison experiment before and after the DeepLabv3+ improvement;
FIG. 8 shows the results of the comparison experiment across different algorithms.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings; the described embodiments are some, but not all, embodiments of the invention. Accordingly, the following detailed description of the embodiments, as provided in the accompanying drawings, is not intended to limit the scope of the claimed invention.
As shown in the figures, the FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion comprises the following steps:
Step 1: constructing a feature extraction network and an atrous spatial pyramid pooling (ASPP) module according to the DeepLabv3+ model framework;
Step 2: designing a dual-attention mechanism feature fusion module;
Step 3: designing a multi-level feature fusion structure based on the dual-attention mechanism feature fusion module;
Step 4: performing a depthwise separable convolution and upsampling on the output image obtained from the feature fusion structure, completing the model construction;
Step 5: training the model, improving the loss function to optimize training, and comparing the performance of different models.
The step 1 comprises the following steps:
Step 1.1: a feature extraction network is built with a ResNet-50 convolutional neural network model, yielding low-level feature maps at downsampling rates of 4, 8 and 16.
Step 1.2: an atrous spatial pyramid pooling (ASPP) module is appended after the feature extraction network to obtain the high-level feature map.
The step 2 comprises the following steps:
Step 2.1: for a given dual-attention mechanism fusion module, let the low-resolution feature map input be $U_{LI}$, with feature-map resolution $H' \times W'$, and the high-resolution feature map input be $U_{HI}$, with feature-map resolution $H \times W$;
Step 2.2: upsample $U_{LI}$ to obtain $U_{L'I'}$, so that the resolution of $U_{L'I'}$ matches that of $U_{HI}$, i.e. $H \times W$:

$$U_{L'I'} = f_{up}(U_{LI}), \qquad U_{L'I'} \in \mathbb{R}^{H \times W \times C}$$

where $f_{up}$ denotes the upsampling operation, generally bilinear interpolation;
Step 2.3: apply a channel attention operation to $U_{L'I'}$ to obtain $U_{LI'}$, and apply a spatial attention operation to $U_{HI}$ to obtain the weight map $F_S$; multiplying $F_S$ with $U_{LI'}$ yields $U_{LO'}$:

$$U_{LI'} = f(W_R * z) \cdot U_{L'I'}$$
$$F_S = \left[f(s_{1,1}), f(s_{1,2}), \ldots, f(s_{i,j}), \ldots, f(s_{H,W})\right]$$
$$U_{LO'} = F_S \otimes U_{LI'}$$

where $f(\cdot)$ denotes the Sigmoid function, $s$ is the mapped feature, $W_R$ are the parameters of the corresponding convolution, $z$ is the compressed feature, and $\otimes$ denotes element-wise multiplication;
Step 2.4: add $U_{LO'}$ and $U_{HI}$, and apply a $1 \times 1$ convolution kernel to reduce the dimension:

$$U_O = c(U_{LO'} + U_{HI})$$

where $c$ denotes the $1 \times 1$ convolution operation.
The step 3 comprises the following steps:
Step 3.1: the low-level feature map with downsampling rate 16 from step 1.1 and the high-level feature map from step 1.2 are passed through the dual-attention mechanism feature fusion module designed in step 2 to obtain output feature map 1;
Step 3.2: the low-level feature map with downsampling rate 8 from step 1.1 and output feature map 1 from step 3.1 are passed through the module to obtain output feature map 2;
Step 3.3: the low-level feature map with downsampling rate 4 from step 1.1 and output feature map 2 from step 3.2 are passed through the module to obtain output feature map 3;
the step 4 comprises the following steps:
step 4.1: the output signature 3 obtained in step 3.3 is subjected to a depth separable convolution with a convolution kernel of 3 x 3 and a 4-fold upsampling.
The step 5 comprises the following steps:
Step 5.1: train the model. The constructed FDA-DeepLab model is trained on the public PASCAL VOC 2012 dataset. The FDA-DeepLab backbone is initialized with a ResNet-50 model pre-trained on the ImageNet dataset. The batch size is set to 10, the number of iteration steps to 40000, the total downsampling factor of the basic feature extraction network to 16, the initial learning rate to 0.007, and the training input size to 513 × 513; a poly learning-rate policy is adopted.
Step 5.2: optimize training by improving the loss function. A focal loss replaces the conventional cross-entropy loss:

$$L_{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$$

where $\alpha_t$ is an inter-class weighting parameter in $[0, 1]$, $(1 - p_t)^{\gamma}$ is the easy/hard-sample modulating factor, $\gamma$ is the focusing parameter, and $p_t$ is the predicted probability of the true label. In this experiment, $\gamma = 2$ and $\alpha = 0.25$.
Step 5.3: test the model. MIoU (mean intersection over union) is adopted as the performance evaluation index; it is simple, highly representative, and the most common evaluation standard in the semantic segmentation field.
By adding the dual-attention mechanism feature fusion module, the multi-level feature fusion structure and the improved loss function to the DeepLabv3+ model, the method remedies the shortcomings of the original DeepLabv3+ model in image semantic segmentation and improves the segmentation quality of images.
Here, fig. 1 is a network model diagram of the original DeepLabv3+ model. The original DeepLabv3+ model is divided into two modules, the Encoder and the Decoder. Specifically, the Encoder module comprises a backbone network responsible for basic feature extraction and an ASPP module; effective extraction of image features is the key to high-precision semantic segmentation. The Decoder module progressively upsamples the feature map produced by the Encoder and fuses high- and low-level features following the FPN feature-fusion idea, alleviating the loss of detail during feature extraction and finally producing the semantic segmentation result.
Fig. 2 is a structural diagram of the dual-attention mechanism fusion module designed by the invention. The invention combines the advantages of two attention mechanisms, effectively fusing low-level spatial detail with high-level semantic cues to obtain a more effective attention model. The currently common fusion approach applies both attention operations to the same feature map and fuses the results, differing mainly in the choice of fusion scheme. A high-resolution low-level feature map is suited to a spatial attention operation, which extracts the spatial position information of the input image and localizes its important parts; a low-resolution high-level feature map is suited to a channel attention operation, which focuses on the more relevant feature channels and ignores other interference. The invention therefore applies a different attention mechanism to feature maps of different resolutions before fusing them, improving the fusion effect.
Fig. 3 shows the overall architecture of the invention.
Figs. 4, 5, 6, 7 and 8 show, respectively, the results of the attention mechanism comparison experiment on the PASCAL VOC 2012 validation set, the multi-feature fusion comparison experiment under the dual-attention mechanism, the loss function comparison experiment, the comparison before and after the DeepLabv3+ improvement, and the comparison across different algorithms. The experimental results show that the dual-attention module designed by the invention outperforms a single channel-attention or spatial-attention mechanism as well as the original model; the feature fusion method based on the dual-attention mechanism outperforms other fusion methods; and the focal loss designed by the invention also yields a measurable gain on a public dataset whose class distribution is relatively balanced. Overall, by designing the dual-attention mechanism feature fusion module, the multi-level feature fusion structure and the improved loss function on top of the DeepLabv3+ model, the invention obtains the FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, remedies the defects of the original DeepLabv3+ model and improves the segmentation effect.
The technical means disclosed in the scheme of the invention are not limited to those disclosed in the above embodiments, but also include technical schemes formed by any combination of the above technical features. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principle of the invention, and such improvements and modifications are also considered within the scope of the invention.

Claims (6)

1. An FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion, characterized by comprising the following steps:
step 1: constructing a feature extraction network and an atrous spatial pyramid pooling (ASPP) module according to the DeepLabv3+ model framework;
step 2: designing a dual-attention mechanism feature fusion module;
step 3: designing a multi-level feature fusion structure based on the dual-attention mechanism feature fusion module;
step 4: performing a depthwise separable convolution and upsampling on the output image obtained from the feature fusion structure, completing the model construction;
step 5: training the model, improving the loss function to optimize training, and comparing the performance of different models.
2. The FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion as claimed in claim 1, wherein the step 1 comprises the following steps:
step 1.1: building a feature extraction network with a ResNet-50 convolutional neural network model to obtain low-level feature maps at downsampling rates of 4, 8 and 16;
step 1.2: appending an atrous spatial pyramid pooling (ASPP) module after the feature extraction network to obtain the high-level feature map.
3. The FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion as claimed in claim 2, wherein the step 2 comprises the following steps:
step 2.1: for a given dual-attention mechanism fusion module, letting the low-resolution feature map input be $U_{LI}$, with feature-map resolution $H' \times W'$, and the high-resolution feature map input be $U_{HI}$, with feature-map resolution $H \times W$;
step 2.2: upsampling $U_{LI}$ to obtain $U_{L'I'}$, so that the resolution of $U_{L'I'}$ matches that of $U_{HI}$, i.e. $H \times W$:

$$U_{L'I'} = f_{up}(U_{LI}), \qquad U_{L'I'} \in \mathbb{R}^{H \times W \times C}$$

where $f_{up}$ denotes the upsampling operation, generally bilinear interpolation;
step 2.3: applying a channel attention operation to $U_{L'I'}$ to obtain $U_{LI'}$, and applying a spatial attention operation to $U_{HI}$ to obtain the weight map $F_S$; multiplying $F_S$ with $U_{LI'}$ to obtain $U_{LO'}$:

$$U_{LI'} = f(W_R * z) \cdot U_{L'I'}$$
$$F_S = \left[f(s_{1,1}), f(s_{1,2}), \ldots, f(s_{i,j}), \ldots, f(s_{H,W})\right]$$
$$U_{LO'} = F_S \otimes U_{LI'}$$

where $f(\cdot)$ denotes the Sigmoid function, $s$ is the mapped feature, $W_R$ are the parameters of the corresponding convolution, $z$ is the compressed feature, and $\otimes$ denotes element-wise multiplication;
step 2.4: adding $U_{LO'}$ and $U_{HI}$, and applying a $1 \times 1$ convolution kernel to reduce the dimension:

$$U_O = c(U_{LO'} + U_{HI})$$

where $c$ denotes the $1 \times 1$ convolution operation.
4. The FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion as claimed in claim 2, wherein the step 3 comprises the following steps:
step 3.1: passing the low-level feature map with downsampling rate 16 obtained in step 1.1 and the high-level feature map obtained in step 1.2 through the dual-attention mechanism feature fusion module designed in step 2 to obtain output feature map 1;
step 3.2: passing the low-level feature map with downsampling rate 8 obtained in step 1.1 and output feature map 1 obtained in step 3.1 through the module to obtain output feature map 2;
step 3.3: passing the low-level feature map with downsampling rate 4 obtained in step 1.1 and output feature map 2 obtained in step 3.2 through the module to obtain output feature map 3.
5. The FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion as claimed in claim 4, wherein the step 4 comprises the following steps:
step 4.1: passing the output feature map 3 obtained in step 3.3 through a depthwise separable convolution with a 3 × 3 kernel and 4× upsampling.
6. The FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion as claimed in claim 1, wherein the step 5 comprises the following steps:
step 5.1: training the model; initializing the FDA-DeepLab backbone with a ResNet-50 model pre-trained on the ImageNet dataset; setting the batch size to 10, the number of iteration steps to 40000, the total downsampling factor of the basic feature extraction network to 16, the initial learning rate to 0.007 and the training input size to 513 × 513; adopting a poly learning-rate policy;
step 5.2: improving the loss function to optimize training; replacing the conventional cross-entropy loss with a focal loss:

$$L_{FL}(p_t) = -\alpha_t (1 - p_t)^{\gamma} \log p_t$$

where $\alpha_t$ is an inter-class weighting parameter, $(1 - p_t)^{\gamma}$ is the easy/hard-sample modulating factor, $\gamma$ is the focusing parameter, and $p_t$ is the predicted probability of the true label; setting $\gamma = 2$, $\alpha = 0.25$;
step 5.3: testing the model; adopting MIoU as the performance evaluation index.
CN202210852168.XA 2022-07-20 2022-07-20 FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion Pending CN115170801A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210852168.XA CN115170801A (en) 2022-07-20 2022-07-20 FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210852168.XA CN115170801A (en) 2022-07-20 2022-07-20 FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion

Publications (1)

Publication Number Publication Date
CN115170801A 2022-10-11

Family

ID=83494735

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210852168.XA Pending CN115170801A (en) 2022-07-20 2022-07-20 FDA-DeepLab semantic segmentation algorithm based on dual-attention mechanism fusion

Country Status (1)

Country Link
CN (1) CN115170801A (en)


Cited By (4)

* Cited by examiner, † Cited by third party

• CN117237644A* (priority 2023-11-10, published 2023-12-15, 广东工业大学): Forest residual fire detection method and system based on infrared small target detection
• CN117237644B* (priority 2023-11-10, published 2024-02-13, 广东工业大学): Forest residual fire detection method and system based on infrared small target detection
• CN117409208A* (priority 2023-12-14, published 2024-01-16, 武汉纺织大学): Real-time clothing image semantic segmentation method and system
• CN117409208B* (priority 2023-12-14, published 2024-03-08, 武汉纺织大学): Real-time clothing image semantic segmentation method and system


Legal Events

• PB01: Publication
• SE01: Entry into force of request for substantive examination