CN113240683A - Attention mechanism-based lightweight semantic segmentation model construction method - Google Patents

Attention mechanism-based lightweight semantic segmentation model construction method Download PDF

Info

Publication number
CN113240683A
CN113240683A (Application CN202110638043.2A)
Authority
CN
China
Prior art keywords
attention
semantic segmentation
stage
path
construction method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110638043.2A
Other languages
Chinese (zh)
Other versions
CN113240683B (en)
Inventor
张霖
杨源
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beihang University
Original Assignee
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beihang University filed Critical Beihang University
Priority to CN202110638043.2A priority Critical patent/CN113240683B/en
Publication of CN113240683A publication Critical patent/CN113240683A/en
Application granted granted Critical
Publication of CN113240683B publication Critical patent/CN113240683B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30004Biomedical image processing
    • G06T2207/30061Lung
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a lightweight semantic segmentation model construction method based on an attention mechanism, applied to the technical field of image processing. Given an image I, the corresponding ground-truth label map GT forms a training set with it: step 1, establish the model; step 2, train the model; step 3, test the model, namely input the test set images into the trained network model to obtain the test results. The invention improves both image segmentation accuracy and segmentation speed; the segmentation process is not prone to overfitting; the method is efficient and convenient for practical deployment; and when annotation data are scarce, the model can be trained quickly to further improve performance.

Description

Attention mechanism-based lightweight semantic segmentation model construction method
Technical Field
The invention relates to the technical field of image processing, in particular to a lightweight semantic segmentation model construction method based on an attention mechanism.
Background
Image segmentation refers to the computer vision task of labeling designated regions according to the content of an image; specifically, the purpose of image semantic segmentation is to label every pixel in an image and associate each pixel with its corresponding class. It has important practical application value in scene understanding, medical imaging, autonomous driving, and other areas.
The classical semantic segmentation model comprises:
the fully convolutional network (FCN) is a classic semantic segmentation network in deep learning. It draws on the traditional classification network structure but, unlike a traditional classification network, converts the fully connected layers into convolutional layers. It then up-samples through deconvolution (transposed convolution), gradually restoring the detail information of the image and enlarging the feature map. In restoring image detail, the FCN relies on learnable deconvolution on the one hand and, on the other, adopts skip connections to fuse the feature information obtained during down-sampling with the corresponding feature maps during up-sampling. However, the FCN has technical drawbacks such as loss of semantic information and a lack of modeling of the correlation between pixels.
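The learnable deconvolution and skip-connection fusion described above can be sketched in PyTorch as follows (shapes and the class count of 21 are illustrative assumptions, not part of the FCN specification):

```python
import torch
import torch.nn as nn

# Hypothetical shapes: a coarse score map from the deep layers and a
# shallower encoder feature map already projected to the same class count.
num_classes = 21
deep = torch.randn(1, num_classes, 16, 16)     # coarse, low-resolution scores
shallow = torch.randn(1, num_classes, 32, 32)  # skip feature from the encoder

# Learnable deconvolution (transposed convolution) doubles the resolution.
deconv = nn.ConvTranspose2d(num_classes, num_classes,
                            kernel_size=4, stride=2, padding=1)
up = deconv(deep)                              # -> (1, 21, 32, 32)

# Skip connection: element-wise sum with the corresponding encoder feature map.
fused = up + shallow
print(tuple(fused.shape))
```

The element-wise sum is what lets the up-sampling path recover detail lost during down-sampling.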
SegNet adopts the FCN encoding-decoding architecture but, unlike FCN, does not use skip connections, and during up-sampling uses an unpooling operation instead of deconvolution. The pooling indices stored during down-sampling are used in the decoder to perform the unpooling operation on the corresponding feature map. This preserves the integrity of high-frequency information, but when unpooling is performed on a low-resolution feature map, the information between neighboring pixels is ignored.
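The unpooling-with-stored-indices mechanism can be demonstrated directly (illustrative shapes); note how every position that did not hold a maximum comes back as zero, which is the neighbor-information loss noted above:

```python
import torch
import torch.nn.functional as F

# Max-pool while recording the argmax indices, SegNet-style.
x = torch.randn(1, 3, 8, 8)
pooled, indices = F.max_pool2d(x, kernel_size=2, stride=2, return_indices=True)

# Unpooling places each value back at its recorded position; all other
# positions are filled with zeros, discarding neighbor information.
unpooled = F.max_unpool2d(pooled, indices, kernel_size=2, stride=2)
print(tuple(unpooled.shape))
```

The maxima themselves are restored exactly, which is why high-frequency (edge) information survives the round trip.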
The DeepLab series is a family of semantic segmentation network models designed by the Google team, which adopts dilated (atrous) convolution and CRF post-processing. Dilated convolution expands the receptive field without increasing the number of parameters, and the CRF post-processing further improves the accuracy of semantic segmentation. DeepLabv2 adds an ASPP (atrous spatial pyramid pooling) module on the basis of v1.
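The claim that dilation enlarges the receptive field without adding parameters can be checked with simple kernel-span arithmetic (a small illustrative helper, not from the patent):

```python
def dilated_kernel_span(kernel_size: int, dilation: int) -> int:
    """Spatial extent (per side) covered by one dilated convolution kernel."""
    return dilation * (kernel_size - 1) + 1

# A 3x3 kernel with dilation 1, 2, 4 covers 3, 5, 9 pixels per side,
# while the parameter count (9 weights) stays the same -- this is how
# dilated convolution enlarges the receptive field without extra parameters.
print([dilated_kernel_span(3, d) for d in (1, 2, 4)])  # [3, 5, 9]
```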
PSPNet, the Pyramid Scene Parsing Network, uses a pyramid pooling module to fuse the context information of the image, focusing on the relevance between pixels. After a pre-trained model extracts the features, the pyramid pooling module extracts the context information of the image; the context information is stacked with the extracted features, and the final output is obtained through up-sampling. The feature-stacking step fuses detail features with global features: detail features are shallow features, i.e. features extracted by the shallow layers of the network, while global features are deep features, i.e. contextual features extracted by the deep layers.
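A pyramid pooling module of the kind described can be sketched as follows (the bin sizes (1, 2, 3, 6) and channel counts are illustrative assumptions, not the patent's configuration):

```python
import torch
import torch.nn.functional as F

def pyramid_pool(x, bins=(1, 2, 3, 6)):
    """Pool the feature map at several scales, upsample, and stack along channels."""
    h, w = x.shape[2:]
    branches = [x]
    for b in bins:
        pooled = F.adaptive_avg_pool2d(x, b)   # context at a b x b grid
        branches.append(F.interpolate(pooled, size=(h, w), mode="bilinear",
                                      align_corners=False))
    return torch.cat(branches, dim=1)          # stack context with the features

feat = torch.randn(1, 8, 24, 24)
out = pyramid_pool(feat)
print(tuple(out.shape))   # original 8 channels + 4 context branches of 8 each
```

A 1x1 convolution would normally follow the concatenation to compress the stacked channels before up-sampling to the prediction.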
These network models, however, have many layers and large parameter counts. With the development of technology and the continuous improvement of hardware, pixel-level segmentation has become a mainstream direction.
Therefore, introducing a lightweight model into semantic segmentation, and providing an attention-mechanism-based lightweight semantic segmentation model construction method that improves both image segmentation accuracy and segmentation speed, is an urgent technical problem to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a lightweight semantic segmentation model construction method based on an attention mechanism, which is used for improving the image segmentation accuracy and the segmentation speed.
In order to achieve the purpose, the invention adopts the following technical scheme:
the attention mechanism-based lightweight semantic segmentation model construction method comprises the following steps:
given an image I, a corresponding real label graph GT forms a training set:
step 1, establishing a model, namely constructing a coding stage using AHSP modules, Channel Attention Sum, Criss-Cross Attention Sum, Channel Split, and Concat, constructing a decoding stage using FFM, Channel Attention Sum, Criss-Cross Attention Sum, the ReLU function, and Final Prediction, and connecting the coding stage and the decoding stage through the Channel Attention Sum to obtain an ultra-lightweight semantic segmentation network based on the attention mechanism;
step 2, model training, namely inputting the training set images I into the attention-mechanism ultra-lightweight semantic segmentation network to obtain predicted images, comparing the predicted images with the ground-truth label maps GT, and computing the cross-entropy function as the loss function to measure the error between the predicted and true values; iterative optimization training of the network model parameters defined in step 1 is performed through the back-propagation algorithm until the whole model converges;
and 3, testing the model, namely inputting the test set image into the trained network model to obtain a test result.
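The training procedure of step 2 — forward pass, pixel-wise cross-entropy against GT, and back-propagation until convergence — can be sketched generically. The tiny stand-in network, shapes, and learning rate below are illustrative assumptions, not the patent's attention-based model:

```python
import torch
import torch.nn as nn

# Stand-in for the segmentation network: any model mapping an image to
# per-pixel class scores would slot in here.
model = nn.Conv2d(3, 2, kernel_size=1)          # 3-channel image -> 2 classes
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()               # pixel-wise cross-entropy

image = torch.randn(4, 3, 16, 16)               # training batch I
gt = torch.randint(0, 2, (4, 16, 16))           # ground-truth label maps GT

for _ in range(5):                              # iterate until convergence in practice
    optimizer.zero_grad()
    pred = model(image)                         # predicted score maps
    loss = criterion(pred, gt)                  # error between prediction and GT
    loss.backward()                             # back-propagation
    optimizer.step()

print(loss.item())
```

`CrossEntropyLoss` takes the raw score map of shape (N, C, H, W) and integer labels of shape (N, H, W), matching the comparison of predicted image and GT described above.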
Preferably, in step 1, the coding network includes n stages; with the AHSP module as the basic module, Criss-Cross Attention Sum, Channel Split, and Concat fuse Split are introduced to construct a first path and a second path that are connected with each other; the training set image I is down-sampled n times, and the feature map output at each stage is 1/2, 1/4, ..., 1/2^n of the original size.
Preferably, the first path includes k AHSP modules, and the transfer function of the k-th module at the i-th stage of the first path and its output are each given by a formula rendered as an image in the original [formula images not reproduced], where i ∈ {1, 2, 3, ..., n} and k ∈ {1};
the second path includes j AHSP modules, and the transfer function of the j-th module at the i-th stage of the second path and its output are likewise given as formula images, where i ∈ {1, 2, 3, ..., n}, j ∈ {1, 2}, and C_i is the number of feature channels in the i-th stage.
Preferably, the calculation formulas of the output feature map of the first AHSP module of the first path and of the second path at each stage are given by formulas (1)-(3) [formula images not reproduced], where i ∈ {1, 2, 3, ..., n}, the two operators shown as images denote down-sampling with a stride of 2, F_{1×1}(·) is a convolution function with a 1×1 kernel, and Split(·) divides the received feature map into two parts along the channel dimension and sends them into the first path and the second path respectively, yielding the first-path and second-path feature information.
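The Channel Split and Concat operations used to exchange information between the two paths can be illustrated in isolation (PyTorch; the channel count of 16 and the even split are illustrative assumptions):

```python
import torch

# Channel Split: divide a feature map into two halves along the channel
# dimension, one half per path.
x = torch.randn(1, 16, 32, 32)
path1, path2 = torch.split(x, 8, dim=1)         # two 8-channel paths

# ... each path would pass through its own AHSP modules here ...

# Concat fuse: merge the two paths back along the channel dimension.
merged = torch.cat([path1, path2], dim=1)
print(tuple(merged.shape))
```

With no per-path processing in between, the split followed by concat is lossless, which is what makes it a cheap way to route channels through parallel paths.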
Preferably, the calculation formula of the output feature map of the 2nd AHSP module of the second path at each stage is given by formula (4) [formula image not reproduced], where i ∈ {1, 2, 3, ..., n}.
Preferably, in step 1, the decoding network includes n stages; based on the FFM module, a Channel Attention Sum and a Criss-Cross Attention Sum are introduced to form the decoding network, and the ReLU function is introduced as the activation function for the final output prediction result.
Preferably, the transfer function of the FFM module is D_i(·), and the output feature map is expressed by a formula rendered as an image in the original [formula image not reproduced], where i ∈ {1, 2, 3, ..., n};

S′_i = F_{1×1}(X)  (5)

[formula image (6) not reproduced]

where S′_i is the output of applying the 1×1 convolution function to the down-sampled final output X, F_{1×1}(·) is a convolution function with a 1×1 kernel, the operator shown as an image is a separable convolution transfer function with a 3×3 kernel, and BatchNorm(·) is a batch normalization function.
Preferably, the feature map output obtained through the encoding stage is given by formula (7) [formula image not reproduced]; then D_i is computed as follows:

S″_i = D_i(Upsample(CAM(D_{i+1}), 2))  (8)

[formula image (9) not reproduced]

where Upsample(·, t) denotes sampling the feature map by a factor of t using bilinear interpolation, CAM(·) denotes applying the channel attention mechanism, and S″_i is the next-stage feature map D after the CAM, up-sampling, and FFM operations.
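The CAM(·) and Upsample(·, 2) operations of formula (8) can be sketched as follows; since the patent does not spell out CAM's internals at this point, a squeeze-and-excitation-style channel gate is assumed:

```python
import torch
import torch.nn.functional as F

def cam(x):
    """Assumed channel attention: global average pool -> sigmoid gate per channel."""
    weights = torch.sigmoid(F.adaptive_avg_pool2d(x, 1))  # shape (N, C, 1, 1)
    return x * weights                                    # re-weight the channels

d_next = torch.randn(1, 8, 16, 16)                        # D_{i+1} from the next stage
attended = cam(d_next)                                    # CAM(D_{i+1})
up = F.interpolate(attended, scale_factor=2, mode="bilinear",
                   align_corners=False)                   # Upsample(., 2), bilinear
# The FFM transfer function D_i(.) would then be applied to `up`, cf. formula (8).
print(tuple(up.shape))
```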
Preferably, the test result P_i is obtained from D_i through a 1×1 convolution:

P_i = Softmax(Upsample(F_{1×1}(D_i), 2^i))  (10)

where P_i ∈ R^{H×W} is the predicted class label map, Softmax(·) is the activation function, and i ∈ {1, 2, 3, ..., n}.
Compared with the prior art, the lightweight semantic segmentation model construction method based on an attention mechanism provided by the invention achieves the following: the image segmentation accuracy and segmentation speed are improved; the segmentation process is not prone to overfitting; the method is efficient and convenient for practical deployment; and when annotation data are scarce, the model can be trained quickly to further improve performance.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below. It is obvious that the drawings in the following description are only embodiments of the present invention; for those skilled in the art, other drawings can be obtained from the provided drawings without creative effort.
FIG. 1 is a diagram of an ultra lightweight semantic segmentation network architecture based on an attention mechanism according to the present invention;
FIG. 2 is a block diagram of an FFM module of the present invention;
FIG. 3 shows images of an embodiment of the present invention, wherein 3.1 is the CT image, 3.2 is the prediction map, and 3.3 is the ground-truth label map.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the invention discloses a lightweight semantic segmentation model construction method based on an attention mechanism, which comprises the following steps:
given an image I, a corresponding real label graph GT forms a training set:
step 1, establishing a model, namely constructing a coding stage using AHSP (attentive hierarchical spatial pyramid) modules, Channel Attention Sum, Criss-Cross Attention Sum, Channel Split, and Concat fuse Split, constructing a decoding stage using the FFM (feature fusion module), Channel Attention Sum, Criss-Cross Attention Sum, the ReLU function, and Final Prediction, and connecting the coding stage and the decoding stage through the Channel Attention Sum to obtain an ultra-lightweight semantic segmentation network based on the attention mechanism;
step 2, model training, namely inputting the training set images I into the attention-mechanism ultra-lightweight semantic segmentation network to obtain predicted images, comparing the predicted images with the ground-truth label maps GT, and computing the cross-entropy function as the loss function to measure the error between the predicted and true values; iterative optimization training of the network model parameters defined in step 1 is performed through the back-propagation algorithm until the whole model converges;
and 3, testing the model, namely inputting the test set image into the trained network model to obtain a test result.
In one embodiment, the Channel Attention Sum, the Criss-Cross Attention Sum, and the Concat fuse Split are each denoted in the network architecture by a symbol rendered as an image in the original [symbol images not reproduced].
In a specific embodiment, in step 1, the encoding network comprises n stages; a first path and a second path connected with each other are constructed by introducing Criss-Cross Attention Sum, Channel Split, and Concat fuse Split, with the AHSP module as the basic module; the training set image I is down-sampled n times, and the feature map output at each stage is 1/2, 1/4, ..., 1/2^n of the original size.
In one embodiment, the first path includes k AHSP modules, and the transfer function of the k-th module at the i-th stage of the first path and its output are each given by a formula rendered as an image in the original [formula images not reproduced], where i ∈ {1, 2, 3, ..., n} and k ∈ {1};
the second path includes j AHSP modules, and the transfer function of the j-th module at the i-th stage of the second path and its output are likewise given as formula images, where i ∈ {1, 2, 3, ..., n}, j ∈ {1, 2}, and C_i is the number of feature channels in the i-th stage.
In one specific embodiment, for stage 0, the corresponding output is given by a formula rendered as an image in the original [formula image not reproduced].
In a specific embodiment, the calculation formulas of the output feature map of the first AHSP module of the first path and of the second path at each stage are given by formulas (1)-(3) [formula images not reproduced], where i ∈ {1, 2, 3, ..., n}, the two operators shown as images denote down-sampling with a stride of 2, F_{1×1}(·) is a convolution function with a 1×1 kernel, and Split(·) divides the received feature map into two parts along the channel dimension and sends them into the first path and the second path respectively, yielding the first-path and second-path feature information.
In one embodiment, the calculation formula of the output feature map of the 2nd AHSP module of the second path at each stage is given by formula (4) [formula image not reproduced], where i ∈ {1, 2, 3, ..., n}.
In a specific embodiment, in step 1, the decoding network includes n stages; based on the FFM module, a Channel Attention Sum and a Criss-Cross Attention Sum are introduced to form the decoding network, and the ReLU function is introduced as the activation function for the final output prediction result.
In one embodiment, referring to FIG. 2, the transfer function of the FFM module is D_i(·), and the output feature map is expressed by a formula rendered as an image in the original [formula image not reproduced], where i ∈ {1, 2, 3, ..., n};

S′_i = F_{1×1}(X)  (5)

[formula image (6) not reproduced]

where S′_i is the output of applying the 1×1 convolution function to the down-sampled final output X, F_{1×1}(·) is a convolution function with a 1×1 kernel, the operator shown as an image is a separable convolution function with a 3×3 kernel, and BatchNorm(·) is a batch normalization function.
In one embodiment, the feature map output obtained through the encoding stage is given by formula (7) [formula image not reproduced]; then D_i is computed as follows:

S″_i = D_i(Upsample(CAM(D_{i+1}), 2))  (8)

[formula image (9) not reproduced]

where Upsample(·, t) denotes sampling the feature map by a factor of t using bilinear interpolation, CAM(·) denotes applying the channel attention mechanism, and S″_i is the next-stage feature map D after the CAM, up-sampling, and FFM operations.
In one embodiment, the test result P_i is obtained from D_i through a 1×1 convolution:

P_i = Softmax(Upsample(F_{1×1}(D_i), 2^i))  (10)

where P_i ∈ R^{H×W} is the predicted class label map, Softmax(·) is the activation function, and i ∈ {1, 2, 3, ..., n};
definition of the Softmax function (taking the i-th node output as an example):

Softmax(Z_i) = e^{Z_i} / Σ_{c=1}^{C} e^{Z_c}

where Z_i is the output value of the i-th node, and C is the number of output nodes, i.e. the number of classes.
In one embodiment, a lung image is taken as an example for the experiment. As shown in FIG. 3, 3.1 is the CT image, 3.2 is the prediction map, and 3.3 is the ground-truth label map; Table 1 compares the parameters of this model with those of other models:
TABLE 1
Methods          Backbone   Param.    FLOPs      Dice    Sen.    Spec.
U-Net            VGG16      7.853M    38.116G    0.4     0.5     0.8
Attention-UNet   VGG16      8.727M    31.73G     0.5     0.6     0.9
U-Net++          VGG16      9.163M    65.938G    0.5     0.6     0.9
Minimum-seg      —          36.98K    209.043M   0.663   0.704   0.935
As can be seen from the Param. column of Table 1, the attention-based ultra-lightweight semantic segmentation network has only about 37K parameters, while the other models have at least millions (M-level); the model is therefore small in size, and both the image segmentation accuracy and the segmentation speed are improved; the segmentation process is not prone to overfitting; the method is efficient and convenient for practical deployment; and when annotation data are scarce, the model can be trained quickly to further improve performance.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

1. The attention mechanism-based lightweight semantic segmentation model construction method is characterized by comprising the following steps of:
given an image I, a corresponding real label graph GT forms a training set:
step 1, establishing a model, namely constructing a coding stage using AHSP modules, Channel Attention Sum, Criss-Cross Attention Sum, Channel Split, and Concat, constructing a decoding stage using FFM, Channel Attention Sum, Criss-Cross Attention Sum, the ReLU function, and Final Prediction, and connecting the coding stage and the decoding stage through the Channel Attention Sum to obtain an ultra-lightweight semantic segmentation network based on the attention mechanism;
step 2, model training, namely inputting a training set image I into an ultra-lightweight semantic segmentation network of an attention mechanism to obtain a predicted image, comparing the predicted image with a real label image GT, calculating a cross entropy function as a loss function, and measuring the error between a predicted value and a real value; performing iterative optimization training on the network model parameters defined in the step 1 through a back propagation algorithm until the whole model converges;
and 3, testing the model, namely inputting the test set image into the trained network model to obtain a test result.
2. The attention-based lightweight semantic segmentation model construction method according to claim 1,
in step 1, the coding network comprises n stages; with the AHSP module as the basic module, Criss-Cross Attention Sum, Channel Split, and Concat fuse Split are introduced to construct a first path and a second path that are connected with each other; the training set image I is down-sampled n times, and the feature map output at each stage is 1/2, 1/4, ..., 1/2^n of the original size.
3. The attention-based lightweight semantic segmentation model construction method according to claim 2,
the first path includes k AHSP modules, and the transfer function of the k-th module at the i-th stage of the first path and its output are each given by a formula rendered as an image in the original [formula images not reproduced], where i ∈ {1, 2, 3, ..., n} and k ∈ {1};
the second path includes j AHSP modules, and the transfer function of the j-th module at the i-th stage of the second path and its output are likewise given as formula images, where i ∈ {1, 2, 3, ..., n}, j ∈ {1, 2}, and C_i is the number of feature channels in the i-th stage.
4. The attention-based lightweight semantic segmentation model construction method according to claim 3,
the calculation formulas of the output feature map of the first AHSP module of the first path and of the second path at each stage are given by formulas (1)-(3) [formula images not reproduced], where i ∈ {1, 2, 3, ..., n}, the two operators shown as images denote down-sampling with a stride of 2, F_{1×1}(·) is a convolution network transfer function with a 1×1 kernel, and Split(·) divides the received feature map into two parts along the channel dimension and sends them into the first path and the second path respectively, yielding the first-path and second-path feature information.
5. The attention-based lightweight semantic segmentation model construction method according to claim 3,
the calculation formula of the output feature map of the 2nd AHSP module of the second path at each stage is given by formula (4) [formula image not reproduced], where i ∈ {1, 2, 3, ..., n}.
6. The attention-based lightweight semantic segmentation model construction method according to claim 1,
in step 1, the decoding network comprises n stages; based on the FFM module, a Channel Attention Sum and a Criss-Cross Attention Sum are introduced to form the decoding network, and the ReLU function is introduced as the activation function for the final output prediction result.
7. The attention-based lightweight semantic segmentation model construction method according to claim 6,
the transfer function of the FFM module is D_i(·), and the output feature map is expressed by a formula rendered as an image in the original [formula image not reproduced], where i ∈ {1, 2, 3, ..., n};

S′_i = F_{1×1}(X)  (5)

[formula image (6) not reproduced]

where S′_i is the output of applying the 1×1 convolution function to the down-sampled final output X, F_{1×1}(·) is a convolution function with a 1×1 kernel, the operator shown as an image is a separable convolution function with a 3×3 kernel, and BatchNorm(·) is a batch normalization function.
8. The attention-based lightweight semantic segmentation model construction method according to claim 7,
the feature map output obtained through the encoding stage is given by formula (7) [formula image not reproduced]; then D_i is computed as follows, where i = 1, 2, ..., n-1:

S″_i = D_i(Upsample(CAM(D_{i+1}), 2))  (8)

[formula image (9) not reproduced]

where Upsample(·, t) denotes sampling the feature map by a factor of t using bilinear interpolation, CAM(·) denotes applying the channel attention mechanism, and S″_i is the next-stage feature map D after the CAM, up-sampling, and FFM operations.
9. The attention-based lightweight semantic segmentation model construction method according to claim 8,
the test result P_i is obtained from D_i through a 1×1 convolution:

P_i = Softmax(Upsample(F_{1×1}(D_i), 2^i))  (10)

where P_i ∈ R^{H×W} is the predicted class label map, Softmax(·) is the activation function, and i ∈ {1, 2, 3, ..., n}.
CN202110638043.2A 2021-06-08 2021-06-08 Attention mechanism-based lightweight semantic segmentation model construction method Active CN113240683B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110638043.2A CN113240683B (en) 2021-06-08 2021-06-08 Attention mechanism-based lightweight semantic segmentation model construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110638043.2A CN113240683B (en) 2021-06-08 2021-06-08 Attention mechanism-based lightweight semantic segmentation model construction method

Publications (2)

Publication Number Publication Date
CN113240683A true CN113240683A (en) 2021-08-10
CN113240683B CN113240683B (en) 2022-09-20

Family

ID=77137265

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110638043.2A Active CN113240683B (en) 2021-06-08 2021-06-08 Attention mechanism-based lightweight semantic segmentation model construction method

Country Status (1)

Country Link
CN (1) CN113240683B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140469A (en) * 2021-12-02 2022-03-04 北京交通大学 Depth hierarchical image semantic segmentation method based on multilayer attention
CN114241203A (en) * 2022-02-24 2022-03-25 科大天工智能装备技术(天津)有限公司 Workpiece length measuring method and system
CN114255350A (en) * 2021-12-23 2022-03-29 四川大学 Method and system for measuring thickness of soft and hard tissues of palate part
CN116721420A (en) * 2023-08-10 2023-09-08 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110490884A (en) * 2019-08-23 2019-11-22 北京工业大学 A kind of lightweight network semantic segmentation method based on confrontation
CN111079649A (en) * 2019-12-17 2020-04-28 西安电子科技大学 Remote sensing image ground feature classification method based on lightweight semantic segmentation network
CN112183360A (en) * 2020-09-29 2021-01-05 上海交通大学 Lightweight semantic segmentation method for high-resolution remote sensing image
CN112330681A (en) * 2020-11-06 2021-02-05 北京工业大学 Attention mechanism-based lightweight network real-time semantic segmentation method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ZHENGNING ZHANG ET AL: "Bidirectional Parallel Feature Pyramid Network for Object Detection", IEEE Access *
NING Qian et al.: "Aerial image segmentation based on multi-scale features and attention mechanism", Control Theory & Applications *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114140469A (en) * 2021-12-02 2022-03-04 北京交通大学 Depth hierarchical image semantic segmentation method based on multilayer attention
CN114255350A (en) * 2021-12-23 2022-03-29 四川大学 Method and system for measuring thickness of soft and hard tissues of palate part
CN114255350B (en) * 2021-12-23 2023-08-04 四川大学 Method and system for measuring thickness of soft and hard tissues of palate
CN114241203A (en) * 2022-02-24 2022-03-25 科大天工智能装备技术(天津)有限公司 Workpiece length measuring method and system
CN116721420A (en) * 2023-08-10 2023-09-08 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment
CN116721420B (en) * 2023-08-10 2023-10-20 南昌工程学院 Semantic segmentation model construction method and system for ultraviolet image of electrical equipment

Also Published As

Publication number Publication date
CN113240683B (en) 2022-09-20

Similar Documents

Publication Publication Date Title
CN113240683B (en) Attention mechanism-based lightweight semantic segmentation model construction method
CN111462126B (en) Semantic image segmentation method and system based on edge enhancement
CN111325165B (en) Urban remote sensing image scene classification method considering spatial relationship information
CN111259904B (en) Semantic image segmentation method and system based on deep learning and clustering
CN112381097A (en) Scene semantic segmentation method based on deep learning
CN110728192A High-resolution remote sensing image classification method based on a novel feature pyramid deep network
CN113870335A (en) Monocular depth estimation method based on multi-scale feature fusion
CN113435253B (en) Multi-source image combined urban area ground surface coverage classification method
CN112699899A Hyperspectral image feature extraction method based on a generative adversarial network
CN112329801B (en) Convolutional neural network non-local information construction method
CN112733768A (en) Natural scene text recognition method and device based on bidirectional characteristic language model
CN113378938B (en) Edge transform graph neural network-based small sample image classification method and system
CN113642445B (en) Hyperspectral image classification method based on full convolution neural network
CN113516133A (en) Multi-modal image classification method and system
CN114821050A (en) Named image segmentation method based on transformer
CN113807340A (en) Method for recognizing irregular natural scene text based on attention mechanism
CN115761735A (en) Semi-supervised semantic segmentation method based on self-adaptive pseudo label correction
CN112508181A (en) Graph pooling method based on multi-channel mechanism
CN117237559A (en) Digital twin city-oriented three-dimensional model data intelligent analysis method and system
CN117576402B (en) Deep learning-based multi-scale aggregation transducer remote sensing image semantic segmentation method
CN113096133A (en) Method for constructing semantic segmentation network based on attention mechanism
CN116704506A (en) Cross-environment-attention-based image segmentation method
CN114494284B (en) Scene analysis model and method based on explicit supervision area relation
CN112990336B (en) Deep three-dimensional point cloud classification network construction method based on competitive attention fusion
CN113688783B (en) Face feature extraction method, low-resolution face recognition method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant