CN111210432A - Image semantic segmentation method based on multi-scale and multi-level attention mechanism - Google Patents

Image semantic segmentation method based on multi-scale and multi-level attention mechanism Download PDF

Info

Publication number
CN111210432A
CN111210432A
Authority
CN
China
Prior art keywords
image
follows
attention mechanism
feature
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010030667.1A
Other languages
Chinese (zh)
Other versions
CN111210432B (en
Inventor
许海霞
黄云佳
刘用
周维
王帅龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiangtan University
Original Assignee
Xiangtan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiangtan University filed Critical Xiangtan University
Priority to CN202010030667.1A priority Critical patent/CN111210432B/en
Publication of CN111210432A publication Critical patent/CN111210432A/en
Application granted granted Critical
Publication of CN111210432B publication Critical patent/CN111210432B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/40 Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007 Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00 Geometric image transformations in the plane of the image
    • G06T3/60 Rotation of whole images or parts thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10004 Still image; Photographic image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20112 Image segmentation details
    • G06T2207/20132 Image cropping
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an image semantic segmentation method based on a multi-scale and multi-level attention mechanism. The method comprises the following steps: 1. Preprocess the image and the real label map. 2. Build the neural network structure of the multi-scale attention mechanism model, and extract and fuse image features. 3. Build the neural network structure of the multi-level attention mechanism model, and fuse image features across levels. 4. Train the model: the neural network parameters are trained with the back-propagation algorithm until the network converges. The invention relates to a neural network model for image semantic segmentation, in particular to a unified modeling method that extracts self-attention information from an image at multiple scales and a network structure that fuses image features of different levels, achieving better results in the field of semantic segmentation.

Description

Image semantic segmentation method based on multi-scale and multi-level attention mechanism
Technical Field
The invention belongs to the technical field of computer vision, relates to a deep neural network model for image semantic segmentation, and particularly relates to a method for uniformly modeling image feature data and a method for learning the relevance among pixel points of image features, so as to establish a deep model for image semantic segmentation.
Background
Image semantic segmentation is the automatic segmentation and recognition of image content by a machine. Semantic segmentation of 2D images, video, and even 3D data is a key problem in the field of computer vision. It is a highly difficult task aimed at scene understanding. Scene understanding, as a core problem of computer vision, is particularly important now that the number of applications that extract knowledge from images has increased dramatically. These applications include autonomous driving, human-computer interaction, computational photography, image search engines, and augmented reality. Such problems were solved in the past with a variety of computer vision and machine learning methods. Despite the popularity of those approaches, deep learning has changed the situation, and many computer vision problems, including semantic segmentation, are now being addressed with deep frameworks, typically deep convolutional neural networks, which can significantly improve accuracy and efficiency. Even so, deep learning is still far less mature than other branches of machine learning and computer vision. In view of this, there remains ample research space for semantic segmentation of images under the deep learning framework.
With the rapid development of deep learning in recent years, end-to-end problem modeling with deep Convolutional Neural Networks (CNN) and Fully Convolutional Networks (FCN) has become a mainstream research approach in computer vision. Introducing the idea of end-to-end modeling into the image semantic segmentation algorithm, modeling the feature image end-to-end with a suitable network structure, and directly outputting the predicted semantic map is a problem worthy of deep discussion.
Because the content of images in natural scenes is complex and subjects are diverse, pixel-by-pixel semantic analysis of an image is too laborious and inefficient; finding the relations among pixel points within the feature image is an entry point to several key difficulties of the task.
In summary, introducing attention learning (the relations between pixel points) into an image semantic segmentation method based on end-to-end modeling is necessary and is a direction worthy of in-depth research.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an image semantic segmentation method based on a multi-scale and multi-level attention mechanism.
The technical scheme adopted by the invention for solving the technical problems is as follows:
given an image I, the corresponding real label map Gt constitutes a training set.
Step (1), preprocessing a data set, and extracting the characteristics of image data
Preprocessing an image I: first horizontally flip the image I, randomly scale it, and crop it to a uniform size; then extract features from the image with a full convolutional neural network to obtain the image features If1, If2, If3 and If4.
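The preprocessing just described (random horizontal flip, random rescale, crop to a uniform size) can be sketched in NumPy. The crop size 321 and the scale range [0.5, 2.0] are illustrative assumptions, not values stated in the patent, and nearest-neighbour indexing stands in for a real resampling filter:

```python
import numpy as np

def preprocess(img, crop=321, rng=np.random.default_rng(0)):
    """Hedged sketch of step (1): flip, rescale, crop. `img` is (h, w, 3)."""
    # random horizontal flip with probability 0.5
    if rng.random() < 0.5:
        img = img[:, ::-1]
    # random rescale in [0.5, 2.0], never below the crop size
    s = rng.uniform(0.5, 2.0)
    h, w = img.shape[:2]
    nh, nw = max(int(h * s), crop), max(int(w * s), crop)
    ys = (np.arange(nh) * h // nh).clip(0, h - 1)
    xs = (np.arange(nw) * w // nw).clip(0, w - 1)
    img = img[ys][:, xs]
    # random crop to a uniform size
    top = rng.integers(0, img.shape[0] - crop + 1)
    left = rng.integers(0, img.shape[1] - crop + 1)
    return img[top:top + crop, left:left + crop]
```

Every call yields a tensor of the same uniform size, so batches can be stacked for the feature extractor.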
Step (2), establishing a multi-scale attention mechanism model (MSM) and further extracting characteristics
The input image feature If4 is scaled to different degrees by bilinear interpolation, and finally channel fusion is performed to obtain the feature image If4_att of the specified dimensionality.
Step (3), establishing a multi-stage attention mechanism model (MCM) for feature fusion
The input image features If1, If2 and If4_att are effectively fused by the proposed multi-level attention mechanism model to obtain a feature map IF with strong feature information and good robustness.
Step (4), model training
The input feature maps IF and If2 are used for spatial cross-entropy computation with the real label map Gt to obtain the difference from the true solution, and the model parameters of the full convolutional neural networks defined in steps (2) and (3) are trained with the back-propagation algorithm until the whole network model converges.
The data preprocessing and image feature extraction in step (1) are as follows:
Features are extracted from the image I with an existing full convolutional neural network (FCN), giving the image features If1, If2, If3 and If4, where each If_i ∈ R^(c×h×w); c is the number of channels of the image feature, and h and w are its height and width, respectively.
The multi-scale attention mechanism model (MSM) for image semantic segmentation in step (2) performs feature fusion; the specific formulas are as follows:
2-1. For If4, extract feature information at different scales; the specific formulas are:
x = Conv(If4)    (1)
xs = Attention(bilinear interpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]    (2)
Ys = Concat(bilinear interpolation(xs, 64), If4)    (3)
where Conv is a 1 × 1 convolution that reduces the channel dimension of If4; the bilinear interpolation function refers to scaling (enlarging or shrinking) features by bilinear interpolation; and the Concat function refers to the concatenation operation on features. For an input feature image x, the Attention function is given by:
xquery = Conv(x); xkey = Conv(x); xvalue = Conv(x)    (4)
xattention = Softmax(xquery^t × xkey)    (5)
xcontext = xvalue^t × xattention    (6)
xout = μ × xcontext + x    (7)
where μ denotes a learnable coefficient and the superscript t denotes matrix transposition.
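As a concrete illustration, the Attention function of Eqs. (4)-(7) can be sketched in NumPy on a flattened (channels × pixels) feature map. The weight matrices Wq, Wk and Wv stand in for the three 1 × 1 Conv projections, and the exact transposition convention in Eqs. (5)-(6) is an assumption of this sketch:

```python
import numpy as np

def softmax(z, axis=0):
    """Numerically stable softmax along the given axis."""
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, Wq, Wk, Wv, mu):
    """Sketch of Eqs. (4)-(7) for x of shape (c, n), n = h*w pixels.
    Wq, Wk, Wv play the role of the three Conv projections; mu is the
    learnable residual coefficient of Eq. (7)."""
    q, k, v = Wq @ x, Wk @ x, Wv @ x      # Eq. (4): query, key, value
    att = softmax(q.T @ k, axis=0)        # Eq. (5): n x n pixel affinity
    context = v @ att                     # Eq. (6), modulo transposition
    return mu * context + x               # Eq. (7): weighted residual
```

With mu learned from zero, the module starts as an identity mapping and gradually mixes in the pixel-to-pixel context.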
2-2. Reduce the dimensionality of the Concat output and extract feature information; the specific formula is:
If4_att = Conv(Ys)    (8)
where Conv is a 1 × 1 convolution that reduces the channel dimension of Ys.
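A minimal NumPy sketch of the MSM pipeline (Eqs. 1-3 and 8) may help. Nearest-neighbour resizing stands in for bilinear interpolation, `attention`, `conv_reduce` and `conv_out` are caller-supplied stand-ins for the modules above, and the assumption that If4 is 64 × 64 (suggested by the size list [48, 32, 16, 8] with upsampling to 64) is this sketch's, not the patent's:

```python
import numpy as np

def resize(x, size):
    """Nearest-neighbour stand-in for bilinear interpolation on (c, h, w)."""
    c, h, w = x.shape
    ys = (np.arange(size) * h // size).clip(0, h - 1)
    xs = (np.arange(size) * w // size).clip(0, w - 1)
    return x[:, ys][:, :, xs]

def msm(i_f4, attention, conv_reduce, conv_out, sizes=(48, 32, 16, 8)):
    """Sketch of the multi-scale attention model: reduce channels,
    attend at each scale, upsample back, concatenate with the input,
    and fuse with a final 1x1-convolution stand-in."""
    x = conv_reduce(i_f4)                                       # Eq. (1)
    ys = [resize(attention(resize(x, s)), 64) for s in sizes]   # Eq. (2)
    y = np.concatenate(ys + [i_f4], axis=0)                     # Eq. (3)
    return conv_out(y)                                          # Eq. (8)
```

With identity stand-ins the scaffold already shows the shape bookkeeping: four attended scales plus the original feature are stacked on the channel axis before the final reduction.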
the multi-stage attention mechanism model (MCM) for image semantic segmentation in the step (3) specifically comprises the following steps:
firstly, a multi-level attention mechanism model for image semantic segmentation is described, and the model is specifically realized as follows:
inputting low-order characteristic image x to multi-stage attention mechanism modellAnd higher order feature image xhThe concrete formula is as follows:
3-1. Unify the dimensionality and size of the two input feature maps:
xl=Conv(xl) (9)
xh=bilinear interpolation(xh,size(xl)) (10)
where the Conv function is a 1 × 1 convolution that reduces the channel dimension of xl; the bilinear interpolation function enlarges xh by bilinear interpolation to the same size as xl.
3-2. Concatenate and normalize the two feature images of unified dimensionality to obtain the attention information:
xlh=Concat(xl,xh) (11)
xatt=Softmax(Normalize(GAP(xlh))) (12)
where GAP denotes global average pooling, and the Softmax formula is:
Softmax(xi) = exp(xi) / Σj exp(xj)    (13)
3-3. Apply the Hadamard product to the attention information image and the low-level feature image; the specific formula is:
fa = xatt ⊙ xl    (14)
3-4. Sum the Hadamard product output with the high-level feature image; the specific formula is:
Fa = fa + xh    (15)
Then If4_att, If2 and If1 are fed into the multi-level attention mechanism model in turn; the specific formulas are:
IF=MCM(If4_att,If2) (16)
IF=MCM(IF,If1) (17)
wherein the MCM function refers to a multi-stage attention mechanism model.
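One MCM step (Eqs. 9-15) can be sketched in NumPy as follows. The `conv` argument stands in for the 1 × 1 channel-reduction convolution, nearest-neighbour resizing replaces bilinear interpolation, and slicing the pooled attention vector back to xl's channel count is an assumption of this sketch, since the patent does not spell out how the 2c-channel attention vector maps onto the c-channel low-level feature:

```python
import numpy as np

def resize(x, hw):
    """Nearest-neighbour stand-in for bilinear interpolation (Eq. 10)."""
    c, h, w = x.shape
    ys = (np.arange(hw[0]) * h // hw[0]).clip(0, h - 1)
    xs = (np.arange(hw[1]) * w // hw[1]).clip(0, w - 1)
    return x[:, ys][:, :, xs]

def mcm(x_l, x_h, conv=lambda t: t):
    """Sketch of one multi-level attention step on (c, h, w) features."""
    x_l = conv(x_l)                                  # Eq. (9): channel reduction
    x_h = resize(x_h, x_l.shape[1:])                 # Eq. (10): match sizes
    x_lh = np.concatenate([x_l, x_h], axis=0)        # Eq. (11): concatenate
    gap = x_lh.mean(axis=(1, 2))                     # GAP in Eq. (12)
    e = np.exp(gap - gap.max())
    x_att = e / e.sum()                              # Softmax, Eqs. (12)-(13)
    f_a = x_att[: x_l.shape[0], None, None] * x_l    # Hadamard product, Eq. (14)
    return f_a + x_h                                 # Eq. (15): add high level
```

Calling it twice, as in Eqs. (16)-(17), chains If4_att with If2 and the result with If1.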
The training model in the step (4) is as follows:
the predicted image I generated in the step (3) is processedFThe characteristic image I generated in the step (1)f3And inputting the real tag graph Gt into a defined Loss function CrossEntropyLoss to obtain a Loss value Loss, wherein the Loss value Loss is specifically disclosed as follows:
Loss=CrossEntropyLoss(IF,If3,Gt) (18)
where CrossEntropyLoss is computed as:
L1 = −(1/B) Σ_{b=1..B} Σ_{c=1..C} Gt_{b,c} log(Softmax(IF)_{b,c})    (19)
L2 = −(1/B) Σ_{b=1..B} Σ_{c=1..C} Gt_{b,c} log(Softmax(If3)_{b,c})    (20)
Loss = L1 + λ × L2    (21)
where B is the number of images input to the neural network, C is the number of channels of the feature images, and λ is the weight between the two loss terms.
The parameters of the network are adjusted with the back-propagation algorithm according to the computed loss value Loss.
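The loss above can be sketched as pixel-wise cross entropy over class logits; the explicit per-pixel form and the value λ = 0.4 are illustrative assumptions of this sketch, not values stated in the patent:

```python
import numpy as np

def spatial_ce(logits, target):
    """Pixel-wise cross entropy, a sketch of L1/L2 in Eqs. (19)-(20).
    logits: (C, h, w) class scores; target: (h, w) integer labels."""
    z = logits - logits.max(axis=0, keepdims=True)          # stability shift
    logp = z - np.log(np.exp(z).sum(axis=0, keepdims=True)) # log-softmax
    h, w = target.shape
    # pick the log-probability of the true class at every pixel
    return -logp[target, np.arange(h)[:, None], np.arange(w)].mean()

def total_loss(i_F, i_f3, gt, lam=0.4):
    """Eq. (21): main loss on I_F plus lambda-weighted auxiliary loss on I_f3."""
    return spatial_ce(i_F, gt) + lam * spatial_ce(i_f3, gt)
```

The auxiliary term on If3 gives the backbone a direct gradient signal, a common deep-supervision design choice.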
The invention has the following beneficial effects:
compared with other methods, the method provided by the invention has relatively better performance in precision aiming at the problem of image semantic segmentation: firstly, the parameter quantity of the model is greatly reduced, the overfitting of the model is effectively prevented, and the training time of the model is reduced; second, it is simpler and easier to implement than other models. According to the invention, an attention mechanism is introduced into the end-to-end-based full convolution neural network, and image features are extracted at multiple scales and multiple levels, so that a better effect in an image semantic segmentation task is obtained.
Drawings
Fig. 1 is a general structural view of the present invention.
FIG. 2 is a multi-scale attention mechanism model of the present invention.
FIG. 3 is a multi-stage attention mechanism model of the present invention.
Fig. 4 is a visualization result of the model experiment of the present invention.
Detailed Description
In order to make the purpose and technical solution of the present invention more clearly understood, the following detailed description is made with reference to the accompanying drawings and examples, and the application principle of the present invention is described in detail.
As shown in fig. 1, fig. 2 and fig. 3, the present invention provides a deep neural network structure for image semantic segmentation; the specific steps are as follows:
the data preprocessing and the feature extraction of the image in the step (1) are specifically as follows:
the Pascal VOC2012 data set is used here as training and testing data.
For image data, the existing 101-layer deep residual network (ResNet-101) model is used to extract image features. Specifically, the image data is uniformly scaled to 513 × 513 and input into the deep residual network; the output of the res2c layer is extracted as image feature If1, the output of the res3c layer as image feature If2, the output of the res4c layer as image feature If3, and the output of the res5c layer as image feature If4.
The multi-scale attention mechanism model (MSM) in the step (2) fuses image features, and the method specifically comprises the following steps:
2-1. Extract feature information from If4 at different scales. First, a convolution operation reduces If4 to 512 channels.
2-2. Apply bilinear interpolation to the dimension-reduced output to obtain feature images xs of sizes 48, 32, 16 and 8, respectively.
2-3. Apply the Attention operation to the feature images at the 4 scales to extract the relevance among pixel points, then upsample the Attention outputs by bilinear interpolation. The Attention operation is given by the following formulas:
for the Attention function input feature image x, the specific formula is as follows:
xquery=Conv(x);xkey=Conv(x);xvalue=Conv(x) (22)
Figure BDA0002363573340000061
xcontext=xt value×xattention(24)
xout=μ×xcontext+x (25)
wherein μ denotes a learnable coefficient and
Figure BDA0002363573340000062
xtrefers to matrix transposition.
2-4. Concatenate the outputs of the 4 multi-scale attentions with If4 and reduce the dimensionality to obtain the feature image If4_att with attention information.
The relevant operation of the multi-scale attention mechanism model is completed.
Fusing the image characteristics by the multi-stage attention mechanism model (MCM) in the step (3), which comprises the following specific steps:
3-1. Unify the input features If4_att and If2 in dimensionality and scale.
3-2. Concatenate the two feature images of unified dimensionality, then apply global average pooling, regularization and normalization in turn to the concatenated output to obtain the feature image xatt with attention information.
3-3. Apply the Hadamard product to the attention information image xatt and the low-level feature image If2 to obtain fa.
3-4. Sum the Hadamard product output fa with the high-level feature image to obtain Fa.
3-5. Take If1 as the low-level feature image and Fa as the high-level feature image, repeat operations 3-1 to 3-4, and obtain the final output feature image IF.
Thus, the multi-stage attention mechanism model operation is completed.
The training model in the step (4) is specifically as follows:
for the prediction characteristic image generated in the step (3)
Figure BDA0002363573340000063
And the characteristic image generated in the step (1)
Figure BDA0002363573340000064
An upsample operation is performed to the original size 513 × 513 and the dimensions are reduced to the number of classes of the Pascal VOC2012 data set by a convolution operation (21). Comparing the loss value with a real tag graph Gt of a data set, calculating to obtain the difference between a predicted value and an actual correct value through a defined loss function Cross EntropyLoss and forming a loss value, and then adjusting the parameter value of the whole network by using a Back-Propagation (BP) algorithm according to the loss value until the network converges.
The following table shows the accuracy of the method of the invention on Pascal VOC 2012. "Ours" is the depth model proposed by the invention; "aero", "bike", etc. denote the class objects to be semantically segmented in the data set, and mIoU denotes the mean accuracy over all classes on the semantic segmentation task.
[Table: per-class accuracy and mIoU on Pascal VOC 2012]

Claims (4)

1. An image semantic segmentation method based on a multi-scale and multi-level attention mechanism is characterized by comprising the following steps:
given an image I, the corresponding real label map Gt, constitutes a training set:
step (1): data set preprocessing, feature extraction of image data
Preprocessing an image I: first horizontally flip the image I, randomly scale it, and crop it to a uniform size; then extract features from the image with a full convolutional neural network to obtain the image features If1, If2, If3 and If4.
Step (2): establishing a multi-scale attention mechanism model (MSM) and further extracting characteristics
The input image feature If4 is scaled to different degrees by bilinear interpolation, and finally channel fusion is performed to obtain the image feature If4_att of the specified dimensionality.
And (3): establishing a multi-level attention mechanism model (MCM) for feature fusion
The input image features If1, If2 and If4_att are effectively fused by the proposed multi-level attention mechanism model to obtain a feature map IF with strong feature information and good robustness.
And (4): model training
The input feature maps IF and If2 are used for spatial cross-entropy computation with the real label map Gt to obtain the difference from the true solution, and the model parameters of the full convolutional neural networks defined in steps (2) and (3) are trained with the back-propagation algorithm until the whole network model converges.
2. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, characterized in that the image preprocessing of step (1) and the feature fusion of the multi-scale attention mechanism model (MSM) of step (2) are as follows:
2-1. Features are extracted from the image I with an existing full convolutional neural network (FCN), giving the image features If1, If2, If3 and If4, where each If_i ∈ R^(c×h×w); c is the number of channels of the image feature, and h and w are its height and width, respectively.
2-2. Extract feature information from If4 at different scales; the specific formulas are:
x = Conv(If4)    (1)
xs = Attention(bilinear interpolation(x, size(s))), s = 1, 2, 3, 4; size = [48, 32, 16, 8]    (2)
Ys = Concat(bilinear interpolation(xs, 64), If4)    (3)
where Conv is a 1 × 1 convolution that reduces the channel dimension of If4; the bilinear interpolation function refers to scaling features by bilinear interpolation; and the Concat function refers to the concatenation operation on feature images. For an input feature image x, the Attention function is given by:
xquery = Conv(x); xkey = Conv(x); xvalue = Conv(x)    (4)
xattention = Softmax(xquery^t × xkey)    (5)
xcontext = xvalue^t × xattention    (6)
xout = μ × xcontext + x    (7)
where μ denotes a learnable coefficient and the superscript t denotes matrix transposition.
2-3, reducing the dimension of the Concat output result, and extracting characteristic information, wherein the specific formula is as follows:
If4_att=Conv(Ys) (8)
where Conv is a 1 × 1 convolution that reduces the channel dimension of Ys.
3. The image semantic segmentation method based on the multi-scale multi-stage attention mechanism as claimed in claim 1, wherein the multi-stage attention mechanism model (MCM) for image semantic segmentation in step (3) is specifically as follows:
First, the specific implementation of the multi-level attention mechanism model for image semantic segmentation is described as follows. The model takes as input a low-level feature image xl and a high-level feature image xh; the specific formulas are:
3-1. Unify the dimensionality and size of the two input feature maps:
xl=Conv(xl) (9)
xh=bilinear interpolation(xh,size(xl)) (10)
where the Conv function is a 1 × 1 convolution that reduces the channel dimension of xl; the bilinear interpolation function enlarges xh by bilinear interpolation to the same size as xl.
3-2. Concatenate and normalize the two feature images of unified dimensionality to obtain the attention information:
xlh=Concat(xl,xh) (11)
xatt=Softmax(Normalize(GAP(xlh))) (12)
where GAP denotes global average pooling, and the Softmax formula is:
Softmax(xi) = exp(xi) / Σj exp(xj)    (13)
3-3. Apply the Hadamard product to the attention information image xatt and the low-level feature image xl; the specific formula is:
fa = xatt ⊙ xl    (14)
3-4. Sum the Hadamard product output with the high-level feature image xh; the specific formula is:
Fa = fa + xh    (15)
Then If4_att, If2 and If1 are fed into the multi-level attention mechanism model (MCM) in turn; the specific formulas are:
IF=MCM(If4_att,If2) (16)
IF=MCM(IF,If1) (17)
wherein the MCM function refers to a multi-stage attention mechanism model.
4. The image semantic segmentation method based on the multi-scale and multi-level attention mechanism according to claim 1, wherein the training model in the step (4) is as follows:
the predicted image I generated in the step (3) is processedFThe characteristic image I generated in the step (1)f3And inputting the real tag graph Gt into a defined Loss function CrossEntropyLoss to obtain a Loss value Loss, wherein the Loss value Loss is specifically disclosed as follows:
Loss=CrossEntropyLoss(IF,If3,Gt) (18)
where CrossEntropyLoss is computed as:
L1 = −(1/B) Σ_{b=1..B} Σ_{c=1..C} Gt_{b,c} log(Softmax(IF)_{b,c})    (19)
L2 = −(1/B) Σ_{b=1..B} Σ_{c=1..C} Gt_{b,c} log(Softmax(If3)_{b,c})    (20)
Loss = L1 + λ × L2    (21)
where B is the number of images input to the neural network, C is the number of channels of the feature images, and λ is the weight between the two loss terms.
The parameters of the network are adjusted with the back-propagation algorithm according to the computed loss value Loss.
CN202010030667.1A 2020-01-12 2020-01-12 Image semantic segmentation method based on multi-scale multi-level attention mechanism Active CN111210432B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010030667.1A CN111210432B (en) 2020-01-12 2020-01-12 Image semantic segmentation method based on multi-scale multi-level attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010030667.1A CN111210432B (en) 2020-01-12 2020-01-12 Image semantic segmentation method based on multi-scale multi-level attention mechanism

Publications (2)

Publication Number Publication Date
CN111210432A true CN111210432A (en) 2020-05-29
CN111210432B CN111210432B (en) 2023-07-25

Family

ID=70786703

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010030667.1A Active CN111210432B (en) 2020-01-12 2020-01-12 Image semantic segmentation method based on multi-scale multi-level attention mechanism

Country Status (1)

Country Link
CN (1) CN111210432B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667495A (en) * 2020-06-08 2020-09-15 北京环境特性研究所 Image scene analysis method and device
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 Semantic segmentation method under small sample based on decentralized attention network
CN112233129A (en) * 2020-10-20 2021-01-15 湘潭大学 Deep learning-based parallel multi-scale attention mechanism semantic segmentation method and device
CN112465828A (en) * 2020-12-15 2021-03-09 首都师范大学 Image semantic segmentation method and device, electronic equipment and storage medium
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN110163878A (en) * 2019-05-28 2019-08-23 四川智盈科技有限公司 A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN110188685A (en) * 2019-05-30 2019-08-30 燕山大学 A kind of object count method and system based on the multiple dimensioned cascade network of double attentions

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018153322A1 (en) * 2017-02-23 2018-08-30 北京市商汤科技开发有限公司 Key point detection method, neural network training method, apparatus and electronic device
CN110163878A (en) * 2019-05-28 2019-08-23 四川智盈科技有限公司 A kind of image, semantic dividing method based on dual multiple dimensioned attention mechanism
CN110188685A (en) * 2019-05-30 2019-08-30 燕山大学 A kind of object count method and system based on the multiple dimensioned cascade network of double attentions

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张东波; 易良玲; 许海霞; 张莹: "Multi-scale local structure dominant binary pattern learning for image representation" *
赵斐: "Semantic segmentation of remote sensing images based on a pyramid attention mechanism" *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111667495A (en) * 2020-06-08 2020-09-15 北京环境特性研究所 Image scene analysis method and device
CN111860517A (en) * 2020-06-28 2020-10-30 广东石油化工学院 Semantic segmentation method under small sample based on decentralized attention network
CN111860517B (en) * 2020-06-28 2023-07-25 广东石油化工学院 Semantic segmentation method under small sample based on distraction network
CN112233129A (en) * 2020-10-20 2021-01-15 湘潭大学 Deep learning-based parallel multi-scale attention mechanism semantic segmentation method and device
CN112465828A (en) * 2020-12-15 2021-03-09 首都师范大学 Image semantic segmentation method and device, electronic equipment and storage medium
CN112465828B (en) * 2020-12-15 2024-05-31 益升益恒(北京)医学技术股份公司 Image semantic segmentation method and device, electronic equipment and storage medium
WO2022227913A1 (en) * 2021-04-25 2022-11-03 浙江师范大学 Double-feature fusion semantic segmentation system and method based on internet of things perception

Also Published As

Publication number Publication date
CN111210432B (en) 2023-07-25

Similar Documents

Publication Publication Date Title
CN111210432A (en) Image semantic segmentation method based on multi-scale and multi-level attention mechanism
CN111858954B (en) Task-oriented text-generated image network model
Zhang et al. Weakly supervised semantic segmentation for large-scale point cloud
CN111079532B (en) Video content description method based on text self-encoder
US11328172B2 (en) Method for fine-grained sketch-based scene image retrieval
JP7291183B2 (en) Methods, apparatus, devices, media, and program products for training models
CN111242844B (en) Image processing method, device, server and storage medium
CN111340814A (en) Multi-mode adaptive convolution-based RGB-D image semantic segmentation method
US20220270384A1 (en) Method for training adversarial network model, method for building character library, electronic device, and storage medium
CN112990116A (en) Behavior recognition device and method based on multi-attention mechanism fusion and storage medium
Kakillioglu et al. 3D capsule networks for object classification with weight pruning
WO2023173552A1 (en) Establishment method for target detection model, application method for target detection model, and device, apparatus and medium
CN110633706B (en) Semantic segmentation method based on pyramid network
Yang et al. Xception-based general forensic method on small-size images
CN115482387A (en) Weak supervision image semantic segmentation method and system based on multi-scale class prototype
CN114333062A (en) Pedestrian re-recognition model training method based on heterogeneous dual networks and feature consistency
CN117033609B (en) Text visual question-answering method, device, computer equipment and storage medium
CN110110775A (en) A kind of matching cost calculation method based on hyper linking network
CN113837290A (en) Unsupervised unpaired image translation method based on attention generator network
CN117036699A (en) Point cloud segmentation method based on Transformer neural network
CN116861022A (en) Image retrieval method based on combination of deep convolutional neural network and local sensitive hash algorithm
CN116778164A (en) Semantic segmentation method for improving deep V < 3+ > network based on multi-scale structure
CN116485892A (en) Six-degree-of-freedom pose estimation method for weak texture object
EP4170547A1 (en) Method for extracting data features, and related apparatus
CN114722902A (en) Unmarked video Hash retrieval method and device based on self-supervision learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant