CN111553391A - Feature fusion method in semantic segmentation technology

Feature fusion method in semantic segmentation technology

Info

Publication number
CN111553391A
Authority
CN
China
Prior art keywords
feature fusion
network
training
semantic segmentation
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010274552.7A
Other languages
Chinese (zh)
Inventor
杨绿溪 (Yang Luxi)
顾恒瑞 (Gu Hengrui)
朱紫辉 (Zhu Zihui)
王路 (Wang Lu)
李春国 (Li Chunguo)
黄永明 (Huang Yongming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Southeast University
Original Assignee
Southeast University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Southeast University filed Critical Southeast University
Priority to CN202010274552.7A priority Critical patent/CN111553391A/en
Publication of CN111553391A publication Critical patent/CN111553391A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

A feature fusion method in semantic segmentation technology. A semantic segmentation network based on an "encoder-decoder" structure is constructed, and a feature fusion scheme is proposed for the "decoder" part. The feature fusion method comprises splicing, pooling, convolution, activation, and addition operations; it effectively fuses different features, improves the network's ability to express features, and ultimately improves segmentation accuracy. During network training, the method also accelerates the convergence of the loss function and shortens the training time. The invention was tested on a public semantic segmentation dataset with good experimental results: compared with networks that use no feature fusion or other feature fusion methods, a network using the proposed feature fusion method achieves higher accuracy at test time, and its loss function converges faster during training.

Description

Feature fusion method in semantic segmentation technology
Technical Field
The invention relates to the field of computer vision, and in particular to a feature fusion method in semantic segmentation technology.
Background
In recent years, with continuous breakthroughs in parallel computing theory and hardware, the field of computer vision has developed rapidly. In particular, when AlexNet, a convolutional neural network, won the classification task of the 2012 ILSVRC (ImageNet Large Scale Visual Recognition Challenge), it set off a wave of deep learning research, and deep learning techniques began to shine. At present, in the field of computer vision, deep learning, and convolutional neural networks in particular, plays an increasingly important role in all kinds of visual recognition tasks.
Semantic segmentation, one of the key technologies in the field of computer vision, has attracted wide attention. Semantic segmentation is a scene-understanding task that performs classification at the pixel level. It has broad application prospects, covering many fields including autonomous driving, human-computer interaction, robotics, and augmented reality.
At present, most mainstream algorithms for the semantic segmentation task are based on deep learning, especially convolutional neural networks.
In 2015, Jonathan Long et al. of UC Berkeley proposed the fully convolutional network (FCN) for semantic segmentation, a pioneering work that addressed segmentation at the pixel level. The FCN replaces all of the fully connected layers at the end of a traditional classification network with convolutional layers, which is the origin of the "fully convolutional" name. Semantic segmentation techniques based on convolutional neural networks have developed rapidly ever since.
In the same year, the U-Net network was proposed. U-Net is a typical "encoder-decoder" structure, which remains the mainstream structure for semantic segmentation; SegNet adopts a similar structure. Semantic segmentation networks based on the "encoder-decoder" structure, such as U-Net and SegNet, perform well in segmentation tasks.
In general, a semantic segmentation network based on the "encoder-decoder" structure performs feature fusion in the "decoder" part: in the FCN, fusion is performed by addition (add), while in U-Net it is performed by concatenation (concat). The purpose of feature fusion is to improve the network's ability to express features, so that the network obtains more accurate segmentation results.
Different feature fusion methods have different effects. Finding more effective fusion methods that further improve network performance is a hot topic in current semantic segmentation research.
Disclosure of Invention
In order to solve the existing problems, the invention provides a feature fusion method in semantic segmentation technology that improves the network's ability to express features and thereby improves the accuracy of the semantic segmentation results.
To achieve this object, the present invention provides a feature fusion method in semantic segmentation technology, comprising the following steps:
Step 1: constructing a semantic segmentation network based on an "encoder-decoder" structure;
Step 2: performing feature fusion in the "decoder" part;
Step 3: the feature fusion method comprising splicing, pooling, convolution, activation, and addition operations;
Step 4: training the network with feature fusion on a training set;
Step 5: training networks without feature fusion and with other feature fusion methods on the same training set;
Step 6: testing the trained networks on a test set;
Step 7: analyzing and comparing the effects of the feature fusion methods.
As a further improvement of the present invention, in step 1, a semantic segmentation network based on the "encoder-decoder" structure is constructed. In this architecture, the "encoder" part extracts features through convolutional and pooling layers: the depth of the feature map continuously increases while its size continuously decreases. After receiving the features extracted by the "encoder", the "decoder" up-samples by deconvolution to restore the size of the feature map, finally producing a semantic segmentation result equal in size to the original image.
As a further improvement of the present invention, in step 2, feature fusion is performed in the "decoder" part. Feature fusion means that a network layer in the "decoder" takes as input not only the output of the previous "decoder" layer but also the output of the corresponding "encoder" layer, so that different features are fused together and the feature expression capability of the network is improved.
As a further improvement of the present invention, in step 3, the proposed feature fusion method comprises splicing, pooling, convolution, activation, and addition operations. A network layer in the "decoder" takes the output of the previous "decoder" layer and the output of the corresponding "encoder" layer as inputs; the two inputs are spliced along the channel dimension, passed through pooling, convolution, and activation operations, and then added to the unprocessed spliced feature map to obtain the fused output.
As a further improvement of the present invention, in step 4, the network with feature fusion is trained on a training set drawn from a public dataset. The loss curve during training, the accuracy curve on a validation set, and the training time are recorded in order to study the influence of the feature fusion method on the training process.
As a further improvement of the present invention, in step 5, the networks without feature fusion and with other feature fusion methods are trained on the same training set as in step 4, again recording the loss curve during training, the accuracy curve on the validation set, and the training time.
As a further improvement of the present invention, in step 6, the networks trained in steps 4 and 5 are tested on the same test set, drawn from a public dataset. The test results, including accuracy and mean intersection over union (mIoU), are recorded separately, and the semantic segmentation results are output.
As a further improvement of the present invention, in step 7, the effect of the proposed feature fusion method is analyzed and compared against the networks without feature fusion and with other feature fusion methods, both at test and at training time: segmentation accuracy and mean intersection over union are compared at test time, while the convergence speed of the loss function and the training time are compared during training.
The invention provides a feature fusion method in semantic segmentation technology: a semantic segmentation network based on an "encoder-decoder" structure is constructed, and a feature fusion scheme is proposed for the "decoder" part. The feature fusion method comprises splicing, pooling, convolution, activation, and addition operations; it effectively fuses different features, improves the network's ability to express features, and ultimately improves segmentation accuracy. During network training, the method also accelerates the convergence of the loss function and shortens the training time. The invention was tested on a public semantic segmentation dataset with good experimental results: compared with networks that use no feature fusion or other feature fusion methods, a network using the proposed feature fusion method achieves higher accuracy at test time, and its loss function converges faster during training.
Drawings
FIG. 1 is a structure diagram of the semantic segmentation network;
FIG. 2 is a schematic diagram of feature fusion;
FIG. 3 shows the additive feature fusion mode;
FIG. 4 shows the splicing feature fusion mode;
FIG. 5 shows the feature fusion mode of the present invention;
FIG. 6 shows a set of semantic segmentation results;
FIG. 7 is a schematic diagram of the IoU calculation;
FIG. 8 compares the training processes.
Detailed Description
The invention is described in further detail below with reference to the following detailed description and accompanying drawings:
the invention provides a feature fusion method in a semantic segmentation technology, aiming at improving the expression capability of a network on features so as to improve the accuracy of a segmentation result.
The specific embodiment of the invention is as follows:
step 1: and constructing a semantic segmentation network based on an encoder-decoder. Fig. 1 shows a semantic-partitioned network structure diagram constructed in the present invention, in an "encoder" part, a network mainly consists of a convolutional layer and a pooling layer, and as the network deepens, the depth of a feature diagram also continuously increases, but the size thereof continuously decreases; in the 'decoder' part, the network mainly consists of a deconvolution layer, the deconvolution layer can continuously recover the size of the feature graph, and finally a semantic segmentation result with the size equivalent to that of the input is output.
Step 2: feature fusion is performed in the "decoder" part. Fig. 2 gives a schematic representation of feature fusion. The feature fusion means that the network layer in the "decoder" not only takes the output of the last network layer in the "decoder" as the output, but also receives the output of the network layer in the corresponding "encoder" as the input, so that different features can be fused, and the feature expression capability of the network can be improved.
At present, the common feature fusion methods are addition and splicing. Additive fusion, shown in FIG. 3, adds the values at corresponding positions of the two feature maps to obtain the output. Splicing fusion, shown in FIG. 4, concatenates the two feature maps along the channel dimension to obtain the output.
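In PyTorch terms, these two existing fusion modes each reduce to a single operation; the shapes below are illustrative:

```python
import torch

decoder_feat = torch.randn(1, 64, 90, 120)  # output of the previous "decoder" layer
encoder_feat = torch.randn(1, 64, 90, 120)  # output of the corresponding "encoder" layer

# FCN-style addition: element-wise sum, channel count stays at 64.
fused_add = decoder_feat + encoder_feat
# U-Net-style splicing: channel concatenation, channel count doubles to 128.
fused_cat = torch.cat([decoder_feat, encoder_feat], dim=1)
```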
Step 3: the feature fusion method proposed by the present invention is shown in FIG. 5 and comprises splicing, pooling, convolution, activation, and addition operations. Denote the two input feature maps feature_A and feature_B. feature_A and feature_B are spliced along the channel dimension to obtain feature_C. feature_C then passes through global pooling, convolution (conv), a ReLU activation function, another convolution (conv), and a sigmoid activation function to obtain feature_D. Finally, feature_D is added to feature_C to obtain the output.
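The following PyTorch module is one reading of that pipeline, written as a sketch: the 1×1 kernel sizes and the channel-reduction ratio inside the pooled branch are assumptions not specified in the text.

```python
import torch
import torch.nn as nn

class FusionBlock(nn.Module):
    """Splice -> global pool -> conv -> ReLU -> conv -> sigmoid -> add."""
    def __init__(self, spliced_channels, reduction=4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # global pooling -> (N, C, 1, 1)
        self.conv1 = nn.Conv2d(spliced_channels, spliced_channels // reduction, 1)
        self.relu = nn.ReLU(inplace=True)
        self.conv2 = nn.Conv2d(spliced_channels // reduction, spliced_channels, 1)
        self.sigmoid = nn.Sigmoid()

    def forward(self, feature_a, feature_b):
        # Splice the two inputs along the channel dimension -> feature_C.
        feature_c = torch.cat([feature_a, feature_b], dim=1)
        # Global pooling, conv, ReLU, conv, sigmoid -> feature_D.
        feature_d = self.sigmoid(
            self.conv2(self.relu(self.conv1(self.pool(feature_c)))))
        # Add feature_D (broadcast over H and W) to the unprocessed feature_C.
        return feature_c + feature_d

fusion = FusionBlock(spliced_channels=128)  # 64 + 64 channels after splicing
out = fusion(torch.randn(1, 64, 90, 120), torch.randn(1, 64, 90, 120))
print(out.shape)  # torch.Size([1, 128, 90, 120])
```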
Step 4: train the network with feature fusion on a training set. The public dataset CamVid is used for training. CamVid is a road and driving scene understanding dataset comprising five video sequences captured by a camera with 960×720 resolution mounted on a car dashboard. A total of 701 frames were sampled from the video sequences (four sequences at 1 frame per second and one at 15 frames per second) and manually annotated with 32 classes. Sturgess et al. divided the dataset into 367 training images, 100 validation images, and 233 test images.
During training, the loss curve, the accuracy curve on the validation set, the training time, and so on are recorded in order to study the influence of the feature fusion method on the training process.
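A hedged sketch of that bookkeeping is below; `train_loader`, `val_loader`, and `evaluate` are placeholders for a CamVid data pipeline and accuracy routine, and the epoch count and learning rate are arbitrary choices:

```python
import time
import torch.nn as nn
import torch.optim as optim

model = EncoderDecoderNet()               # the sketch network from step 1
criterion = nn.CrossEntropyLoss()         # per-pixel cross-entropy loss
optimizer = optim.Adam(model.parameters(), lr=1e-3)
num_epochs = 30                           # illustrative value

loss_curve, val_accuracy_curve = [], []
start = time.time()
for epoch in range(num_epochs):
    for images, labels in train_loader:   # placeholder CamVid loader
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
        loss_curve.append(loss.item())    # record the loss curve
    # Record the accuracy curve on the validation set once per epoch.
    val_accuracy_curve.append(evaluate(model, val_loader))
training_time = time.time() - start       # record the training time
```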
Step 5: train the networks without feature fusion and with other feature fusion methods on the same training set as in step 4, again recording the loss curve, the accuracy curve on the validation set, the training time, and so on.
Step 6: test the networks trained in steps 4 and 5 on the same test set, drawn from the public dataset. Record the test results separately, including accuracy and mean intersection over union (mIoU), and output the semantic segmentation results. FIG. 6 shows a set of semantic segmentation results.
Step 7: analyze and compare the effect of the feature fusion method provided by the invention. Its performance is compared against the networks without feature fusion and with other feature fusion methods, both at test and at training time: segmentation accuracy and mean intersection over union are compared at test time, while the convergence speed of the loss function, the training time, and so on are compared during training.
Feature fusion mode                  mIoU (%)
None                                 54.72
Addition                             57.53
Splicing                             57.62
Method of the present invention      58.94

TABLE 1
Table 1 compares the accuracy of the different feature fusion modes. The accuracy metric is the mean intersection over union (mIoU). The intersection over union (IoU) is the ratio of the area of intersection to the area of union between the prediction and the ground truth; FIG. 7 gives a diagram of the calculation. The mIoU is the average of the IoU over all classes.
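A short sketch of that metric, assuming integer label maps and skipping classes absent from both prediction and ground truth:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """mIoU over integer label maps `pred` and `target` of the same shape."""
    ious = []
    for c in range(num_classes):
        intersection = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:                          # skip classes absent from both maps
            ious.append(intersection / union)  # IoU for class c
    return float(np.mean(ious))                # average IoU over the classes
```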
It can be seen that without feature fusion, the mean intersection over union of the network on the test set is 54.72%, while feature fusion increases the mIoU: the addition mode raises it by 2.81% to 57.53%, and the splicing mode by 2.91% to 57.62%. Addition and splicing thus improve the mIoU by similar amounts. By comparison, the network using the feature fusion mode provided by the invention reaches an mIoU of 58.94%, an increase of 4.02%. The feature fusion method of the invention therefore improves the segmentation accuracy of the network more effectively.
FIG. 8 shows the loss curves during training for the network without feature fusion, the network with additive feature fusion, the network with splicing feature fusion, and the network using the feature fusion method of the present invention. The loss functions are all cross-entropy loss functions, defined as follows:

$$L = -\sum_{i=1}^{N} y_i \log \hat{y}_i$$

where $y_i$ denotes the true label, $\hat{y}_i$ denotes the predicted label, and $N$ is the total number of categories.
It can be seen that the loss curve of the network without feature fusion converges slowest; the networks using additive and splicing feature fusion converge faster; and the network using the feature fusion mode of the present invention converges fastest, thus shortening the training time.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention in any way; any modification or equivalent variation made according to the technical spirit of the present invention falls within the scope of the present invention as claimed.

Claims (8)

1. A feature fusion method in semantic segmentation technology, characterized by comprising the following steps:
Step 1: constructing a semantic segmentation network based on an "encoder-decoder" structure;
Step 2: performing feature fusion in the "decoder" part;
Step 3: the feature fusion method comprising splicing, pooling, convolution, activation, and addition operations;
Step 4: training the network with feature fusion on a training set;
Step 5: training networks without feature fusion and with other feature fusion methods on the same training set;
Step 6: testing the trained networks on a test set;
Step 7: analyzing and comparing the effects of the feature fusion methods.
2. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 1, a semantic segmentation network based on the "encoder-decoder" structure is constructed; in this architecture, the "encoder" part extracts features through convolutional and pooling layers, the depth of the feature map continuously increasing while its size continuously decreases; after receiving the features extracted by the "encoder", the "decoder" up-samples by deconvolution to restore the size of the feature map, finally obtaining a semantic segmentation result equal in size to the original image.
3. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 2, feature fusion is performed in the "decoder" part; feature fusion means that a network layer in the "decoder" takes as input not only the output of the previous "decoder" layer but also the output of the corresponding "encoder" layer, so that different features are fused together and the feature expression capability of the network is improved.
4. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 3, the proposed feature fusion method comprises splicing, pooling, convolution, activation, and addition operations; a network layer in the "decoder" takes the output of the previous "decoder" layer and the output of the corresponding "encoder" layer as inputs; the two inputs are spliced along the channel dimension, passed through pooling, convolution, and activation operations, and then added to the unprocessed spliced feature map to obtain the fused output.
5. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 4, the network with feature fusion is trained on a training set drawn from a public dataset; the loss curve during training, the accuracy curve on a validation set, and the training time are recorded in order to study the influence of the feature fusion method on the training process.
6. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 5, the networks without feature fusion and with other feature fusion methods are trained on the same training set as in step 4, again recording the loss curve during training, the accuracy curve on the validation set, and the training time.
7. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 6, the networks trained in steps 4 and 5 are tested on the same test set, drawn from a public dataset; the test results, including accuracy and mean intersection over union, are recorded separately, and the semantic segmentation results are output.
8. The feature fusion method in semantic segmentation technology according to claim 1, characterized in that: in step 7, the effect of the proposed feature fusion method is analyzed and compared against the networks without feature fusion and with other feature fusion methods, both at test and at training time: segmentation accuracy and mean intersection over union are compared at test time, while the convergence speed of the loss function and the training time are compared during training.
CN202010274552.7A 2020-04-09 2020-04-09 Feature fusion method in semantic segmentation technology Pending CN111553391A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010274552.7A CN111553391A (en) 2020-04-09 2020-04-09 Feature fusion method in semantic segmentation technology

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010274552.7A CN111553391A (en) 2020-04-09 2020-04-09 Feature fusion method in semantic segmentation technology

Publications (1)

Publication Number Publication Date
CN111553391A true CN111553391A (en) 2020-08-18

Family

ID=72005723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010274552.7A Pending CN111553391A (en) 2020-04-09 2020-04-09 Feature fusion method in semantic segmentation technology

Country Status (1)

Country Link
CN (1) CN111553391A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393466A (en) * 2021-06-18 2021-09-14 中国石油大学(华东) Semantic segmentation network model for MODIS sea fog detection

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190626A (en) * 2018-07-27 2019-01-11 国家新闻出版广电总局广播科学研究院 A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109447994A (en) * 2018-11-05 2019-03-08 陕西师范大学 In conjunction with the remote sensing image segmentation method of complete residual error and Fusion Features
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
CN110852199A (en) * 2019-10-28 2020-02-28 中国石化销售股份有限公司华南分公司 Foreground extraction method based on double-frame coding and decoding model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190626A (en) * 2018-07-27 2019-01-11 国家新闻出版广电总局广播科学研究院 A kind of semantic segmentation method of the multipath Fusion Features based on deep learning
CN109447994A (en) * 2018-11-05 2019-03-08 陕西师范大学 In conjunction with the remote sensing image segmentation method of complete residual error and Fusion Features
CN110110692A (en) * 2019-05-17 2019-08-09 南京大学 A kind of realtime graphic semantic segmentation method based on the full convolutional neural networks of lightweight
CN110298361A (en) * 2019-05-22 2019-10-01 浙江省北大信息技术高等研究院 A kind of semantic segmentation method and system of RGB-D image
CN110852199A (en) * 2019-10-28 2020-02-28 中国石化销售股份有限公司华南分公司 Foreground extraction method based on double-frame coding and decoding model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马震环 (MA Zhenhuan): "Semantic segmentation algorithm based on an enhanced feature fusion decoder" *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113393466A (en) * 2021-06-18 2021-09-14 中国石油大学(华东) Semantic segmentation network model for MODIS sea fog detection

Similar Documents

Publication Publication Date Title
CN109543502B (en) Semantic segmentation method based on deep multi-scale neural network
CN112465008B (en) Voice and visual relevance enhancement method based on self-supervision course learning
CN111127493A (en) Remote sensing image semantic segmentation method based on attention multi-scale feature fusion
CN113807355A (en) Image semantic segmentation method based on coding and decoding structure
CN115063373A (en) Social network image tampering positioning method based on multi-scale feature intelligent perception
CN112163490A (en) Target detection method based on scene picture
CN115712740B (en) Method and system for multi-modal implication enhanced image text retrieval
CN117079139B (en) Remote sensing image target detection method and system based on multi-scale semantic features
CN114913493A (en) Lane line detection method based on deep learning
CN111563373B (en) Attribute-level emotion classification method for focused attribute-related text
CN112016406A (en) Video key frame extraction method based on full convolution network
CN116310305A (en) Coding and decoding structure semantic segmentation model based on tensor and second-order covariance attention mechanism
Lu et al. Mfnet: Multi-feature fusion network for real-time semantic segmentation in road scenes
CN111553391A (en) Feature fusion method in semantic segmentation technology
CN115147641A (en) Video classification method based on knowledge distillation and multi-mode fusion
CN109543519B (en) Depth segmentation guide network for object detection
CN114661951A (en) Video processing method and device, computer equipment and storage medium
CN114399661A (en) Instance awareness backbone network training method
CN114519107A (en) Knowledge graph fusion method combining entity relationship representation
CN111612803B (en) Vehicle image semantic segmentation method based on image definition
CN111539922B (en) Monocular depth estimation and surface normal vector estimation method based on multitask network
CN114911930A (en) Global and local complementary bidirectional attention video question-answering method and system
CN114782983A (en) Road scene pedestrian detection method based on improved feature pyramid and boundary loss
CN113255574A (en) Urban street semantic segmentation method and automatic driving method
Xiong et al. Vehicle detection algorithm based on lightweight YOLOX

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20200818