CN116229056A - Semantic segmentation method, device and equipment based on double-branch feature fusion - Google Patents

Semantic segmentation method, device and equipment based on double-branch feature fusion

Info

Publication number
CN116229056A
CN116229056A
Authority
CN
China
Prior art keywords
detail
semantic
branch
information
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211623747.3A
Other languages
Chinese (zh)
Inventor
周书仁
晏周荃
朱俣键
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changsha University of Science and Technology
Original Assignee
Changsha University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changsha University of Science and Technology filed Critical Changsha University of Science and Technology
Priority to CN202211623747.3A priority Critical patent/CN116229056A/en
Publication of CN116229056A publication Critical patent/CN116229056A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/26 Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715 Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to a semantic segmentation method, device and equipment based on double-branch feature fusion. The patent provides a two-branch semantic segmentation network divided into a detail branch and a semantic branch: the detail branch strengthens semantic information with edge features, while the semantic branch extracts high-level features, addressing the tendency of prior networks to ignore detail information around object boundaries and small objects. A spatial pyramid module embedded at the end of the semantic branch captures multi-scale features and further improves the extraction of semantic information from high-dimensional features. Notably, we study a feature fusion module that fuses high-level semantic information with detail information to enhance the feature representation. Furthermore, the fusion module applies an attention mechanism to the feature maps from both branches to establish context dependencies along the spatial and channel dimensions, helping the network focus on more meaningful features.

Description

Semantic segmentation method, device and equipment based on double-branch feature fusion
The invention relates to the technical field of computer vision, in particular to a semantic segmentation method, a semantic segmentation device and semantic segmentation equipment based on double-branch feature fusion.
Background
Semantic segmentation is one of the key tasks in computer vision; its purpose is to assign dense labels to all pixels in an image, i.e., a concrete-to-abstract process. In recent years the structure of convolutional neural networks has been repeatedly innovated, with impressive results. The fully convolutional network (FCN) demonstrated that an end-to-end, pixel-to-pixel convolutional network could surpass the prior state of the art by converting fully connected layers into convolutional layers and upsampling through deconvolution; its skip-connection structure combines semantic information with appearance information, producing accurate and fine segmentation. However, as network depth increases the receptive field of a fully convolutional network grows slowly, and this limited receptive field cannot fully model long-distance relationships between pixels in the image. U-Net combines low-resolution information (object category) with high-resolution information (accurate segmentation and localization) and is well suited to medical image segmentation; however, although convolution performs well in many recognition tasks, U-Net's training scale is limited and its network scale is not guaranteed, so it is difficult to generalize to arbitrary tasks. PSANet introduces an attention mechanism in the decoder, connecting the pixels at each position of the feature map through an adaptive attention mechanism to promote information transfer and improve segmentation in complex scenes. DeepLabv3+ uses an atrous spatial pyramid pooling module to extract multi-scale object features and explicitly preserves a high-resolution representation; nevertheless, the network fuses only one layer. BiSeNet proposes a spatial path that encodes rich spatial and detail information and a feature fusion module that fuses the two, offering a new line of thought: spatial information must be attended to even while pursuing speed. This design rethinks the semantic segmentation backbone and applies not only to real-time segmentation algorithms but also to other settings, especially those that require spatial detail and context information simultaneously. These methods, however, tend to incur expensive computational costs. In addition, they typically take the original high-resolution image as input, which further increases computation, and previous network structures easily ignore the detailed appearance around boundaries and small objects, thereby losing high-resolution information.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a two-branch network divided into a detail branch and a semantic branch, which allow the network to obtain low-level detail information and high-level semantic information, respectively. Notably, the two branches in our network are not independent: detail edge features are obtained by extracting texture information from the input image, the detail branch complements the semantic branch, the detail map is added to the semantic feature map to supplement the detail features, and the two optimized features are combined into the final segmentation representation.
The method comprises the following steps:
S1, constructing the network framework; the network consists of two parts, a detail branch and a semantic branch. A given image is input into a backbone network to extract semantic features; the encoder first reduces the size of the input image by a factor of 16;
S2, passing the features extracted by the backbone network through an atrous spatial pyramid pooling (ASPP) module, whose core idea is to aggregate receptive fields of different scales; ASPP also addresses the differing scales of different segmentation targets. The module consists of one 1×1 convolution and three 3×3 dilated convolutions with dilation rates 3, 6 and 12;
S3, reducing the number of channels with a 1×1 convolution, followed by BN, a ReLU activation and Dropout, then upsampling the result 4× using bilinear interpolation;
S4, extracting texture information from the input image in the detail branch to obtain detail edge features, the aim being to capture spatial detail information; the edge features are then used to strengthen semantic information, and the detail branch, as a complement to the semantic branch, is added to the semantic feature map to supplement the detail features;
S5, fusing high-level semantic and detail information with a feature fusion module (FFM), shown in FIG. 3: semantic information is introduced into the low-level features and detail information into the high-level features, making the subsequent fusion more effective and enhancing the feature representation;
S6, attaching a detail head to the fused feature map to generate a binary detail label, which then guides the bottom layers to learn spatial detail features; finally the number of channels of the feature map is reduced by a 1×1 convolution followed by BN and a ReLU activation;
S7, concatenating the feature maps of S3 and S5 and inputting them to the decoder shown in FIG. 1(b), i.e., a 3×3 convolution, BN, ReLU and Dropout, then upsampling 4× to obtain the final result;
S8, jointly optimizing detail learning with binary cross-entropy and Dice loss, together with focal loss and CE loss.
The invention provides a semantic segmentation method, device and equipment based on double-branch feature fusion. Compared with the prior art, the method has the following beneficial effects:
The network obtains low-level detail information and high-level semantic information separately: detail edge features are obtained from texture information extracted from the input image, and the detail branch complements the semantic branch. Detail information is taken from the shallow layers and semantic information from the deep layers, and the two are then fused, avoiding the loss of either kind of information. The high-level semantic information also optimizes the low-level edge information, after which the two optimized features are combined into the final segmentation representation. Second, we propose a feature fusion module (FFM) for fusing high-level semantic and detail information to enhance the feature representation. Furthermore, the fusion module applies an attention mechanism to the feature maps from both branches to establish context dependencies along the spatial and channel dimensions, helping the network focus on more meaningful features. The feature map extracted by the detail branch of the framework generates the final prediction through a detail segmentation head to improve performance, at negligible cost. Meanwhile, the joint loss of binary cross-entropy and Dice guides the shallow layers to encode spatial information, and the loss is fed back to iteratively optimize the model until it reaches its minimum, improving the accuracy and robustness of the features.
Drawings
In order to illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings required by the embodiments or the description of the prior art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the invention, and that other drawings can be obtained from them by a person skilled in the art without inventive effort.
FIG. 1 is the overall network architecture of the semantic segmentation model based on double-branch feature fusion, including the decoder and the detail head.
FIG. 2 is a block diagram of the spatial and channel attention mechanisms.
FIG. 3 is a block diagram of the feature fusion module.
FIG. 4 shows a device according to the invention.
FIG. 5 shows a computer apparatus according to the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention is described in detail below with reference to the drawings and specific embodiments. The semantic segmentation method, device and equipment based on double-branch feature fusion comprise steps S1 to S8:
s1, constructing a network frame, wherein the network consists of two parts, namely a detail branch and a semantic branch; inputting a given image into a backbone network to extract semantic features, firstly, reducing the size of the input image by 16 times through an encoder;
s2, the extracted features in the backbone network are subjected to a cavity space convolution pooling pyramid, the core idea is to gather receptive fields with different scales, and the cavity space convolution pooling pyramid is also used for solving the problem of different scales of different segmentation targets; the method consists of 3X 3 cavity convolutions with three expansion rates of 3, 6 and 12 in sequence of 1X 1 convolution kernels;
s3, reducing the channel number by using 1 multiplied by 1 convolution, and then connecting with a BN, a ReLU activation function and a Dropout; 4 times up-sampling it using bilinear interpolation;
s4, extracting texture information from an input picture by a detail branch to obtain detail edge characteristics, wherein the purpose is to extract space detail information, then, the edge characteristics are utilized to strengthen semantic information, and the detail branch is added with a semantic feature map to supplement the detail characteristics as the supplement of the semantic branch;
s5, a fusion module (FFM) is used for fusing high-level semantic and detail information, semantic information is introduced into the bottom-layer features, and detail information is introduced into the high-layer features, so that subsequent fusion is more effective, and feature representation is enhanced;
s6, inserting a detail header into the feature map part obtained through fusion to generate a classification detail label, wherein the detail header is shown in the figure 1 (c), and then guiding the detail features of the bottom learning space by using the classification detail label as the guide of the detail feature map; finally, the number of channels is reduced in the feature map part through 1X 1 convolution in sequence, and then a BN and ReLU activation function is connected;
s7, splicing the feature graphs of the S3 and the S5; then input to the decoder, which is shown in fig. 1 (b), i.e., subjected to 3×3 convolution, BN, reLU, dropout; up-sampling by 4 times to obtain a final result;
s8, combining and optimizing detail learning by Binary cross entropy and Dice Loss, focal Loss and CE Loss.
The respective steps are described in detail below.
In step S1, a network architecture is constructed as shown in FIG. 1; the network consists of two parts, a semantic branch and a detail branch. Specifically:
The backbone used by the feature extraction part is Xception, VGGNet or ResNet-18. These backbones appear in many classical network architectures and have been widely adopted and validated; since a lightweight network is required, Xception, VGGNet or ResNet-18 is used to extract image features while demonstrating the effectiveness of the proposed method. An illustrative sketch of the semantic-branch encoder follows.
In step S2, the features extracted by the backbone network pass through ASPP, which is also designed to address the differing scales of different segmentation targets. The specific steps are:
s201, a core idea of a cavity space convolution pooling pyramid is to concentrate a multiscale receptive field, and the cavity space convolution pooling pyramid is composed of 1 piece of 1*1 convolution and three pieces of 3*3 cavity convolution with 3, 6 and 12 cavity rates in sequence.
S202, the feature map produced by the 1×1 convolution is added to the features produced by the 3×3 dilated convolutions with rates 3, 6 and 12; the resulting feature map is the output of the atrous spatial pyramid pooling module, as sketched below.
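A minimal sketch of the ASPP variant of S201-S202, assuming each branch carries its own BN and ReLU and taking the stated element-wise sum of the four branch outputs; channel sizes and names are illustrative.

```python
import torch
import torch.nn as nn

class ASPP(nn.Module):
    """One 1x1 conv plus three 3x3 dilated convs (rates 3, 6, 12), with the
    four branch outputs fused by element-wise addition (S202)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        def branch(kernel: int, rate: int) -> nn.Sequential:
            pad = 0 if kernel == 1 else rate  # keeps the spatial size unchanged
            return nn.Sequential(
                nn.Conv2d(in_ch, out_ch, kernel, padding=pad, dilation=rate, bias=False),
                nn.BatchNorm2d(out_ch),
                nn.ReLU(inplace=True),
            )
        self.b0 = branch(1, 1)
        self.b1 = branch(3, 3)
        self.b2 = branch(3, 6)
        self.b3 = branch(3, 12)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.b0(x) + self.b1(x) + self.b2(x) + self.b3(x)
```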
In step S3, the number of channels is reduced with a 1×1 convolution, followed by BN, a ReLU activation and Dropout; the result is upsampled 4× using bilinear interpolation (see the sketch below).
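Step S3 maps directly onto a small layer stack; a sketch follows, in which the helper name and the dropout probability are assumptions.

```python
import torch.nn as nn

def make_projection(in_ch: int, out_ch: int, p_drop: float = 0.1) -> nn.Sequential:
    """S3: compress channels with a 1x1 conv, apply BN/ReLU/Dropout, then
    upsample 4x with bilinear interpolation."""
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
        nn.Dropout2d(p_drop),
        nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
    )
```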
In step S4, the detail branch extracts texture information from the input image to obtain detail edge features, the aim being to capture spatial detail information; the edge features are then used to strengthen semantic information, and the detail branch, as a complement to the semantic branch, is added to the semantic feature map to supplement the detail features. The specific steps are:
S401, the detail branch first extracts the texture information of the image; learning a robust texture representation is crucial for texture recognition. This patent uses first-order Sobel, Laplacian and local binary pattern methods.
S402, convolution layers from the multiple texture representations are merged, and the multi-texture information is used to extract the complementary relations between them, as sketched below.
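As an illustration of the fixed-kernel part of S401, the sketch below computes Sobel and Laplacian responses with F.conv2d; the local binary pattern step is omitted, and the kernels are the standard textbook ones, not values taken from the patent.

```python
import torch
import torch.nn.functional as F

# First-order Sobel kernels and a 4-neighbour Laplacian kernel.
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]])
SOBEL_Y = SOBEL_X.t()
LAPLACE = torch.tensor([[0., 1., 0.], [1., -4., 1.], [0., 1., 0.]])

def texture_maps(gray: torch.Tensor) -> torch.Tensor:
    """gray: (N, 1, H, W) grayscale input; returns (N, 3, H, W) edge maps
    (Sobel-x, Sobel-y, Laplacian) at the input resolution."""
    kernels = torch.stack([SOBEL_X, SOBEL_Y, LAPLACE]).unsqueeze(1)  # (3, 1, 3, 3)
    return F.conv2d(gray, kernels.to(gray), padding=1)
```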
In step S5, a feature fusion module (FFM), shown in FIG. 3, is configured to fuse high-level semantic and detail information: semantic information is introduced into the low-level features and detail information into the high-level features, so that the subsequent fusion is more effective. The specific steps are:
s501, attention can help the model to give different weights to each input part, more key and important information is extracted, so that the model can make more accurate judgment, and meanwhile, larger expenditure is not brought to calculation and storage of the model. For the shallow feature map S extracted by the backbone network, to further enrich the space details thereof, we weight it to obtain S' by using a space attention mechanism, the space attention mechanism is shown in fig. 2 (b), and the calculation formula is as follows:
S' = S ⊗ σ(f([AvgPool(S); MaxPool(S)])) (1)
where AvgPool and MaxPool pool S along the channel dimension, [;] denotes concatenation, f is a convolution, σ is the sigmoid function and ⊗ is element-wise multiplication.
s502, for a feature diagram D extracted by a detail branch, the attention weight is calculated by using the attention of a channel, and is multiplied by D to obtain D', so as to enhance the detail distinction between different channels, the attention mechanism of the channel is shown in fig. 2 (a), and the calculation formula is as follows:
D' = D ⊗ σ(MLP(AvgPool(D)) + MLP(MaxPool(D))) (2)
where AvgPool and MaxPool here pool D over the spatial dimensions and MLP is a shared two-layer perceptron.
s503, S 'and D' have the same channel number, and the fusion of the shallow layer characteristic diagram and the detail characteristic diagram can be realized by adding elements one by one. The S 'and the D' are added to further enhance the detail information contained in the shallow feature map, the fused feature map is used as the input of a backbone network, the feature map of the semantic branch and the detail feature are added to be used as the input of the semantic branch, and the whole calculation process of the module can be expressed by the following formula:
S_{i+1} = S'_i + D'_i (3)
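A compact sketch of the FFM under the assumption that the attention blocks follow the common CBAM-style reading of Eqs. (1)-(2); the 7×7 kernel, the reduction ratio and the class name are illustrative choices, not values specified in the patent.

```python
import torch
import torch.nn as nn

class FFM(nn.Module):
    """Weights the shallow map S with spatial attention (Eq. 1), the detail
    map D with channel attention (Eq. 2), and fuses them by addition (Eq. 3)."""
    def __init__(self, channels: int, reduction: int = 4):
        super().__init__()
        self.spatial = nn.Sequential(  # spatial attention over [avg; max] maps
            nn.Conv2d(2, 1, kernel_size=7, padding=3, bias=False),
            nn.Sigmoid(),
        )
        self.pool_avg = nn.AdaptiveAvgPool2d(1)
        self.pool_max = nn.AdaptiveMaxPool2d(1)
        self.mlp = nn.Sequential(  # shared two-layer perceptron of Eq. (2)
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False),
        )

    def forward(self, s: torch.Tensor, d: torch.Tensor) -> torch.Tensor:
        pooled = torch.cat([s.mean(1, keepdim=True), s.amax(1, keepdim=True)], dim=1)
        s_prime = s * self.spatial(pooled)  # S' = S (x) spatial attention
        w = torch.sigmoid(self.mlp(self.pool_avg(d)) + self.mlp(self.pool_max(d)))
        d_prime = d * w                     # D' = D (x) channel attention
        return s_prime + d_prime            # Eq. (3): element-wise sum
```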
In step S6, a detail head, shown in FIG. 1(c), is attached to the fused feature map to generate a binary detail label, which then guides the bottom layers to learn spatial detail features; finally the number of channels of the feature map is reduced by a 1×1 convolution followed by BN and a ReLU activation, as shown in FIG. 1(a). The specific steps are:
s601, we first generate a two-class detail tag from the semantically partitioned real tag by laplace convolution. The detail head is inserted into the shallow feature part to generate a classification detail label, the detail head is shown in fig. 1 (c), and then the classification detail label is used as the guide of a detail feature diagram to guide the bottom learning space detail feature. The two-class detail label graph with detail guidance may encode more spatial detail than low-level feature results.
S602, binary detail labels are generated from the semantic segmentation labels by a detail aggregation module. The operation can be implemented with a two-dimensional Laplacian convolution kernel and a 1×1 convolution. We apply the Laplacian kernel shown in FIG. 1 at different strides to generate detail feature maps carrying multi-scale detail information, then upsample the detail feature maps to the original size and fuse them with a trainable 1×1 convolution.
S603, finally, the predicted detail is converted into the final binary detail label, carrying boundary and corner information, using a threshold of 0.1. Detail prediction is a classic class-imbalance problem, since detail pixels are far fewer than non-detail pixels. Because weighted cross-entropy tends to produce coarse results, we use binary cross-entropy and Dice loss to jointly optimize detail learning. Dice measures the degree of overlap between the prediction map and the ground-truth labels; furthermore, it is insensitive to the number of foreground/background pixels, which means it can alleviate the class-imbalance problem. A sketch of this label-generation pipeline is given below.
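A hedged sketch of the label-generation pipeline of S601-S603, assuming the standard 8-neighbour Laplacian kernel and strides 1, 2 and 4 for the multi-scale maps; the function name, the stride set and the 3-to-1-channel fusion convolution are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# 8-neighbour Laplacian kernel, shaped for F.conv2d.
LAPLACIAN = torch.tensor([[-1., -1., -1.],
                          [-1.,  8., -1.],
                          [-1., -1., -1.]]).view(1, 1, 3, 3)

def detail_ground_truth(seg_label: torch.Tensor, fuse_1x1: nn.Conv2d,
                        thresh: float = 0.1) -> torch.Tensor:
    """seg_label: (N, 1, H, W) float segmentation ground truth;
    fuse_1x1: trainable nn.Conv2d(3, 1, 1) fusing the multi-scale maps."""
    maps = []
    for stride in (1, 2, 4):  # detail maps at different step sizes (S602)
        m = F.conv2d(seg_label, LAPLACIAN.to(seg_label), stride=stride, padding=1)
        m = F.interpolate(m.abs(), size=seg_label.shape[-2:],
                          mode="bilinear", align_corners=False)
        maps.append(m)
    fused = fuse_1x1(torch.cat(maps, dim=1))  # trainable 1x1 fusion (S602)
    return (fused > thresh).float()           # binarize at threshold 0.1 (S603)
```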
For a predicted detail map of height H and width W, the detail loss L_detail is:

L_detail(p_d, g_d) = L_bce(p_d, g_d) + L_dice(p_d, g_d) (4)

where p_d ∈ R^{H×W} denotes the predicted detail map, g_d ∈ R^{H×W} the corresponding ground-truth label, L_bce the binary cross-entropy loss, and L_dice the Dice loss:

L_dice(p_d, g_d) = 1 − (2 Σ_i p_d^i g_d^i + ε) / (Σ_i (p_d^i)^2 + Σ_i (g_d^i)^2 + ε) (5)
where i indexes the pixels and ε is the Laplace smoothing term; we set ε = 1, which also prevents division by zero when both maps are empty. As shown in FIG. 1, we use the detail head to generate a detail feature map that guides the shallow layers to encode spatial information. The detail head consists of a 3×3 convolution with BN and ReLU, followed by a 1×1 convolution that produces the output detail map. The detail head effectively enhances the feature representation; finally, the learned detail features are fused with the context features of the deep decoder blocks for segmentation prediction. This branch is discarded during inference, so the side information improves the accuracy of the segmentation task without any inference cost. A sketch of the joint detail loss follows.
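A direct PyTorch reading of Eqs. (4)-(5); the only assumption is that the detail prediction arrives as raw logits.

```python
import torch
import torch.nn.functional as F

def detail_loss(pred: torch.Tensor, gt: torch.Tensor, eps: float = 1.0) -> torch.Tensor:
    """L_detail = L_bce + L_dice (Eq. 4) for a predicted detail map.
    pred: raw logits (N, 1, H, W); gt: binary detail labels, same shape."""
    bce = F.binary_cross_entropy_with_logits(pred, gt)
    p = torch.sigmoid(pred).flatten(1)
    g = gt.flatten(1)
    # Dice term (Eq. 5) with Laplace smoothing eps = 1.
    dice = 1.0 - (2.0 * (p * g).sum(1) + eps) / ((p * p).sum(1) + (g * g).sum(1) + eps)
    return bce + dice.mean()
```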
In step S7, the feature maps of S3 and S5 are concatenated and input to the decoder shown in FIG. 1(b), i.e., a 3×3 convolution, BN, ReLU and Dropout; the result is upsampled 4× to obtain the final output.
In step S8, detail learning is jointly optimized with binary cross-entropy and Dice loss, together with focal loss and CE loss. Because training a network model is a continuous loss-minimization process, the currently obtained loss is fed back to the network model for continuous iterative optimization so as to reduce the loss, thereby producing more robust features. A sketch of one possible joint objective is given below.
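One way the four losses named in S8 could be combined, as an assumption-laden sketch: the focal loss uses the standard formulation of Lin et al., all loss weights are set to 1.0, and detail_loss refers to the function in the previous sketch; none of these choices are fixed by the patent.

```python
import torch
import torch.nn.functional as F

def total_loss(seg_logits: torch.Tensor, seg_target: torch.Tensor,
               detail_pred: torch.Tensor, detail_gt: torch.Tensor,
               gamma: float = 2.0) -> torch.Tensor:
    """Joint objective: CE + focal loss on the segmentation output plus the
    BCE+Dice detail loss. seg_logits: (N, C, H, W); seg_target: (N, H, W) long."""
    ce = F.cross_entropy(seg_logits, seg_target)
    logp = F.log_softmax(seg_logits, dim=1)
    logpt = logp.gather(1, seg_target.unsqueeze(1)).squeeze(1)  # log p of true class
    pt = logpt.exp()
    focal = (-((1.0 - pt) ** gamma) * logpt).mean()
    # detail_loss is the BCE+Dice function from the sketch for Eqs. (4)-(5).
    return ce + focal + detail_loss(detail_pred, detail_gt)
```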
The invention improves the accuracy and robustness of the model by improving the way features are extracted. The network obtains low-level detail information and high-level semantic information separately: detail edge features are obtained from texture information extracted from the input image, and the detail branch complements the semantic branch. Detail information is taken from the shallow layers and semantic information from the deep layers, and the two are then fused, avoiding the loss of either kind of information. The high-level semantic information also optimizes the low-level edge information, after which the two optimized features are combined into the final segmentation representation. Second, we propose a feature fusion module (FFM) for fusing high-level semantic and detail information to enhance the feature representation. Furthermore, the fusion module applies an attention mechanism to the feature maps from both branches to establish context dependencies along the spatial and channel dimensions, helping the network focus on more meaningful features. The feature map extracted by the detail branch generates the final prediction through a detail segmentation head to improve performance, at negligible cost. The method constructs a new and effective approach to semantic segmentation and provides a more efficient framework for semantic segmentation in practical applications.
The invention also provides a device, shown in FIG. 4, comprising a training module for the double-branch feature-fusion semantic segmentation network; the module also inputs the fused feature map into the decoder to obtain the sample prediction result.
The invention also proposes a computer device, shown in FIG. 5, comprising a processor, a memory, a network interface, a display and an input device; the processor implements the steps of the method described above when executing the computer program. The processor of the device provides computing and control capabilities. The network interface of the device communicates with external terminals over a network connection. When executed by the processor, the computer program implements the semantic segmentation method based on double-branch feature fusion. The display of the device may be a liquid-crystal or electronic-ink screen; the input device may be a touch layer covering the display, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit its scope; all equivalent structural changes made using the description and drawings of the present invention, or their direct or indirect application in other related technical fields, fall within the scope of the invention.

Claims (9)

1. A semantic segmentation method, device and equipment based on double-branch feature fusion, characterized in that the method comprises the following steps:
obtaining high-resolution images and labeling them to obtain training, validation and test samples;
constructing the network framework, the network consisting of two parts, a detail branch and a semantic branch; inputting a given image into a backbone network to extract semantic features, the size of the input image first being reduced by a factor of 16;
passing the features extracted by the backbone network through an atrous spatial pyramid pooling module consisting of one 1×1 convolution and three 3×3 dilated convolutions with dilation rates 3, 6 and 12;
reducing the number of channels with a 1×1 convolution, followed by BN, a ReLU activation and Dropout; upsampling the semantic branch 4× using bilinear interpolation to obtain the semantic-branch feature map;
extracting, in the detail branch, texture information from the input image to obtain detail edge features, the aim being to capture spatial detail information; then using the edge features to strengthen semantic information, the detail branch, as a complement to the semantic branch, being added to the semantic feature map to supplement the detail features;
fusing high-level semantic and detail information with a feature fusion module (FFM): semantic information is introduced into the low-level features and detail information into the high-level features, making the subsequent fusion more effective and enhancing the feature representation;
attaching a detail head to the fused feature map to generate a binary detail label, which then guides the bottom layers to learn spatial detail features; finally reducing the number of channels of the feature map with a 1×1 convolution followed by BN and a ReLU activation;
combining the feature maps of the fusion module and the semantic branch; inputting them to the decoder, i.e., a 3×3 convolution, BN, ReLU and Dropout; upsampling 4× to obtain the final result;
jointly optimizing detail learning with binary cross-entropy and Dice loss, together with focal loss and CE loss;
training the network to obtain a trained semantic segmentation model based on double-branch feature fusion; obtaining an image to be tested and inputting it into the trained segmentation model to obtain the prediction result for the image.
2. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 1, characterized in that establishing the image dataset comprises:
collecting image samples;
labeling the collected image samples with an image annotation tool; the result of semantic segmentation turns the image into color blocks carrying semantic information, and the semantic segmentation technique identifies the semantic category of each color block and assigns each pixel the corresponding label;
constructing a sample dataset from the labeled image samples, dividing it into a training set, a validation set and a test set, and preprocessing the training set.
3. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 1, characterized in that the backbone network used by the feature extraction part is Xception, VGGNet or ResNet-18.
4. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 1, characterized in that the network consists of two parts, a semantic branch and a detail branch; the detail branch obtains detail edge features from texture information extracted from the input image, the aim being to capture spatial detail information; the edge features are then used to strengthen semantic information, and the detail branch, as a complement to the semantic branch, is added to the semantic feature map to supplement the detail features.
5. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 4, characterized in that in the detail branch, the texture information of the image is first extracted and a robust texture representation is learned, using first-order Sobel, Laplacian and local binary pattern methods; convolution layers from the multiple texture representations are fused, the complementary relations between them are extracted using the multi-texture information, and the fusion module then fuses the semantic information and the detail information.
6. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 5, characterized in that a feature fusion module (FFM) fuses high-level semantic and detail information, introducing semantic information into the low-level features and detail information into the high-level features, so that the subsequent fusion is more effective and the feature representation is enhanced; for the shallow feature map S extracted by the backbone network, a spatial attention mechanism weights S to obtain S' and further enrich its spatial detail; for the feature map D extracted by the detail branch, a channel-attention weight is computed and multiplied with D to obtain D', enhancing the detail distinction between its channels; S' and D' have the same number of channels, so the shallow feature map and the detail feature map are fused by element-wise addition; a detail head is attached to the fused feature map to generate a binary detail label, which guides the bottom layers to learn spatial detail features.
7. The semantic segmentation method, device and equipment based on double-branch feature fusion according to claim 6, characterized in that binary cross-entropy, Dice loss and focal loss are used during training to jointly optimize detail learning; the training process of the network model is a continuous loss-minimization process, and the currently obtained loss is fed back to the network model for continuous iterative optimization.
8. A semantic segmentation device based on double-branch feature fusion, characterized in that the device is configured to perform the following steps:
obtaining high-resolution images and labeling them to obtain training, validation and test samples;
constructing the semantic segmentation network based on double-branch feature fusion, the segmentation network comprising a backbone network, a semantic branch, a detail branch and a fusion module; inputting a given image into the backbone network to extract semantic features, the size of the input image first being reduced by a factor of 16;
passing the features extracted by the backbone network through an atrous spatial pyramid pooling module consisting of one 1×1 convolution and three 3×3 dilated convolutions with dilation rates 3, 6 and 12;
reducing the number of channels with a 1×1 convolution, followed by BN, a ReLU activation and Dropout; upsampling the result 4× using bilinear interpolation;
extracting, in the detail branch, texture information from the input image to obtain detail edge features, the aim being to capture spatial detail information; then using the edge features to strengthen semantic information, the detail branch, as a complement to the semantic branch, being added to the semantic feature map to supplement the detail features;
fusing high-level semantic and detail information with a feature fusion module (FFM): semantic information is introduced into the low-level features and detail information into the high-level features, making the subsequent fusion more effective and enhancing the feature representation;
attaching a detail head to the fused feature map to generate a binary detail label, which then guides the bottom layers to learn spatial detail features; finally reducing the number of channels of the feature map with a 1×1 convolution followed by BN and a ReLU activation;
combining the feature maps of the fusion module and the semantic branch; inputting them to the decoder, i.e., a 3×3 convolution, BN, ReLU and Dropout; upsampling 4× to obtain the final result;
jointly optimizing detail learning with binary cross-entropy and Dice loss, together with focal loss and CE loss;
training the network to obtain a trained semantic segmentation model based on double-branch feature fusion; obtaining an image to be tested and inputting it into the trained segmentation model to obtain the prediction result for the image.
9. An apparatus comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 8 when the computer program is executed.
CN202211623747.3A 2022-12-16 2022-12-16 Semantic segmentation method, device and equipment based on double-branch feature fusion Pending CN116229056A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211623747.3A CN116229056A (en) 2022-12-16 2022-12-16 Semantic segmentation method, device and equipment based on double-branch feature fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211623747.3A CN116229056A (en) 2022-12-16 2022-12-16 Semantic segmentation method, device and equipment based on double-branch feature fusion

Publications (1)

Publication Number Publication Date
CN116229056A true CN116229056A (en) 2023-06-06

Family

ID=86577546

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211623747.3A Pending CN116229056A (en) 2022-12-16 2022-12-16 Semantic segmentation method, device and equipment based on double-branch feature fusion

Country Status (1)

Country Link
CN (1) CN116229056A (en)

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116958556A (en) * 2023-08-01 2023-10-27 东莞理工学院 Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation
CN116958556B (en) * 2023-08-01 2024-03-19 东莞理工学院 Dual-channel complementary spine image segmentation method for vertebral body and intervertebral disc segmentation
CN116895023A (en) * 2023-09-11 2023-10-17 中国石油大学(华东) Method and system for recognizing mesoscale vortex based on multitask learning
CN116895023B (en) * 2023-09-11 2024-02-09 中国石油大学(华东) Method and system for recognizing mesoscale vortex based on multitask learning
CN117115668A (en) * 2023-10-23 2023-11-24 安徽农业大学 Crop canopy phenotype information extraction method, electronic equipment and storage medium
CN117115668B (en) * 2023-10-23 2024-01-26 安徽农业大学 Crop canopy phenotype information extraction method, electronic equipment and storage medium
CN117456191A (en) * 2023-12-15 2024-01-26 武汉纺织大学 Semantic segmentation method based on three-branch network structure under complex environment
CN117456191B (en) * 2023-12-15 2024-03-08 武汉纺织大学 Semantic segmentation method based on three-branch network structure under complex environment
CN117690107A (en) * 2023-12-15 2024-03-12 上海保隆汽车科技(武汉)有限公司 Lane boundary recognition method and device
CN117690107B (en) * 2023-12-15 2024-04-26 上海保隆汽车科技(武汉)有限公司 Lane boundary recognition method and device
CN117911908A (en) * 2024-03-20 2024-04-19 湖北经济学院 Enhancement processing method and system for aerial image of unmanned aerial vehicle
CN117911908B (en) * 2024-03-20 2024-05-28 湖北经济学院 Enhancement processing method and system for aerial image of unmanned aerial vehicle

Similar Documents

Publication Publication Date Title
CN107945204B (en) Pixel-level image matting method based on generation countermeasure network
CN116229056A (en) Semantic segmentation method, device and equipment based on double-branch feature fusion
CN110176027B (en) Video target tracking method, device, equipment and storage medium
Zhou et al. Salient object detection in stereoscopic 3D images using a deep convolutional residual autoencoder
Wang et al. Cliffnet for monocular depth estimation with hierarchical embedding loss
CN113158862B (en) Multitasking-based lightweight real-time face detection method
CN108399386A (en) Information extracting method in pie chart and device
CN110782420A (en) Small target feature representation enhancement method based on deep learning
CN113673338B (en) Automatic labeling method, system and medium for weak supervision of natural scene text image character pixels
Liu et al. RGB-D joint modelling with scene geometric information for indoor semantic segmentation
Zhao et al. Depth-distilled multi-focus image fusion
US10936938B2 (en) Method for visualizing neural network models
CN111739037B (en) Semantic segmentation method for indoor scene RGB-D image
CN112257665A (en) Image content recognition method, image recognition model training method, and medium
CN112036260A (en) Expression recognition method and system for multi-scale sub-block aggregation in natural environment
CN109815931A (en) A kind of method, apparatus, equipment and the storage medium of video object identification
CN113743417A (en) Semantic segmentation method and semantic segmentation device
CN113554032A (en) Remote sensing image segmentation method based on multi-path parallel network of high perception
Jia et al. Effective meta-attention dehazing networks for vision-based outdoor industrial systems
CN111179272B (en) Rapid semantic segmentation method for road scene
Yu et al. Unbiased multi-modality guidance for image inpainting
Liu et al. Dunhuang murals contour generation network based on convolution and self-attention fusion
Guo et al. Decoupling semantic and edge representations for building footprint extraction from remote sensing images
Han et al. LIANet: Layer interactive attention network for RGB-D salient object detection
Zong et al. A cascaded refined rgb-d salient object detection network based on the attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination