CN113657388A - Image semantic segmentation method fusing image super-resolution reconstruction - Google Patents
- Publication number
- CN113657388A (application CN202110780769.XA)
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- semantic segmentation
- resolution
- feature map
- Prior art date
- Legal status: Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
- G06T3/4053—Super resolution, i.e. output image resolution higher than sensor resolution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an image semantic segmentation method fusing image super-resolution reconstruction, which comprises the following steps: initializing the parameters of a convolutional neural network based on a pre-trained ResNet-50 network model; preprocessing the data set and inputting the preprocessed data set into the down-sampling coding stage of the initialized network model to extract image features; performing super-resolution reconstruction with the extracted image features to obtain a high-resolution feature map; fusing the extracted image features with the reconstructed high-resolution feature map and inputting the fused features into the feature decoder of the network model, building a guided upsampling module from the reconstructed high-resolution feature map, forming the offset vector of each pixel into an offset table, and performing the upsampling operation with the offset table as a guide to obtain the image semantic segmentation result; and defining a loss function and optimizing the network model. The invention can improve the precision of the semantic segmentation algorithm.
Description
Technical Field
The invention relates to the technical field of image processing, computer vision, deep learning and image semantic segmentation, in particular to an image semantic segmentation method fusing image super-resolution reconstruction.
Background
In the field of computer vision, semantic segmentation is one of the most important tasks. An image is divided into different semantic regions according to the object classes it contains, and each region can be interpreted as a specific category such as building, pedestrian, or tree. Scene-based environment understanding for robots is an important intersection of several research fields such as computer vision and artificial intelligence. Recognizing what kinds of objects exist in the working environment and where they are is a basic capability a robot needs for scene understanding, and semantic segmentation technology enables the robot to obtain this external scene information quickly and accurately.
A semantic segmentation algorithm aims to predict the category label of each pixel in an input image, achieving pixel-level recognition and segmentation precision: the object category at every pixel position is classified and labeled, yielding a region segmentation of where objects of different categories lie in the image and providing a large amount of visual and inference information.
Image super-resolution reconstruction generates an estimate of the corresponding high-resolution image from a low-resolution image; it enlarges a small image while preventing quality degradation as much as possible. Enlarging an image while keeping it sharp is an inherent requirement of image processing, yet when the resolution of the existing image is fixed, the expansion from low resolution to high resolution is often accompanied by blurring and noise, so image super-resolution reconstruction under deep learning architectures has become a research hotspot in recent years. Deep-learning-based methods can efficiently learn the mapping between a given image and its corresponding high-quality image, using either the internal similarity of the two or a low-resolution image data set.
Therefore, taking deep-learning-based image super-resolution reconstruction as a branch of the main image semantic segmentation network to improve segmentation accuracy has important research significance.
Disclosure of Invention
Aiming at the challenges and problems of existing image semantic segmentation algorithms, the invention provides an image semantic segmentation method fusing image super-resolution reconstruction, which can improve the segmentation precision of the semantic segmentation algorithm.
To solve the above technical problem, an embodiment of the present invention provides the following solutions:
an image semantic segmentation method fusing image super-resolution reconstruction comprises the following steps:
step 1, initializing parameters of a convolutional neural network based on a pre-trained ResNet-50 network model, and setting training parameters of the convolutional neural network;
step 2, preprocessing the data set, and inputting the preprocessed data set into the down-sampling coding stage of the initialized network model for image feature extraction, wherein the down-sampling coding stage of the network model comprises a feature encoder;
step 3, performing super-resolution reconstruction on the image by using the extracted image features to obtain a high-resolution feature map;
step 4, performing feature fusion on the image features extracted in the step 2 and the high-resolution feature map reconstructed in the step 3, inputting the fused features into the feature decoder of the network model, building a guided up-sampling module from the high-resolution feature map reconstructed in the step 3, forming the offset vector of each pixel into an offset table, and performing the up-sampling operation with the offset table as a guide to obtain the image semantic segmentation result;
and 5, defining a loss function of the semantic segmentation network and a loss function of the image super-resolution reconstruction network, and optimizing the network model by using an Adam gradient descent method.
Preferably, in step 1, a transfer learning method is used, and pre-trained weights from the ImageNet data set are selected to initialize the network model.
Preferably, in step 2, the pictures in the data set are preprocessed, the resolution of the pictures is changed to 224 × 224, and the pictures are input into the initialized network model.
Preferably, in the step 2, the feature encoder is composed of an improved ResNet-50 network, called the core network; the core network comprises five modules, the first three modules are the same as in ResNet-50, the fourth module uses dilated (hole) convolution as the convolution kernel to construct a pyramid module based on dilated convolution that outputs the feature map, and the fifth module is the same as the fourth module.
Preferably, the building of the pyramid module based on dilated convolution includes the following four steps:
firstly, using d 1×1×M convolution kernels to reduce the input feature map from dimension M to dimension d;
secondly, convolving the feature map output by the previous step in parallel with K convolution kernels of different dilation rates to obtain K feature maps of the same size;
thirdly, accumulating and concatenating the K same-size feature maps obtained in the previous step, step by step starting from the feature map output by the convolution kernel with the smallest dilation rate, to obtain an output feature map;
and fourthly, using a separate branch to globally pool the input into a 1×1×M feature map as a global feature, and adding it to the feature map obtained in the third step to obtain the final output feature map.
Preferably, in step 3, the image is super-resolution reconstructed using an improved RED method: the improved RED's feature extraction network is the same as the feature extraction network of the image semantic segmentation network and shares its weights, and the rest of the network is the same as the original RED method.
Preferably, in the step 4, given an input feature map and the output feature map produced by a linear transformation, for the specific case of upsampling the sampling grid G_i is defined as:

G_i = (x_i, y_i) = (x_i^t / θ, y_i^t / θ)

where (x_i, y_i) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the up-sampling factor;

given the output feature map V_i and the input feature map U_nm, the guided upsampling module is defined as:

V_i = Σ_{n=1}^{H} Σ_{m=1}^{W} U_nm · max(0, 1 - |x_i + p_i - m|) · max(0, 1 - |y_i + q_i - n|)

where p_i and q_i represent the two offsets of the sampling coordinate of each grid element in the x and y directions respectively, given as the function φ_i, the output of the guidance module, defined as:

φ_i = (p_i, q_i);
the guided upsampling module includes two steps:
in the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid guiding the upsampling process, the prediction is realized by a function, and its output is a tensor of dimensionality H × W × C, where H and W denote the height and width of the high-resolution output semantic map and C = 2 is the dimension containing the two offset coordinates;
secondly, performing bilinear interpolation upsampling by using the offset table as a guide, and adding each two-dimensional coordinate vector of the regular sampling grid with a corresponding two-dimensional vector in the guide offset table;
the feature decoder gradually increases the size of the feature map through the guided up-sampling module, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through category color correspondence.
Preferably, in the step 5, the whole objective loss function is composed of the conventional multi-class cross entropy loss L_ce for semantic segmentation and the mean square error loss L_mse for image super-resolution reconstruction:

L = L_ce + w·L_mse

L_mse = (1/N) Σ_{i=1}^{N} ||SISR(X_i) - Y_i||²

L_ce = -(1/N) Σ_{i=1}^{N} y_i log p_i

where SISR(X_i) and Y_i represent the super-resolution output and its corresponding ground truth, y_i and p_i are the class label and segmentation prediction probability of pixel i, N represents the number of pixels, and w is set to 0.1 so that the two loss terms have comparable ranges; the whole objective loss function is minimized end-to-end.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
according to the invention, a pyramid module based on the cavity convolution is designed in the feature extraction module of the semantic segmentation network by utilizing the cavity convolution, so that the spatial information of the image can be effectively utilized. The rough segmentation graph is optimized by using an image resolution reconstruction method, an image super-resolution reconstruction network is used as a branch of a semantic segmentation network, and the reconstructed high-resolution feature graph and the image features extracted by the semantic segmentation network are used for feature fusion, so that the segmentation precision is improved. And adding an upward sampling guiding module into an upward sampling module of the network, and performing upward sampling operation to restore the image by using the offset vector of the reconstructed high-resolution characteristic diagram as a guide, thereby further improving the segmentation precision.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of an image semantic segmentation method for fusion image super-resolution reconstruction according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image semantic segmentation network model for fused image super-resolution reconstruction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image semantic segmentation network feature extractor provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of a pyramid module based on hole convolution according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides an image semantic segmentation method fusing image super-resolution reconstruction, wherein FIG. 1 is a flowchart of the image semantic segmentation method and FIG. 2 is a schematic diagram of the image semantic segmentation network model fusing image super-resolution reconstruction. With reference to the accompanying drawings, the method comprises the following steps:

Step 1, initializing parameters of a convolutional neural network based on a pre-trained ResNet-50 network model, and setting training parameters of the convolutional neural network.

Specifically, a transfer learning method is used: weights of a network pre-trained on an image classification data set with rich label categories are selected to initialize the network model. In this embodiment, pre-trained weights from the ImageNet data set are selected to initialize the network model.
And 2, preprocessing the data set, and inputting the preprocessed data set into a down-sampling coding stage of the initialized network model to extract image features, wherein the down-sampling coding stage of the network model comprises a feature coder.
Specifically, the pictures in the training data set are preprocessed, the resolution of the pictures is changed to 224x224, and the pictures are input into the initialized network model.
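The resizing step can be sketched as follows. This is a minimal NumPy stand-in: the patent does not specify the interpolation method, so nearest-neighbor sampling and a simple [0, 1] rescaling are assumed here; practical pipelines typically use bilinear resizing and per-channel normalization.

```python
import numpy as np

def preprocess(img, size=224):
    """Resize an H x W x 3 uint8 image to size x size with nearest-neighbor
    sampling and scale pixel values to [0, 1]."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size   # source row index for each output row
    cols = np.arange(size) * w // size   # source column index for each output column
    return img[rows[:, None], cols[None, :]].astype(np.float32) / 255.0

img = (np.random.rand(480, 640, 3) * 255).astype(np.uint8)
x = preprocess(img)
print(x.shape)  # (224, 224, 3)
```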
Fig. 3 is a schematic diagram of the feature encoder in the network model. The feature encoder consists of an improved ResNet-50 network, called the core network; the core network comprises five modules, the first three modules are the same as in ResNet-50, the fourth module uses dilated (hole) convolution as the convolution kernel to construct a pyramid module based on dilated convolution that outputs the feature map, and the fifth module is the same as the fourth.
FIG. 4 is a schematic diagram of the pyramid module based on dilated convolution; building the module includes the following four steps:
a first step of using d 1×1×M convolution kernels to reduce the input feature map from dimension M to dimension d;
a second step of convolving the feature map output by the previous step in parallel with K convolution kernels of different dilation rates to obtain K feature maps of the same size;
a third step of accumulating and concatenating the K same-size feature maps obtained in the previous step, step by step starting from the feature map output by the convolution kernel with the smallest dilation rate, to obtain an output feature map;
and a fourth step of using a separate branch to globally pool the input into a 1×1×M feature map as a global feature, and adding it to the feature map obtained in the third step to obtain the final output feature map.
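The parallel dilated convolutions and progressive summation (the second and third steps above) can be sketched for a single-channel feature map as follows. The 1×1 channel reduction and the global-pooling branch are omitted, and the uniform 3×3 kernel and the dilation rates (1, 2, 4) are only illustrative assumptions.

```python
import numpy as np

def dilated_conv2d(x, k, rate):
    """'Same'-padded 2-D convolution of a single-channel map x with a 3x3
    kernel k whose taps are spaced `rate` pixels apart (dilated convolution)."""
    r = rate                      # effective radius of a dilated 3x3 kernel
    xp = np.pad(x, r)
    out = np.zeros_like(x, dtype=np.float64)
    for i in range(3):
        for j in range(3):
            out += k[i, j] * xp[i * r:i * r + x.shape[0],
                                j * r:j * r + x.shape[1]]
    return out

def dilated_pyramid(x, rates=(1, 2, 4)):
    """Convolve in parallel with K dilation rates, then accumulate from the
    smallest rate upward before the maps are concatenated."""
    k = np.ones((3, 3)) / 9.0
    maps, acc = [], None
    for r in rates:
        f = dilated_conv2d(x, k, r)
        acc = f if acc is None else acc + f   # progressive summation
        maps.append(acc)
    return np.stack(maps)   # K same-size feature maps, ready to concatenate

x = np.random.rand(16, 16)
out = dilated_pyramid(x)
print(out.shape)  # (3, 16, 16)
```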
And 3, performing super-resolution reconstruction on the image by using the extracted image features to obtain a high-resolution feature map.
Specifically, deep-learning-based super-resolution (SR) techniques directly learn an end-to-end mapping function from a low-resolution image to a high-resolution image through a neural network. Representative deep-learning-based SR methods include SRCNN, DRCN, ESPCN, VESPCN, RED, DRRN, SRGAN, and so on. In this embodiment, an improved RED-based method is adopted to perform super-resolution reconstruction: the improved RED's feature extraction network is the same as that of the image semantic segmentation network and the weights are shared between the two networks, while the other parts are the same as the original RED method.
The structure of the RED network is symmetric, with each convolutional layer having a corresponding deconvolutional layer. The convolutional layers capture the abstract content of the image, and the deconvolutional layers enlarge the feature size and restore image details: after the convolutional layers reduce the size of the input image, the deconvolutional layers upsample it back, so the input and output sizes are the same. Each mirrored pair of convolutional and deconvolutional layers is joined by a skip connection: the two same-size features (the feature entering the convolutional layer and the feature output by the corresponding deconvolutional layer) are added and then fed into the next deconvolutional layer. This structure lets the back-propagated signal reach the bottom layers directly, alleviating the vanishing gradient problem, and passes the details of the convolutional layers on to the deconvolutional layers so that a cleaner image can be recovered. One further connection in the RED network links the input image to the output of the last deconvolutional layer, so what the intermediate conv and deconv layers learn is the residual between the target image and the low-quality image.
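The skip topology described above can be illustrated with a toy symmetric encoder-decoder. Average pooling and nearest-neighbor upsampling stand in for the learned conv/deconv layers, so this sketches only the mirrored-skip and global-residual structure, not the RED architecture itself.

```python
import numpy as np

def avg_pool2(x):
    """Downsample by 2 with average pooling (stand-in for a conv layer)."""
    return x.reshape(x.shape[0] // 2, 2, x.shape[1] // 2, 2).mean(axis=(1, 3))

def upsample2(x):
    """Upsample by 2 with nearest neighbor (stand-in for a deconv layer)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def red_skeleton(x, depth=2):
    """Symmetric encoder-decoder with RED-style skips: each decoder stage adds
    the mirror-image encoder feature of the same size, and the global
    input-to-output skip makes the stack model only the residual."""
    feats, h = [], x
    for _ in range(depth):
        feats.append(h)                     # saved for the mirrored skip
        h = avg_pool2(h)
    for _ in range(depth):
        h = upsample2(h) + feats.pop()      # add same-size encoder feature
    return x + h                            # global skip: input + residual

x = np.random.rand(32, 32)
y = red_skeleton(x)
print(y.shape)  # (32, 32)
```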
And 4, performing feature fusion on the image features extracted in the step 2 and the high-resolution feature map reconstructed in the step 3, inputting the feature fusion into a feature decoder of the network model, building a guided up-sampling module by using the high-resolution feature map reconstructed in the step 3, making an offset vector of each pixel point as an offset table, and performing up-sampling operation by using the offset table as a guide to obtain an image semantic segmentation result.
Specifically, the idea behind the guided upsampling module is to steer the upsampling operator with a guidance table of offset vectors, enriching the upsampling operation with a learnable transformation of the semantic mapping that steers sampling toward the correct semantic class. Typically the decoder generates the output segmentation map using a parameter-free operation such as bilinear or nearest-neighbor upsampling, which performs the upscaling by superimposing a regular grid on the input feature map.
In this step, given an input feature map and the output feature map produced by a linear transformation, for the specific case of upsampling the sampling grid G_i is defined as:

G_i = (x_i, y_i) = (x_i^t / θ, y_i^t / θ)

where (x_i, y_i) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the up-sampling factor;

given the output feature map V_i and the input feature map U_nm, the guided upsampling module is defined as:

V_i = Σ_{n=1}^{H} Σ_{m=1}^{W} U_nm · max(0, 1 - |x_i + p_i - m|) · max(0, 1 - |y_i + q_i - n|)

where p_i and q_i represent the two offsets of the sampling coordinate of each grid element in the x and y directions respectively, given as the function φ_i, the output of the guidance module, defined as:

φ_i = (p_i, q_i)
the guided upsampling module includes two steps:
in the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid guiding the upsampling process, the prediction is realized by a function, and its output is a tensor of dimensionality H × W × C, where H and W denote the height and width of the high-resolution output semantic map and C = 2 is the dimension containing the two offset coordinates;
and secondly, performing bilinear interpolation upsampling by using the offset table as a guide, and adding each two-dimensional coordinate vector of the regular sampling grid to a corresponding two-dimensional vector in the guide offset table.
The feature decoder gradually increases the size of the feature map through the guided up-sampling module, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through category color correspondence.
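The two steps above (shifting each regular-grid source coordinate by its guidance offset, then sampling bilinearly) can be sketched as follows. This is a direct, unoptimized rendering of the bilinear sampling formula; the guidance offsets are supplied here as a precomputed array rather than predicted by a network.

```python
import numpy as np

def guided_upsample(u, theta, offsets):
    """Upsample feature map u by factor theta: each output pixel samples u at
    its regular-grid source coordinate (x/theta, y/theta) shifted by the
    per-pixel offset (p, q) from the guidance table (shape H x W x 2),
    using bilinear interpolation weights."""
    H, W = u.shape[0] * theta, u.shape[1] * theta
    out = np.zeros((H, W))
    for yi in range(H):
        for xi in range(W):
            # regular grid coordinate plus guidance offset
            x = xi / theta + offsets[yi, xi, 0]
            y = yi / theta + offsets[yi, xi, 1]
            # V_i = sum_nm U_nm * max(0, 1-|x-m|) * max(0, 1-|y-n|)
            x0, y0 = int(np.floor(x)), int(np.floor(y))
            for n in (y0, y0 + 1):
                for m in (x0, x0 + 1):
                    if 0 <= n < u.shape[0] and 0 <= m < u.shape[1]:
                        w = max(0, 1 - abs(x - m)) * max(0, 1 - abs(y - n))
                        out[yi, xi] += w * u[n, m]
    return out

u = np.random.rand(8, 8)
zero = np.zeros((16, 16, 2))
v = guided_upsample(u, 2, zero)  # zero offsets -> plain bilinear upsampling
print(v.shape)  # (16, 16)
```

With a zero offset table the module reduces to ordinary bilinear upsampling, which is exactly the degenerate case the guidance module learns to improve on.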
And 5, defining a loss function of the semantic segmentation network and a loss function of the image super-resolution reconstruction network, and optimizing the network model by using an Adam gradient descent method.
Specifically, the whole objective loss function is composed of the conventional multi-class cross entropy loss L_ce for semantic segmentation and the mean square error loss L_mse for image super-resolution reconstruction:

L = L_ce + w·L_mse

L_mse = (1/N) Σ_{i=1}^{N} ||SISR(X_i) - Y_i||²

L_ce = -(1/N) Σ_{i=1}^{N} y_i log p_i

where SISR(X_i) and Y_i represent the super-resolution output and its corresponding ground truth, y_i and p_i are the class label and segmentation prediction probability of pixel i, N represents the number of pixels, and w is set to 0.1 so that the two loss terms have comparable ranges; the whole objective loss function is minimized end-to-end.
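Assuming the standard forms of the two losses (per-pixel multi-class cross entropy and pixel-wise mean squared error), the combined objective L = L_ce + w·L_mse can be sketched as:

```python
import numpy as np

def total_loss(probs, labels, sr, gt, w=0.1):
    """L = L_ce + w * L_mse: mean per-pixel cross entropy on the predicted
    class probabilities plus mean squared error between the super-resolved
    image sr and its ground truth gt, weighted by w = 0.1."""
    n = labels.size
    # cross entropy: -1/N * sum_i log p_i(y_i), picking each pixel's true class
    p_true = probs.reshape(n, -1)[np.arange(n), labels.ravel()]
    l_ce = -np.mean(np.log(p_true + 1e-12))
    # mean squared error for the super-resolution branch
    l_mse = np.mean((sr - gt) ** 2)
    return l_ce + w * l_mse

probs = np.full((4, 4, 3), 1 / 3)           # uniform 3-class prediction
labels = np.zeros((4, 4), dtype=int)
sr = np.zeros((8, 8)); gt = np.zeros((8, 8))
print(round(total_loss(probs, labels, sr, gt), 4))  # 1.0986 (= ln 3)
```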
In summary, in the embodiment of the present invention, a pyramid module based on dilated convolution is designed in the feature extraction module of the semantic segmentation network, so that spatial information of the image can be effectively utilized. The coarse segmentation map is optimized by the image super-resolution reconstruction method: the image super-resolution reconstruction network serves as a branch of the semantic segmentation network, and the reconstructed high-resolution feature map is fused with the image features extracted by the semantic segmentation network, improving the segmentation precision. A guided upsampling module is added to the upsampling stage of the network, and the semantic segmentation upsampling operation that restores the image is guided by the offset vectors of the reconstructed high-resolution feature map, further improving the segmentation precision.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.
Claims (8)
1. An image semantic segmentation method fusing image super-resolution reconstruction is characterized by comprising the following steps:
step 1, initializing parameters of a convolutional neural network based on a pre-trained ResNet-50 network model, and setting training parameters of the convolutional neural network;
step 2, preprocessing the data set, inputting the preprocessed data set into a down-sampling coding stage of the initialized network model for image feature extraction, wherein the down-sampling coding stage of the network model comprises a feature coder;
step 3, performing super-resolution reconstruction on the image by using the extracted image features to obtain a high-resolution feature map;
step 4, performing feature fusion on the image features extracted in the step 2 and the high-resolution feature map reconstructed in the step 3, inputting the feature fusion into a feature decoder of the network model, building a guided up-sampling module by using the high-resolution feature map reconstructed in the step 3, making an offset vector of each pixel point as an offset table, and performing up-sampling operation by using the offset table as a guide to obtain an image semantic segmentation result;
and 5, defining a loss function of the semantic segmentation network and a loss function of the image super-resolution reconstruction network, and optimizing the network model by using an Adam gradient descent method.
2. The image semantic segmentation method according to claim 1, wherein in the step 1, a transfer learning method is used, and pre-trained weights from the ImageNet data set are selected to initialize the network model.
3. The image semantic segmentation method according to claim 1, wherein in the step 2, the pictures in the data set are preprocessed, the resolution of the pictures is changed to 224x224, and the preprocessed pictures are input into the initialized network model.
4. The image semantic segmentation method according to claim 1, wherein in the step 2, the feature encoder is composed of an improved ResNet-50 network, called the core network; the core network comprises five modules, the first three modules are the same as in ResNet-50, the fourth module uses dilated (hole) convolution as the convolution kernel to construct a pyramid module based on dilated convolution that outputs the feature map, and the fifth module is the same as the fourth module.
5. The image semantic segmentation method according to claim 4, wherein the building of the pyramid module based on dilated convolution includes the following four steps:
firstly, using d 1×1×M convolution kernels to reduce the input feature map from dimension M to dimension d;
secondly, convolving the feature map output by the previous step in parallel with K convolution kernels of different dilation rates to obtain K feature maps of the same size;
thirdly, accumulating and concatenating the K same-size feature maps obtained in the previous step, step by step starting from the feature map output by the convolution kernel with the smallest dilation rate, to obtain an output feature map;
and fourthly, using a separate branch to globally pool the input into a 1×1×M feature map as a global feature, and adding it to the feature map obtained in the third step to obtain the final output feature map.
6. The image semantic segmentation method according to claim 1, wherein in the step 3, the image is super-resolution reconstructed using an improved RED method: the improved RED's feature extraction network is the same as the feature extraction network of the image semantic segmentation network and shares its weights, and the rest of the network is the same as the original RED method.
7. The image semantic segmentation method according to claim 1, wherein in the step 4, given an input feature map and the output feature map produced by a linear transformation, for the specific case of upsampling the sampling grid G_i is defined as:

G_i = (x_i, y_i) = (x_i^t / θ, y_i^t / θ)

where (x_i, y_i) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the up-sampling factor;

given the output feature map V_i and the input feature map U_nm, the guided upsampling module is defined as:

V_i = Σ_{n=1}^{H} Σ_{m=1}^{W} U_nm · max(0, 1 - |x_i + p_i - m|) · max(0, 1 - |y_i + q_i - n|)

where p_i and q_i represent the two offsets of the sampling coordinate of each grid element in the x and y directions respectively, given as the function φ_i, the output of the guidance module, defined as:

φ_i = (p_i, q_i);
the guided upsampling module includes two steps:
in the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid that guides the upsampling process; the prediction is realized by a function whose output is a tensor of dimensions H × W × C, where H and W denote the height and width of the high-resolution output semantic map, and C = 2 is the dimension containing the two offset coordinates;
in the second step, bilinear interpolation upsampling is performed using the offset table as guidance: each two-dimensional coordinate vector of the regular sampling grid is added to the corresponding two-dimensional vector in the guidance offset table;
the feature decoder gradually increases the size of the feature map through the guided upsampling module, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through category-color correspondence.
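The guided bilinear sampling of claim 7 can be sketched in numpy as follows. This is a minimal single-channel illustration under stated assumptions: the function name `guided_upsample`, the toy inputs, and the nested-loop form are all for demonstration (real implementations vectorize this and operate on multi-channel tensors).

```python
import numpy as np

def guided_upsample(U, theta, p, q):
    """Bilinear upsampling of feature map U by integer factor theta,
    where each output pixel's sampling position in the source map is
    shifted by the guidance offsets (p, q) before interpolation."""
    H, W = U.shape
    Ho, Wo = H * theta, W * theta
    V = np.zeros((Ho, Wo))
    for yt in range(Ho):
        for xt in range(Wo):
            # regular-grid source coordinate plus the guidance offset
            xs = xt / theta + p[yt, xt]
            ys = yt / theta + q[yt, xt]
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            # bilinear kernel: max(0, 1 - |distance|) over the 4 neighbours
            for n in (y0, y0 + 1):
                for m in (x0, x0 + 1):
                    if 0 <= n < H and 0 <= m < W:
                        w = max(0.0, 1 - abs(xs - m)) * max(0.0, 1 - abs(ys - n))
                        V[yt, xt] += w * U[n, m]
    return V

U = np.arange(16, dtype=float).reshape(4, 4)
zero = np.zeros((8, 8))
V = guided_upsample(U, 2, zero, zero)  # zero offsets reduce to plain bilinear upsampling
print(V.shape)  # (8, 8)
```

With all-zero offset tables the module degenerates to ordinary bilinear upsampling; a learned guidance module supplies non-zero (p, q) to bend the sampling grid toward object boundaries.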
8. The image semantic segmentation method according to claim 1, wherein in the step 5, the whole objective loss function is composed of the conventional multi-class cross-entropy loss L_ce for semantic segmentation and the mean-squared-error loss L_mse for image super-resolution reconstruction:
L = L_ce + w · L_mse
where L_mse = (1/N) Σ_i ‖SISR(X_i) − Y_i‖², with SISR(X_i) and Y_i denoting the super-resolution output and its corresponding high-resolution ground truth, and L_ce = −(1/N) Σ_i log p_i(y_i), where p_i and y_i are the segmentation prediction probability and the corresponding class of pixel i and N denotes the number of pixels; w is set to 0.1 so that the ranges of the two loss values are comparable, and the whole objective loss function is minimized end-to-end.
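A minimal numpy sketch of the joint objective in claim 8. The function `joint_loss` and the toy inputs are illustrative assumptions, not the patent's training code; in practice the two terms are computed over full images and class-probability maps.

```python
import numpy as np

def joint_loss(probs, labels, sr_out, sr_gt, w=0.1):
    """Joint objective L = L_ce + w * L_mse: multi-class cross-entropy
    over per-pixel class probabilities, plus mean-squared error between
    the super-resolved output and its ground truth, weighted by w."""
    N = labels.size
    # cross-entropy: negative mean log-probability of each pixel's true class
    ce = -np.mean(np.log(probs[np.arange(N), labels]))
    # mean-squared reconstruction error of the super-resolution branch
    mse = np.mean((sr_out - sr_gt) ** 2)
    return ce + w * mse

rng = np.random.default_rng(1)
probs = rng.dirichlet(np.ones(3), size=4)   # 4 pixels, 3 classes, rows sum to 1
labels = np.array([0, 2, 1, 0])             # ground-truth class per pixel
sr_out, sr_gt = rng.random(16), rng.random(16)
L = joint_loss(probs, labels, sr_out, sr_gt)
print(L)
```

Setting w = 0.1 (as in the claim) keeps the reconstruction term from dominating the cross-entropy term; when sr_out equals sr_gt the loss reduces to the cross-entropy alone.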
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110780769.XA CN113657388B (en) | 2021-07-09 | 2021-07-09 | Image semantic segmentation method for super-resolution reconstruction of fused image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657388A true CN113657388A (en) | 2021-11-16 |
CN113657388B CN113657388B (en) | 2023-10-31 |
Family
ID=78477218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110780769.XA Active CN113657388B (en) | 2021-07-09 | 2021-07-09 | Image semantic segmentation method for super-resolution reconstruction of fused image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657388B (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114426069A (en) * | 2021-12-14 | 2022-05-03 | 哈尔滨理工大学 | Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method |
CN114913072A (en) * | 2022-05-16 | 2022-08-16 | 中国第一汽车股份有限公司 | Image processing method and device, storage medium and processor |
CN115206331A (en) * | 2022-06-13 | 2022-10-18 | 华南理工大学 | Voice super-resolution method based on tapered residual dense network |
CN115239564A (en) * | 2022-08-18 | 2022-10-25 | 中国矿业大学 | Mine image super-resolution reconstruction method combining semantic information |
CN115810139A (en) * | 2022-12-16 | 2023-03-17 | 西北民族大学 | Target area identification method and system of SPECT image |
CN116309274A (en) * | 2022-12-12 | 2023-06-23 | 湖南红普创新科技发展有限公司 | Method and device for detecting small target in image, computer equipment and storage medium |
CN116416261A (en) * | 2023-06-09 | 2023-07-11 | 南京航空航天大学 | CT image super-resolution segmentation method assisted by super-resolution reconstruction |
CN116453104A (en) * | 2023-06-15 | 2023-07-18 | 安徽容知日新科技股份有限公司 | Liquid level identification method, liquid level identification device, electronic equipment and computer readable storage medium |
CN117745746A (en) * | 2024-02-19 | 2024-03-22 | 中国人民解放军总医院第四医学中心 | Image segmentation method and device based on deformable nnUNet |
Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study |
CN108876793A (en) * | 2018-04-13 | 2018-11-23 | 北京迈格威科技有限公司 | Semantic segmentation methods, devices and systems and storage medium |
CN109191392A (en) * | 2018-08-09 | 2019-01-11 | 复旦大学 | A kind of image super-resolution reconstructing method of semantic segmentation driving |
CN110136141A (en) * | 2019-04-24 | 2019-08-16 | 佛山科学技术学院 | A kind of image, semantic dividing method and device towards complex environment |
CN110136062A (en) * | 2019-05-10 | 2019-08-16 | 武汉大学 | A kind of super resolution ratio reconstruction method of combination semantic segmentation |
CN110210485A (en) * | 2019-05-13 | 2019-09-06 | 常熟理工学院 | The image, semantic dividing method of Fusion Features is instructed based on attention mechanism |
KR20190119261A (en) * | 2018-04-12 | 2019-10-22 | 가천대학교 산학협력단 | Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution |
CN110689061A (en) * | 2019-09-19 | 2020-01-14 | 深动科技(北京)有限公司 | Image processing method, device and system based on alignment feature pyramid network |
CN111259905A (en) * | 2020-01-17 | 2020-06-09 | 山西大学 | Feature fusion remote sensing image semantic segmentation method based on downsampling |
CN111401517A (en) * | 2020-02-21 | 2020-07-10 | 华为技术有限公司 | Method and device for searching perception network structure |
US20200273192A1 (en) * | 2019-02-26 | 2020-08-27 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
CN111709882A (en) * | 2020-08-06 | 2020-09-25 | 南京理工大学 | Super-resolution fusion calculation method based on sub-pixel convolution and feature segmentation |
CN111915627A (en) * | 2020-08-20 | 2020-11-10 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Semantic segmentation method, network, device and computer storage medium |
CN112132834A (en) * | 2020-09-18 | 2020-12-25 | 中山大学 | Ventricular image segmentation method, system, device and storage medium |
CN112396607A (en) * | 2020-11-18 | 2021-02-23 | 北京工商大学 | Streetscape image semantic segmentation method for deformable convolution fusion enhancement |
Non-Patent Citations (4)
Title |
---|
DAVIDE MAZZINI等: "Guided Upsampling Network for Real-Time Semantic Segmentation", ARXIV, pages 1 - 12 * |
ZHANPENG ZHANG等: "FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution", 2020 IEEE INTERNATIONAL CONFERENCE ON ROBOTICS AND AUTOMATION (ICRA), pages 8411 - 8417 * |
LI Shuai et al.: "Feature fusion semantic segmentation of remote sensing images based on downsampling", Journal of Test and Measurement Technology, vol. 34, no. 4, pages 331 - 337 *
WANG Ende et al.: "Semantic segmentation method for remote sensing images based on neural networks", Acta Optica Sinica, no. 12, pages 93 - 104 *
Also Published As
Publication number | Publication date |
---|---|
CN113657388B (en) | 2023-10-31 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN113657388B (en) | Image semantic segmentation method for super-resolution reconstruction of fused image | |
CN113469094B (en) | Surface coverage classification method based on multi-mode remote sensing data depth fusion | |
CN110335290B (en) | Twin candidate region generation network target tracking method based on attention mechanism | |
CN110111366B (en) | End-to-end optical flow estimation method based on multistage loss | |
CN108665496B (en) | End-to-end semantic instant positioning and mapping method based on deep learning | |
CN113033570B (en) | Image semantic segmentation method for improving void convolution and multilevel characteristic information fusion | |
CN115601549B (en) | River and lake remote sensing image segmentation method based on deformable convolution and self-attention model | |
CN114863573B (en) | Category-level 6D attitude estimation method based on monocular RGB-D image | |
CN112396607A (en) | Streetscape image semantic segmentation method for deformable convolution fusion enhancement | |
CN113888547A (en) | Non-supervision domain self-adaptive remote sensing road semantic segmentation method based on GAN network | |
CN113313732A (en) | Forward-looking scene depth estimation method based on self-supervision learning | |
CN114048822A (en) | Attention mechanism feature fusion segmentation method for image | |
CN110633706B (en) | Semantic segmentation method based on pyramid network | |
CN113554032A (en) | Remote sensing image segmentation method based on multi-path parallel network of high perception | |
CN114638842B (en) | Medical image segmentation method based on MLP | |
CN116563682A (en) | Attention scheme and strip convolution semantic line detection method based on depth Hough network | |
Huang et al. | Learning optical flow with R-CNN for visual odometry | |
CN117237623B (en) | Semantic segmentation method and system for remote sensing image of unmanned aerial vehicle | |
CN114663880A (en) | Three-dimensional target detection method based on multi-level cross-modal self-attention mechanism | |
CN117011515A (en) | Interactive image segmentation model based on attention mechanism and segmentation method thereof | |
CN116485867A (en) | Structured scene depth estimation method for automatic driving | |
CN116596966A (en) | Segmentation and tracking method based on attention and feature fusion | |
CN116721206A (en) | Real-time indoor scene vision synchronous positioning and mapping method | |
CN116485892A (en) | Six-degree-of-freedom pose estimation method for weak texture object | |
CN116228576A (en) | Image defogging method based on attention mechanism and feature enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB03 | Change of inventor or designer information | ||
| Inventor after: Xu Haitao, Chen Huilin, An Jianwei, Lin Fuhong, Zhou Xianwei |
| Inventor before: Xu Haitao, Chen Huilin, An Jianwei, Lin Fuhong, Zhou Xianwei |
|
GR01 | Patent grant | ||