CN113657388B - Image semantic segmentation method for super-resolution reconstruction of fused image - Google Patents
- Publication number
- CN113657388B CN113657388B CN202110780769.XA CN202110780769A CN113657388B CN 113657388 B CN113657388 B CN 113657388B CN 202110780769 A CN202110780769 A CN 202110780769A CN 113657388 B CN113657388 B CN 113657388B
- Authority
- CN
- China
- Prior art keywords
- image
- feature
- resolution
- semantic segmentation
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an image semantic segmentation method that fuses image super-resolution reconstruction, comprising the following steps: initializing the parameters of a convolutional neural network based on a pre-trained ResNet-50 network model; preprocessing a data set and inputting it into the downsampling encoding stage of the initialized network model for image feature extraction; performing super-resolution reconstruction using the extracted image features to obtain a high-resolution feature map; fusing the extracted image features with the reconstructed high-resolution feature map and inputting the result into the feature decoder of the network model, building a guided upsampling module from the reconstructed high-resolution feature map, generating an offset vector for each pixel point to form an offset table, and performing the upsampling operation with the offset table as guidance to obtain the image semantic segmentation result; and defining a loss function and optimizing the network model. The method can improve the accuracy of the semantic segmentation algorithm.
Description
Technical Field
The invention relates to the technical fields of image processing, computer vision, deep learning and image semantic segmentation, in particular to an image semantic segmentation method for fused image super-resolution reconstruction.
Background
In the field of computer vision, semantic segmentation is one of the most important tasks. It divides a picture into different semantic regions according to the object categories present in the image, each region corresponding to a specific classification category such as buildings, pedestrians or trees. Scene-based environmental understanding for robots is an important intersection of multiple research fields such as computer vision and artificial intelligence. Recognizing what types of objects exist in the working environment and where they are located is a fundamental capability a robot needs for scene understanding, and semantic segmentation technology enables the robot to obtain external scene information quickly and accurately.
A semantic segmentation algorithm aims to predict the category label of each pixel in an input image, achieving recognition and segmentation at the pixel level: the object category at each pixel position is classified and labeled, yielding a region-level segmentation of where objects of different categories are located in the image and providing a large amount of visual and inference information.
Image super-resolution reconstruction uses a low-resolution image to produce an estimate of the corresponding high-resolution image; it enlarges a small image while preventing degradation of its quality as much as possible. Enlarging an image while keeping it sharp is one of the intrinsic demands of image processing. However, for an existing image of fixed resolution, expanding from low to high resolution is often accompanied by blurring and noise, so image super-resolution reconstruction under deep learning architectures has been a research hot spot in recent years. Deep-learning-based approaches make it possible to effectively learn the mapping either from a given image's internal self-similarity or from a dataset of low-resolution images paired with their corresponding high-quality images.
Therefore, using deep-learning-based image super-resolution reconstruction as a branch of the main image semantic segmentation network to improve segmentation accuracy has important research significance.
Disclosure of Invention
Addressing the challenges and problems of image semantic segmentation algorithms, the invention introduces image super-resolution reconstruction technology and provides an image semantic segmentation method that fuses image super-resolution reconstruction, which can improve the segmentation accuracy of the semantic segmentation algorithm.
In order to solve the technical problems, the embodiment of the invention provides the following scheme:
an image semantic segmentation method for fused image super-resolution reconstruction comprises the following steps:
step 1, initializing parameters of a convolutional neural network based on a pre-trained ResNet-50 network model, and setting training parameters of the convolutional neural network;
step 2, preprocessing a data set, inputting the preprocessed data set into a downsampling encoding stage of an initialized network model for image feature extraction, wherein the downsampling encoding stage of the network model comprises a feature encoder;
step 3, performing super-resolution reconstruction on the image by using the extracted image features to obtain a high-resolution feature map;
step 4, fusing the image features extracted in step 2 with the high-resolution feature map reconstructed in step 3 and inputting them into the feature decoder of the network model, constructing a guided upsampling module from the high-resolution feature map reconstructed in step 3, generating an offset vector for each pixel point to form an offset table, and performing the upsampling operation with the offset table as guidance to obtain the image semantic segmentation result;
and step 5, defining the loss function of the semantic segmentation network and the loss function of the image super-resolution reconstruction network, and optimizing the network model using the Adam gradient descent method.
Preferably, in the step 1, a method of transfer learning is used, and pre-trained weights of the ImageNet dataset are selected for initializing the network model.
Preferably, in step 2, the pictures in the dataset are preprocessed, the resolution of the pictures is changed to 224x224, and the pictures are input into the initialized network model.
Preferably, in the step 2, the feature encoder consists of a modified ResNet-50 network, called the core network; the core network has five modules: the first three are identical to those of ResNet-50, the fourth uses dilated (hole) convolution kernels to build a dilated-convolution-based pyramid module and output its feature map, and the fifth is identical to the fourth.
Preferably, constructing the dilated-convolution-based pyramid module comprises the following four steps:
In the first step, d convolution kernels of size 1×1×M are used to reduce the M-channel input feature map to d channels;
In the second step, the feature map output by the previous step is convolved in parallel with K convolution kernels of different dilation rates, yielding K feature maps of the same size;
In the third step, starting from the feature map output by the convolution kernel with the smallest dilation rate, the K same-size feature maps obtained in the previous step are progressively summed and concatenated to obtain an output feature map;
In the fourth step, a separate branch applies global pooling to obtain a 1×1×M feature map as a global feature, which is added to the feature map obtained in the third step to give the final output feature map.
Preferably, in the step 3, super-resolution reconstruction is performed on the image with an improved RED method: the feature extraction network of the improved RED is the same as that of the image semantic segmentation network, with weights shared between the two networks, while the other parts are identical to the original RED method.
Preferably, in the step 4, the output feature map is generated from the input feature map by sampling on a grid G_i produced by a linear transformation; for the upsampling case, the grid point G_i = (x_i^s, y_i^s) is defined as:
x_i^s = x_i^t / θ, y_i^s = y_i^t / θ
where (x_i^s, y_i^s) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the upsampling factor;
given the output feature map V_i and the input feature map U_nm, the guided upsampling module defines the sampling as follows:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_nm · max(0, 1 − |x_i^s + p_i − m|) · max(0, 1 − |y_i^s + q_i − n|)
where p_i and q_i represent the two offsets that shift the sampled coordinates of each grid element in the x and y directions, respectively; they are the output of the guidance module, given by the function φ_i:
(p_i, q_i) = φ_i
the guided upsampling module comprises two steps:
In the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid that guides the upsampling process, the prediction is realized by a function, and its output is a tensor of dimension H×W×C, where H and W represent the height and width of the high-resolution output semantic map and C=2 holds the two offset coordinates;
a second step of performing bilinear interpolation up-sampling by using the offset table as a guide, each two-dimensional coordinate vector of the regular sampling grid being added to a corresponding two-dimensional vector in the guide offset table;
the feature decoder gradually increases the size of the feature map through the guided up-sampling module, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through category color correspondence.
Preferably, in said step 5, the overall objective loss function consists of the conventional multi-class cross-entropy loss L_ce for semantic segmentation and the mean-squared-error loss L_mse for image super-resolution reconstruction:
L = L_ce + w·L_mse
L_mse = (1/N) Σ_i ||SISR(X_i) − Y_i||², L_ce = −(1/N) Σ_i y_i log(p_i)
where SISR(X_i) and Y_i represent the super-resolution output and its corresponding high-resolution ground truth, y_i and p_i are the ground-truth class label and the segmentation prediction probability of pixel i, N represents the number of pixels, and w is set to 0.1 so that the two loss terms have comparable value ranges; the entire objective loss function is minimized end-to-end.
The technical scheme provided by the embodiment of the invention has the beneficial effects that at least:
according to the invention, a pyramid module based on the cavity convolution is designed in the feature extraction module of the semantic segmentation network by utilizing the cavity convolution, so that the spatial information of the image can be effectively utilized. Optimizing the rough segmentation map by using an image resolution reconstruction method, taking an image super-resolution reconstruction network as a branch of a semantic segmentation network, and carrying out feature fusion by using the reconstructed high-resolution feature map and image features extracted by the semantic segmentation network to improve segmentation accuracy. And adding a guide upsampling module into an upsampling module of the network, and executing upsampling operation to recover the image by using the offset vector of the reconstructed high-resolution feature map as a guide, so as to further improve the segmentation accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of an image semantic segmentation method for fused image super-resolution reconstruction provided by an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image semantic segmentation network model for fused image super-resolution reconstruction provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of an image semantic segmentation network feature extractor provided by an embodiment of the present invention;
fig. 4 is a schematic diagram of a pyramid module based on hole convolution according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
The embodiment of the invention provides an image semantic segmentation method fusing image super-resolution reconstruction; FIG. 1 is a flowchart of the image semantic segmentation method, and FIG. 2 is a diagram of the image semantic segmentation network model fusing image super-resolution reconstruction. With reference to the accompanying drawings, the method comprises the following steps:
Step 1, initializing parameters of the convolutional neural network based on a pre-trained ResNet-50 network model and setting the training parameters of the convolutional neural network.
Specifically, a method of transfer learning is used, and the weight of the network trained by using the image classification data set with rich label types is selected to initialize the network model. In this embodiment, the network model is initialized by selecting weights pre-trained by the ImageNet dataset.
Step 2, preprocessing the data set and inputting it into the downsampling encoding stage of the initialized network model for image feature extraction, wherein the downsampling encoding stage of the network model comprises a feature encoder.
Specifically, the pictures in the training data set are preprocessed, the resolution of the pictures is changed to 224x224, and the pictures are input into the initialized network model.
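As an illustrative sketch only (the patent does not specify the resizing algorithm; the nearest-neighbour sampling and the toy 2x2 image below are assumptions), the fixed-resolution preprocessing step can be pictured as follows:

```python
# Illustrative sketch of resizing an input picture to a fixed resolution
# before feeding it to the network. A tiny nearest-neighbour resize is shown
# on a small grid; in practice the target would be 224x224 and a library
# routine (e.g. bilinear resize) would normally be used.

def resize_nearest(img, out_h, out_w):
    """Resize a 2-D list-of-lists image with nearest-neighbour sampling."""
    in_h, in_w = len(img), len(img[0])
    out = []
    for y in range(out_h):
        src_y = min(in_h - 1, int(y * in_h / out_h))
        row = [img[src_y][min(in_w - 1, int(x * in_w / out_w))]
               for x in range(out_w)]
        out.append(row)
    return out

img = [[1, 2], [3, 4]]               # toy 2x2 "image"
resized = resize_nearest(img, 4, 4)  # stand-in for the 224x224 resize
```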
Fig. 3 is a schematic diagram of the feature encoder in the network model. The feature encoder consists of a modified ResNet-50 network, called the core network; the core network has five modules: the first three are identical to those of ResNet-50, the fourth uses dilated (hole) convolution kernels to build a dilated-convolution-based pyramid module and output its feature map, and the fifth is identical to the fourth.
FIG. 4 is a schematic diagram of the dilated-convolution-based pyramid module; its construction comprises the following four steps:
In the first step, d convolution kernels of size 1×1×M reduce the M-channel input feature map to d channels;
In the second step, the feature map output by the previous step is convolved in parallel with K convolution kernels of different dilation rates, yielding K feature maps of the same size;
In the third step, starting from the feature map output by the convolution kernel with the smallest dilation rate, these K same-size feature maps are progressively summed and concatenated to obtain an output feature map;
In the fourth step, a separate branch applies global pooling to obtain a 1×1×M feature map as a global feature, which is added to the feature map from the third step to give the final output feature map.
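A minimal 1-D sketch of dilated ("hole") convolution, assuming zero padding and toy values not taken from the patent, shows why the parallel pyramid branches see growing receptive fields while producing same-size outputs:

```python
# Hedged sketch: kernel taps are spaced `rate` samples apart, so the
# effective kernel size is k + (k - 1) * (rate - 1), while zero padding keeps
# every branch's output the same length as its input, as the pyramid module
# requires before the branches are summed and concatenated.

def dilated_conv1d(signal, kernel, rate):
    """'Same'-padded 1-D convolution with dilation `rate` (pure Python)."""
    k = len(kernel)
    half = (k - 1) * rate // 2          # padding so output size == input size
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for j, w in enumerate(kernel):
            idx = i + (j * rate) - half
            if 0 <= idx < len(signal):  # zero padding outside the signal
                acc += w * signal[idx]
        out.append(acc)
    return out

x = [0, 0, 0, 1, 0, 0, 0]              # unit impulse
k = [1, 1, 1]
# Parallel branches with growing dilation rates, all same output length:
branches = [dilated_conv1d(x, k, r) for r in (1, 2, 4)]
```

The impulse response of each branch spreads over a wider span as the dilation rate grows, which is the receptive-field effect the pyramid module exploits.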
Step 3, performing super-resolution reconstruction on the image using the extracted image features to obtain a high-resolution feature map.
Specifically, deep-learning-based super-resolution (SR) technology directly learns an end-to-end mapping function from a low-resolution image to a high-resolution image through a neural network. Current deep-learning-based SR approaches include SRCNN, DRCN, ESPCN, VESPCN, RED, DRRN and SRGAN. In this embodiment, super-resolution reconstruction is performed with an improved RED method: the feature extraction network of the improved RED is the same as that of the image semantic segmentation network, with weights shared between the two networks, while the other parts are identical to the original RED method.
The structure of the RED network is symmetric: each convolution layer has a corresponding deconvolution layer. The convolution layers capture the abstract content of the image, while the deconvolution layers enlarge the feature size and recover image details, so after the convolution layers reduce the size of the input, the deconvolution layers upsample it until the input and output sizes are the same. Each mirrored pair of convolution and deconvolution layers is linked by a skip connection: the same-size features (those entering the convolution layer and those output by the corresponding deconvolution layer) are added and then fed into the next deconvolution layer. This structure lets the back-propagated signal reach the bottom layers directly, alleviating the vanishing-gradient problem, and passes convolution-layer details on to the deconvolution layers so that cleaner pictures can be recovered. One skip path in the RED network connects the input image to the output of the final deconvolution layer, so what the intermediate convolution and deconvolution layers learn is the residual between the target image and the low-quality image.
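The mirrored skip connections described above can be sketched numerically; the small vectors below are illustrative stand-ins for same-size feature maps, not the patent's actual features:

```python
# Hedged sketch of RED's mirror skip connection: the feature entering a
# convolution layer is added element-wise to the output of the matching
# deconvolution layer before the next deconvolution, so the decoder only
# has to learn the residual.

def apply_skip(encoder_feature, decoder_feature):
    """Element-wise addition of same-sized feature vectors (the skip path)."""
    return [e + d for e, d in zip(encoder_feature, decoder_feature)]

enc = [0.5, 1.0, -0.2]        # toy feature entering a convolution layer
dec = [0.1, -0.3, 0.4]        # toy output of the mirrored deconvolution layer
fused = apply_skip(enc, dec)  # fed into the next deconvolution layer
```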
Step 4, fusing the image features extracted in step 2 with the high-resolution feature map reconstructed in step 3 and inputting them into the feature decoder of the network model, constructing a guided upsampling module from the high-resolution feature map reconstructed in step 3, generating an offset vector for each pixel point to form an offset table, and performing the upsampling operation with the offset table as guidance to obtain the image semantic segmentation result.
In particular, the idea behind the guided upsampling module is to steer the upsampling operator with a table of offset vectors that directs each sample toward the correct semantic class, enriching the upsampling operation with a learnable transformation of the semantic map. Typically a decoder generates the output segmentation map using a parameter-free operation such as bilinear or nearest-neighbor upsampling, which is performed by superimposing a regular grid on the input feature map.
In this step, the output feature map is generated from the input feature map by sampling on a grid G_i produced by a linear transformation; for the upsampling case, the grid point G_i = (x_i^s, y_i^s) is defined as:
x_i^s = x_i^t / θ, y_i^s = y_i^t / θ
where (x_i^s, y_i^s) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the upsampling factor;
given the output feature map V_i and the input feature map U_nm, the guided upsampling module defines the sampling as follows:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_nm · max(0, 1 − |x_i^s + p_i − m|) · max(0, 1 − |y_i^s + q_i − n|)
where p_i and q_i represent the two offsets that shift the sampled coordinates of each grid element in the x and y directions, respectively; they are the output of the guidance module, given by the function φ_i:
(p_i, q_i) = φ_i
the guided upsampling module comprises two steps:
In the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid that guides the upsampling process, the prediction is realized by a function, and its output is a tensor of dimension H×W×C, where H and W represent the height and width of the high-resolution output semantic map and C=2 holds the two offset coordinates;
in a second step, bilinear interpolation upsampling is performed by using the offset table as a guide, and each two-dimensional coordinate vector of the regular sampling grid is added to the corresponding two-dimensional vector in the guide offset table.
The feature decoder gradually increases the size of the feature map through the guided up-sampling module, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through category color correspondence.
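The two-step guided upsampling above can be sketched in pure Python (function and variable names such as `guided_upsample` are illustrative assumptions, not the patent's implementation): each target coordinate is mapped back by the upsampling factor, shifted by its offset-table entry, and read with bilinear interpolation; an all-zero offset table reduces to plain bilinear upsampling.

```python
# Hedged sketch of guided bilinear upsampling. A target pixel (xt, yt) maps
# to source coordinates (xt/theta, yt/theta), which are shifted by the
# per-pixel offsets (p, q) from the guidance offset table before the
# bilinear read.

def sample_bilinear(img, y, x):
    """Bilinear read from a 2-D list-of-lists at fractional (y, x)."""
    h, w = len(img), len(img[0])
    y0, x0 = int(y), int(x)
    y1, x1 = min(y0 + 1, h - 1), min(x0 + 1, w - 1)
    fy, fx = y - y0, x - x0
    top = img[y0][x0] * (1 - fx) + img[y0][x1] * fx
    bot = img[y1][x0] * (1 - fx) + img[y1][x1] * fx
    return top * (1 - fy) + bot * fy

def guided_upsample(img, theta, offsets):
    """Upsample by factor theta, shifting each sample by its (p, q) offset."""
    h, w = len(img), len(img[0])
    out = []
    for yt in range(h * theta):
        row = []
        for xt in range(w * theta):
            p, q = offsets[yt][xt]                # guidance-table offsets
            ys = min(max(yt / theta + q, 0), h - 1)
            xs = min(max(xt / theta + p, 0), w - 1)
            row.append(sample_bilinear(img, ys, xs))
        out.append(row)
    return out

img = [[0.0, 1.0], [2.0, 3.0]]
zero = [[(0.0, 0.0)] * 4 for _ in range(4)]       # all-zero offset table
up = guided_upsample(img, 2, zero)                # plain bilinear here
```

In the network, the offset table would come from the guidance module learned on the reconstructed high-resolution feature map rather than being all zeros.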
Step 5, defining the loss function of the semantic segmentation network and the loss function of the image super-resolution reconstruction network, and optimizing the network model using the Adam gradient descent method.
Specifically, the overall objective loss function consists of the conventional multi-class cross-entropy loss L_ce for semantic segmentation and the mean-squared-error loss L_mse for image super-resolution reconstruction:
L = L_ce + w·L_mse
L_mse = (1/N) Σ_i ||SISR(X_i) − Y_i||², L_ce = −(1/N) Σ_i y_i log(p_i)
where SISR(X_i) and Y_i represent the super-resolution output and its corresponding high-resolution ground truth, y_i and p_i are the ground-truth class label and the segmentation prediction probability of pixel i, N represents the number of pixels, and w is set to 0.1 so that the two loss terms have comparable value ranges; the entire objective loss function is minimized end-to-end.
In summary, in the embodiment of the invention, a pyramid module based on dilated convolution is designed in the feature extraction module of the semantic segmentation network, so that the spatial information of the image can be used effectively. The coarse segmentation map is refined with an image super-resolution reconstruction method: the image super-resolution reconstruction network serves as a branch of the semantic segmentation network, and the reconstructed high-resolution feature map is fused with the image features extracted by the semantic segmentation network to improve segmentation accuracy. A guided upsampling module is added to the upsampling stage of the network, and the semantic segmentation upsampling that recovers the image is guided by offset vectors from the reconstructed high-resolution feature map, further improving segmentation accuracy.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (4)
1. The image semantic segmentation method for super-resolution reconstruction of the fusion image is characterized by comprising the following steps of:
step 1, initializing parameters of a convolutional neural network based on a pre-trained ResNet-50 network model, and setting training parameters of the convolutional neural network;
step 2, preprocessing a data set, inputting the preprocessed data set into a downsampling encoding stage of an initialized network model for image feature extraction, wherein the downsampling encoding stage of the network model comprises a feature encoder;
in the step 2, the feature encoder consists of a modified ResNet-50 network, called the core network; the core network has five modules: the first three are identical to those of ResNet-50, the fourth uses dilated (hole) convolution kernels to build a dilated-convolution-based pyramid module and output its feature map, and the fifth is identical to the fourth;
the construction of the dilated-convolution-based pyramid module comprises the following four steps:
In the first step, d convolution kernels of size 1×1×M are used to reduce the M-channel input feature map to d channels;
In the second step, the feature map output by the previous step is convolved in parallel with K convolution kernels of different dilation rates, yielding K feature maps of the same size;
In the third step, starting from the feature map output by the convolution kernel with the smallest dilation rate, the K same-size feature maps obtained in the previous step are progressively summed and concatenated to obtain an output feature map;
In the fourth step, a separate branch applies global pooling to obtain a 1×1×M feature map as a global feature, which is added to the feature map obtained in the third step to give the final output feature map;
step 3, performing super-resolution reconstruction on the image by using the extracted image features to obtain a high-resolution feature map;
in the step 3, super-resolution reconstruction is performed on the image with an improved RED method: the feature extraction network of the improved RED is the same as that of the image semantic segmentation network, with weights shared between the two networks, while the other parts are identical to the original RED method;
step 4, fusing the image features extracted in step 2 with the high-resolution feature map reconstructed in step 3 and inputting them into the feature decoder of the network model, constructing a guided upsampling module from the high-resolution feature map reconstructed in step 3, generating an offset vector for each pixel point to form an offset table, and performing the upsampling operation with the offset table as guidance to obtain the image semantic segmentation result;
in the step 4, the output feature map is generated from the input feature map by sampling on a grid G_i produced by a linear transformation; for the upsampling case, the grid point G_i = (x_i^s, y_i^s) is defined as:
x_i^s = x_i^t / θ, y_i^s = y_i^t / θ
where (x_i^s, y_i^s) is the original coordinate point, (x_i^t, y_i^t) is the target coordinate point, and θ represents the upsampling factor;
given the output feature map V_i and the input feature map U_nm, the guided upsampling module defines the sampling as follows:
V_i = Σ_{n=1..H} Σ_{m=1..W} U_nm · max(0, 1 − |x_i^s + p_i − m|) · max(0, 1 − |y_i^s + q_i − n|)
where p_i and q_i represent the two offsets that shift the sampled coordinates of each grid element in the x and y directions, respectively; they are the output of the guidance module, given by the function φ_i:
(p_i, q_i) = φ_i
the guided upsampling module comprises two steps:
in the first step, a guidance offset table is predicted by the guidance module; the offset table is a two-dimensional grid that guides the up-sampling process; the prediction is implemented by a function whose output is a tensor of dimensions H×W×C, where H and W represent the height and width of the high-resolution output semantic map, and C=2 is the dimension containing the two offset coordinates;
in the second step, bilinear interpolation up-sampling is performed with the offset table as a guide: each two-dimensional coordinate vector of the regular sampling grid is added to the corresponding two-dimensional vector in the guidance offset table;
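The two steps above reduce to offset-shifted bilinear sampling; a minimal numpy sketch follows. The function name, single-channel layout, and zero contribution outside the border are illustrative assumptions, not part of the claims; a real implementation would run per channel on the GPU.

```python
import numpy as np

def guided_upsample(U, offsets, factor):
    """Bilinear up-sampling of a single-channel map U, guided by a
    per-pixel offset table (step two of the guided up-sampling module).

    U:       (H, W) input feature map
    offsets: (H*factor, W*factor, 2) guidance table of (p, q) shifts
    factor:  integer up-sampling factor (theta)
    """
    H_out, W_out = U.shape[0] * factor, U.shape[1] * factor
    V = np.zeros((H_out, W_out))
    for yt in range(H_out):
        for xt in range(W_out):
            # regular sampling grid: linear transform of the target coords,
            # shifted by the guidance offsets
            xs = xt / factor + offsets[yt, xt, 0]
            ys = yt / factor + offsets[yt, xt, 1]
            # bilinear kernel: weighted sum over the 4 nearest source pixels
            x0, y0 = int(np.floor(xs)), int(np.floor(ys))
            for n in (y0, y0 + 1):
                for m in (x0, x0 + 1):
                    if 0 <= n < U.shape[0] and 0 <= m < U.shape[1]:
                        w = max(0.0, 1 - abs(xs - m)) * max(0.0, 1 - abs(ys - n))
                        V[yt, xt] += w * U[n, m]
    return V
```

With an all-zero offset table this degenerates to ordinary bilinear up-sampling; non-zero offsets let the guidance module pull each output sample toward object boundaries.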
the feature decoder progressively increases the size of the feature map through the guided up-sampling modules, restores the feature map to twice the size of the original image, and obtains the image semantic segmentation result through the category-color correspondence;
step 5, defining the loss function of the semantic segmentation network and the loss function of the image super-resolution reconstruction network, and optimizing the network model using the Adam gradient descent method.
2. The image semantic segmentation method according to claim 1, wherein in the step 1, a transfer learning method is used: weights pre-trained on the ImageNet dataset are selected to initialize the network model.
3. The image semantic segmentation method according to claim 1, wherein in the step 2, the pictures in the dataset are preprocessed, the resolution of each picture is changed to 224×224, and the pictures are input into the initialized network model.
4. The image semantic segmentation method according to claim 1, wherein in the step 5, the overall objective loss function is composed of the conventional multi-class cross-entropy loss L_ce for semantic segmentation and the mean-square-error loss L_mse for image super-resolution reconstruction;
L = L_ce + w·L_mse
SISR(X_i) and Y_i represent the super-resolution output and its corresponding high-resolution ground truth, y_i and p'_i are the segmentation prediction probability and the corresponding class of pixel i, and N represents the number of pixels; w is set to 0.1 so that the two loss terms have comparable value ranges, and the entire objective loss function is minimized end-to-end.
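The joint objective above can be illustrated with a small numpy sketch; the function name, array shapes, and the probability/label encoding are assumptions made for this illustration only.

```python
import numpy as np

def total_loss(probs, labels, sr_out, sr_gt, w=0.1):
    """Joint objective L = L_ce + w * L_mse.

    probs:  (N, C) predicted class probabilities per pixel
    labels: (N,)   ground-truth class index per pixel
    sr_out, sr_gt: super-resolution output and its high-resolution target
    """
    n = probs.shape[0]
    # conventional multi-class cross entropy averaged over the N pixels
    l_ce = -np.mean(np.log(probs[np.arange(n), labels]))
    # mean square error of the super-resolution reconstruction
    l_mse = np.mean((sr_out - sr_gt) ** 2)
    return l_ce + w * l_mse
```

With w = 0.1 the reconstruction term is scaled into the same value range as the cross-entropy term, so neither branch dominates the Adam updates.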
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110780769.XA CN113657388B (en) | 2021-07-09 | 2021-07-09 | Image semantic segmentation method for super-resolution reconstruction of fused image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113657388A CN113657388A (en) | 2021-11-16 |
CN113657388B true CN113657388B (en) | 2023-10-31 |
Family
ID=78477218
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110780769.XA Active CN113657388B (en) | 2021-07-09 | 2021-07-09 | Image semantic segmentation method for super-resolution reconstruction of fused image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113657388B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114426069B (en) * | 2021-12-14 | 2023-08-25 | 哈尔滨理工大学 | Indoor rescue vehicle based on real-time semantic segmentation and image semantic segmentation method |
CN114913072A (en) * | 2022-05-16 | 2022-08-16 | 中国第一汽车股份有限公司 | Image processing method and device, storage medium and processor |
CN115206331B (en) * | 2022-06-13 | 2024-04-05 | 华南理工大学 | Voice super-resolution method based on conical residual dense network |
CN115239564B (en) * | 2022-08-18 | 2023-06-16 | 中国矿业大学 | Mine image super-resolution reconstruction method combining semantic information |
CN116309274B (en) * | 2022-12-12 | 2024-01-30 | 湖南红普创新科技发展有限公司 | Method and device for detecting small target in image, computer equipment and storage medium |
CN115810139B (en) * | 2022-12-16 | 2023-09-01 | 西北民族大学 | Target area identification method and system for SPECT image |
CN116416261B (en) * | 2023-06-09 | 2023-09-12 | 南京航空航天大学 | CT image super-resolution segmentation method assisted by super-resolution reconstruction |
CN116453104B (en) * | 2023-06-15 | 2023-09-08 | 安徽容知日新科技股份有限公司 | Liquid level identification method, liquid level identification device, electronic equipment and computer readable storage medium |
CN117745746B (en) * | 2024-02-19 | 2024-05-31 | 中国人民解放军总医院第四医学中心 | Image segmentation method and device based on deformable nnUNet |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108268870A (en) * | 2018-01-29 | 2018-07-10 | 重庆理工大学 | Multi-scale feature fusion ultrasonoscopy semantic segmentation method based on confrontation study |
CN108876793A (en) * | 2018-04-13 | 2018-11-23 | 北京迈格威科技有限公司 | Semantic segmentation methods, devices and systems and storage medium |
CN109191392A (en) * | 2018-08-09 | 2019-01-11 | 复旦大学 | A kind of image super-resolution reconstructing method of semantic segmentation driving |
CN110136062A (en) * | 2019-05-10 | 2019-08-16 | 武汉大学 | A kind of super resolution ratio reconstruction method of combination semantic segmentation |
CN110136141A (en) * | 2019-04-24 | 2019-08-16 | 佛山科学技术学院 | A kind of image, semantic dividing method and device towards complex environment |
CN110210485A (en) * | 2019-05-13 | 2019-09-06 | 常熟理工学院 | The image, semantic dividing method of Fusion Features is instructed based on attention mechanism |
KR20190119261A (en) * | 2018-04-12 | 2019-10-22 | 가천대학교 산학협력단 | Apparatus and method for segmenting of semantic image using fully convolutional neural network based on multi scale image and multi scale dilated convolution |
CN110689061A (en) * | 2019-09-19 | 2020-01-14 | 深动科技(北京)有限公司 | Image processing method, device and system based on alignment feature pyramid network |
CN111259905A (en) * | 2020-01-17 | 2020-06-09 | 山西大学 | Feature fusion remote sensing image semantic segmentation method based on downsampling |
CN111401517A (en) * | 2020-02-21 | 2020-07-10 | 华为技术有限公司 | Method and device for searching perception network structure |
CN111709882A (en) * | 2020-08-06 | 2020-09-25 | 南京理工大学 | Super-resolution fusion calculation method based on sub-pixel convolution and feature segmentation |
CN111915627A (en) * | 2020-08-20 | 2020-11-10 | 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) | Semantic segmentation method, network, device and computer storage medium |
CN112132834A (en) * | 2020-09-18 | 2020-12-25 | 中山大学 | Ventricular image segmentation method, system, device and storage medium |
CN112396607A (en) * | 2020-11-18 | 2021-02-23 | 北京工商大学 | Streetscape image semantic segmentation method for deformable convolution fusion enhancement |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10839543B2 (en) * | 2019-02-26 | 2020-11-17 | Baidu Usa Llc | Systems and methods for depth estimation using convolutional spatial propagation networks |
Non-Patent Citations (4)
Title |
---|
Davide Mazzini et al. Guided Upsampling Network for Real-Time Semantic Segmentation. arXiv, 2018, pp. 1-12. *
Zhanpeng Zhang et al. FarSee-Net: Real-Time Semantic Segmentation by Efficient Multi-scale Context Aggregation and Feature Space Super-resolution. 2020 IEEE International Conference on Robotics and Automation (ICRA), 2020, pp. 8411-8417. *
Li Shuai et al. Feature-fusion semantic segmentation of remote sensing images based on downsampling. Journal of Test and Measurement Technology, 2020, Vol. 34, No. 4, pp. 331-337. *
Wang Ende et al. Semantic segmentation method for remote sensing images based on neural networks. Acta Optica Sinica, 2019, No. 12, pp. 93-104. *
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| CB03 | Change of inventor or designer information | Inventors after: Xu Haitao; Chen Huilin; An Jianwei; Lin Fuhong; Zhou Xianwei. Inventors before: Xu Haitao; Chen Huilin; An Jianwei; Lin Fuhong; Zhou Xianwei |
| GR01 | Patent grant | ||