CN110136062B - Super-resolution reconstruction method combining semantic segmentation


Info

Publication number
CN110136062B
CN110136062B (application CN201910389111.9A)
Authority
CN
China
Prior art keywords
resolution
semantic
network
super
reconstruction
Prior art date
Legal status
Expired - Fee Related
Application number
CN201910389111.9A
Other languages
Chinese (zh)
Other versions
CN110136062A (en)
Inventor
向炟
陈军
杨玉红
Current Assignee
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date
Filing date
Publication date
Application filed by Wuhan University WHU filed Critical Wuhan University WHU
Priority to CN201910389111.9A priority Critical patent/CN110136062B/en
Publication of CN110136062A publication Critical patent/CN110136062A/en
Application granted granted Critical
Publication of CN110136062B publication Critical patent/CN110136062B/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00: Pattern recognition
    • G06F 18/20: Analysing
    • G06F 18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 3/00: Geometric image transformation in the plane of the image
    • G06T 3/40: Scaling the whole image or part thereof
    • G06T 3/4053: Super resolution, i.e. output image resolution higher than sensor resolution
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/26: Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V 10/267: Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Abstract

The invention provides an image super-resolution reconstruction method combined with semantic segmentation, which uses the intermediate and final results produced when a low-quality image is semantically segmented to guide super-resolution reconstruction, and yields realistic results even at large magnification factors. Because the high-level semantic information of an image is inherent to the image and contains a large number of class priors at the pixel level, it can serve as constraint information during super-resolution reconstruction to improve the quality of the result. The method combines image super-resolution reconstruction, a low-level computer vision problem, with image semantic segmentation, a high-level problem: the various kinds of information produced by semantically segmenting an image are used to constrain and enhance the reconstruction process. This addresses the lack of realism when reconstructing low-resolution images at large scaling factors and brings a marked improvement in subjective quality evaluation.

Description

Super-resolution reconstruction method combining semantic segmentation
Technical Field
The invention relates to the technical field of image processing, in particular to a method for reconstructing super-resolution of an image by utilizing semantic segmentation.
Background
Image super-resolution reconstruction refers to converting a low-resolution image into a high-resolution image by various technical means, recovering more high-frequency information so that the image has clearer texture and detail. Since it was first proposed, image super-resolution reconstruction has developed for half a century, and the many existing methods can be roughly classified into three categories according to their principles: interpolation-based methods, reconstruction-based methods, and learning-based methods.
Interpolation-based methods link the super-resolution reconstruction problem with the image interpolation problem and are the most direct approach to super-resolution reconstruction. Common interpolation methods include nearest-neighbor interpolation, bilinear interpolation, and bicubic interpolation. The core idea is that, for each point in the target image, the related points in the source image are found according to the scaling relationship, and the pixel value of the target point is then obtained by interpolating the pixel values of those related points. Interpolation-based methods are simple, intuitive, and fast, but their adaptability is relatively poor: prior information about the image is hard to incorporate, extra noise is easily introduced, and the reconstructed image tends to lack detail and to exhibit blurring, jagged edges, and similar artifacts.
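As a concrete illustration of this interpolation baseline, the short Python sketch below upscales an image four-fold with the three kernels just mentioned; the file names are placeholder assumptions.

```python
from PIL import Image

# Minimal sketch of interpolation-based upscaling (file names are placeholders).
lr = Image.open("low_res.png")
scale = 4
target_size = (lr.width * scale, lr.height * scale)

# Each call maps target pixels back into the source grid and interpolates
# neighboring source pixels, as described above.
nearest = lr.resize(target_size, Image.NEAREST)    # nearest-neighbor
bilinear = lr.resize(target_size, Image.BILINEAR)  # bilinear
bicubic = lr.resize(target_size, Image.BICUBIC)    # bicubic

bicubic.save("bicubic_x4.png")
```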
Reconstruction-based methods have received the most extensive attention and research. They assume that a low-resolution image is produced from a high-resolution image through some motion transformation, blurring, and noise, and they convert the super-resolution reconstruction problem into optimizing a cost function under constraints. The key idea is to start from the degradation model of the image, extract the key information in the low-resolution image using regularization and similar techniques, and constrain the generation of the super-resolution image with prior knowledge about the unknown high-resolution image. Only a few local prior assumptions are needed during reconstruction, which alleviates the blurring and jagged artifacts produced by interpolation to some extent; however, when the magnification factor is too large, the degradation model can no longer provide the prior knowledge required for reconstruction, and the result lacks high-frequency information.
Learning-based methods have been a hot research direction for super-resolution algorithms in recent years. The basic idea is to learn a joint system model from a training set containing pairs of high-resolution and low-resolution images, and then apply the learned model to similar low-resolution images to raise their resolution. Learning-based methods make full use of the prior knowledge of images, can recover more of the high-frequency information in a low-resolution image, and obtain better reconstruction results than the other two classes of methods. Among all learning-based methods, super-resolution reconstruction based on deep learning has achieved excellent performance in recent years.
Although current single-image super-resolution reconstruction technology has made breakthroughs in accuracy and speed by means of deep learning, its effectiveness drops when processing more complex low-resolution images. For example, when the low-resolution image contains many objects that largely overlap and occlude one another, existing methods cannot delineate the boundaries between the overlapped and occluded objects well, so the reconstruction lacks texture detail and may even merge several overlapping objects into one.
Disclosure of Invention
In order to solve these problems, the invention provides a brand-new super-resolution reconstruction method combining semantic segmentation. Semantic segmentation is one of the basic tasks in computer vision; its purpose is to classify visual input into different semantically interpretable categories, that is, for an image, to assign each pixel to one of several classes. Because semantic segmentation classifies pixels, a super-resolution reconstruction method combined with it can better handle a low-resolution image containing many overlapping and occluded objects.
Aiming at the defects of the prior art, the invention provides a method for performing super-resolution reconstruction on a low-resolution image, which comprises the following steps:
step 1, constructing a low-resolution semantic segmentation data set, wherein the low-resolution semantic segmentation data set comprises a low-resolution image and a corresponding semantic layout;
step 2, training a semantic segmentation network by using a low-resolution semantic segmentation data set;
step 3, constructing a data set for training the super-resolution reconstruction network, wherein the data set for training the super-resolution reconstruction network comprises a semantic layout chart and a semantic feature chart of a low-resolution image and a corresponding high-resolution image, and the semantic layout chart and the semantic feature chart of the low-resolution image are obtained by inputting the low-resolution image into the semantic segmentation network trained in the step 2;
step 4, taking the semantic layout map and the semantic feature map as input, taking a high-resolution image corresponding to the semantic layout map as a true value, and training a super-resolution reconstruction network to output a corresponding high-resolution reconstruction result according to the input semantic layout map;
and step 5, inputting a low-resolution picture to be reconstructed into the semantic segmentation network trained in step 2 to obtain its semantic layout map and semantic feature map, inputting the semantic layout map and semantic feature map into the super-resolution reconstruction network trained in step 4, and finally obtaining the reconstructed high-resolution image.
Further, the low-resolution semantic segmentation data set in step 1 is obtained by down-sampling the high-resolution image and the semantic layout map in the normal semantic segmentation data set with the same scaling factor, and the obtained low-resolution image and the semantic layout map form the low-resolution semantic segmentation data set.
Further, the semantic segmentation network in step 2 is a fully convolutional network obtained by changing the fully connected layers in VGG16 into convolutional layers, and the specific network structure is: convolutional layer × 2 + pooling layer + convolutional layer × 3 + pooling layer + convolutional layer × 2 + deconvolution layer, wherein the convolution kernel size of the convolutional layers is 3 × 3 and the pooling layers use max pooling.
Further, the weights of the fully convolutional network are initialized to the weights of a pre-trained VGG16; the loss function optimized during training is the sum of the deviations of the pixel-wise predictions of the last network layer; the specific training parameters are: a batch size of 20, optimization with the Adam algorithm using a momentum of 0.9 and a decay rate of 10⁻⁴, and a network learning rate of 10⁻⁴.
Further, the super-resolution reconstruction network in step 4 is a cascaded reconstruction network composed of a series of cascaded reconstruction modules operating at increasing resolutions, wherein each reconstruction module consists of 3 network layers: the first layer is a feature fusion layer for fusing the input semantic layout map and semantic feature map with the output of the previous module; the latter two layers are convolutional layers with 3 × 3 kernels, layer normalization, and leaky rectified linear units (LReLU), which serve to reconstruct the fused features.
Further, the specific operation relationship among the reconstruction modules in the super-resolution reconstruction network is as follows,
The first reconstruction module takes the semantic layout map and the semantic feature map down-sampled to its resolution as input and outputs a result at that resolution, which can be regarded as a feature map obtained after fusion and convolution. Each later reconstruction module takes the result of the previous module together with the correspondingly down-sampled semantic layout map and semantic feature map as input and outputs a new result; after several such stages, the final result output by the last reconstruction module is the super-resolution reconstruction result. The mathematical description of this process is:

O_1 = F(L_1 ⊕ f_1)

O_i = F(O_{i-1} ⊕ L_i ⊕ f_i), i > 1

wherein O_i represents the output of the i-th reconstruction module, F represents the convolution and other operations inside the module, L_i and f_i represent the semantic layout map and the semantic feature map down-sampled to the resolution of the i-th module, and ⊕ represents feature fusion.
Furthermore, the loss function used when training the super-resolution reconstruction network in step 4 is

Loss(θ) = Σ_l λ_l ‖Φ_l(I) - Φ_l(f(L; θ))‖

wherein I is the high-resolution image representing the true value, f is the cascaded reconstruction network to be trained, θ is the set of parameters of f, L is the input semantic layout, Φ is a trained visual perception network (a VGG network), Φ_l denotes a convolutional layer of the visual perception network, and λ_l are hyper-parameters controlling the weights, whose values are adjusted as training proceeds.
Further, when training the super-resolution reconstruction network, the specific settings are: the total number of iterations is 200 epochs; the learning rate of the model is 10⁻⁴ and is halved every 100 epochs; and optimization uses the Adam algorithm with a momentum of 0.9 and a decay rate of 10⁻⁴.
Compared with the prior art, the invention has the following advantages and positive effects:
because the high-level semantic information of the image is used as the inherent information of the image and contains a large amount of class priors on the pixel level, the high-level semantic information can be used as constraint information in the super-resolution reconstruction process to improve the quality of the reconstruction result. The method combines the computer vision low-level problem of image super-resolution reconstruction and the image semantic segmentation as a high-level problem, utilizes various information generated after the image is subjected to semantic segmentation to constrain and enhance the super-resolution reconstruction process, solves the problem that the reconstruction of a low-resolution image lacks authenticity under the condition of a large zoom factor, and has higher improvement on subjective quality evaluation.
Drawings
Fig. 1 is a network structure diagram of a full convolutional network in an embodiment of the present invention.
Fig. 2 is a block diagram of a cascaded reconstruction network according to an embodiment of the present invention.
Fig. 3 is an overall flow chart of the present invention.
FIG. 4 is a comparison of the visual effects of the present invention and the comparison methods, wherein (a) is Bicubic, (b) is SRCNN, (c) is SRDenseNet, (d) is SRGAN, and (e) is the present invention.
Detailed Description
To help those of ordinary skill in the art understand and implement the present invention, the invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the embodiments described here are merely illustrative and explanatory of the invention and do not restrict it.
The invention combines the characteristics of two computer vision tasks, image semantic segmentation and image super-resolution reconstruction, taking the features generated when an image is semantically segmented as prior information for super-resolution reconstruction, and provides an image super-resolution reconstruction method combined with semantic segmentation. The overall flow of the method is shown in FIG. 3. The method can be implemented with computer software technology, and the embodiment illustrates the flow of the invention with the training of the networks as its main content, as follows:
Step 1, constructing a low-resolution semantic segmentation data set comprising low-resolution pictures and their corresponding semantic layouts. A general semantic segmentation data set comprises high-resolution pictures and the semantic layouts corresponding to them; uniformly down-sampling both yields a low-resolution semantic segmentation data set.
In a specific implementation, image processing software reads all the high-resolution pictures and their corresponding semantic layout maps and unifies the picture sizes, after which all the high-resolution pictures are down-sampled by bicubic interpolation with a scaling factor of 4. The corresponding semantic layouts are then down-sampled to the same resolution. In this way, a low-resolution semantic segmentation data set composed of low-resolution images and corresponding semantic layouts is obtained.
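A minimal Python sketch of this construction follows, assuming Cityscapes-style folders of pictures and label maps; the paths and the unified size are placeholder assumptions, and the label map is resized with nearest-neighbor (rather than bicubic) so that class indices remain valid.

```python
import glob
import os
from PIL import Image

# Minimal sketch of step 1, assuming Cityscapes-style folders of pictures and
# label maps; all paths and the unified size are placeholder assumptions.
SCALE = 4
UNIFIED_SIZE = (2048, 1024)  # (width, height) before down-sampling

def build_lr_dataset(img_dir, label_dir, out_img_dir, out_label_dir):
    os.makedirs(out_img_dir, exist_ok=True)
    os.makedirs(out_label_dir, exist_ok=True)
    lr_size = (UNIFIED_SIZE[0] // SCALE, UNIFIED_SIZE[1] // SCALE)
    for img_path in glob.glob(os.path.join(img_dir, "*.png")):
        name = os.path.basename(img_path)
        img = Image.open(img_path).resize(UNIFIED_SIZE, Image.BICUBIC)
        lbl = Image.open(os.path.join(label_dir, name)).resize(UNIFIED_SIZE, Image.NEAREST)
        # Bicubic down-sampling for the picture, as in the embodiment; the label
        # map uses nearest-neighbor so that class indices remain valid.
        img.resize(lr_size, Image.BICUBIC).save(os.path.join(out_img_dir, name))
        lbl.resize(lr_size, Image.NEAREST).save(os.path.join(out_label_dir, name))
```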
Step 2, training the semantic segmentation network with the low-resolution semantic segmentation data set. Ordinary semantic segmentation networks process high-resolution pictures; training the network with the low-resolution data set obtained in step 1 enables it to output an accurate corresponding semantic layout when a low-resolution image is input.
in the present embodiment, the semantic segmentation network is described by taking a Full Convolution Network (FCN) as an example. The full convolutional network is a convolutional neural network without a full connection layer, can predict and classify each pixel in a picture to obtain a semantic segmentation result, and has a network structure shown in fig. 1. In particular, the full convolutional network improvement in the present embodiment is obtained by changing the full link layer in the VGG16 to the convolutional layer from the VGG16 classification network. In a full convolutional network, note xijIs a data vector, y, of a certain layer (i, j) position of the networkijData vector, y, for the (i, j) position of the next network layerijFrom xijThis can be obtained by the following equation:
yij=fks({xsi+i,sj+j}0≤i,j≤k)
wherein k represents a convolution kernelS represents the step size of the convolution kernel or the down-sampling factor, si, sj represents the change of the position coordinate of the data vector of the original network layer (i, j) position after the convolution or pooling operation, which is related to s, and i, j represents the space displacement generated in the convolution or pooling process, usually caused by the zero padding operation. f. ofksThe type of network layer is determined, which may be a matrix multiplication for convolution or pooling, or a spatial maximization for maximal pooling, or an elemental nonlinear mapping of the activation function. For a full convolutional network, the functions implemented by each network layer can be summarized by the above formula.
The specific implementation of training the full convolutional network is as follows:
1. and constructing a network. In this embodiment, the full convolution network main body is composed of VGG16, and its network structure is: convolutional layer × 2+ pooling layer + convolutional layer × 3+ pooling layer + convolutional layer × 2+ deconvolution layer. The convolution kernel size of the convolution layer is 3 multiplied by 3, the pooling layer adopts maximum pooling, and the data size becomes smaller and the channels become more as the convolution layer goes deeper.
2. Initializing the weights of the network. Unlike the usual random initialization, the weights in this embodiment are initialized to those of a pre-trained VGG16.
3. Training the network. The loss function optimized during training is the sum of the deviations of the pixel-wise predictions of the last network layer. In this embodiment, the specific training parameters are: a batch size of 20, optimization with the Adam algorithm using a momentum of 0.9 and a decay rate of 10⁻⁴, and a network learning rate of 10⁻⁴.
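For illustration, here is the minimal PyTorch sketch promised above of this construction and training setup. The channel widths, the number of classes, and the deconvolution parameters are assumptions for illustration, the mapping of the stated momentum and decay rate onto Adam's hyper-parameters is likewise an assumption, and a standard per-pixel cross-entropy stands in for the "sum of deviations" loss.

```python
import torch
import torch.nn as nn
import torchvision

NUM_CLASSES = 19  # assumption (e.g. Cityscapes classes)

class FCNSeg(nn.Module):
    """Conv x2 + pool + conv x3 + pool + conv x2 + deconv, per the embodiment."""
    def __init__(self, num_classes=NUM_CLASSES):
        super().__init__()
        def convs(cin, cout, n):
            layers = []
            for i in range(n):
                layers += [nn.Conv2d(cin if i == 0 else cout, cout, 3, padding=1),
                           nn.ReLU(inplace=True)]
            return layers
        self.features = nn.Sequential(
            *convs(3, 64, 2), nn.MaxPool2d(2),    # conv x2 + pool
            *convs(64, 128, 3), nn.MaxPool2d(2),  # conv x3 + pool
            *convs(128, 256, 2),                  # conv x2
        )
        # Deconvolution layer restoring the input resolution (x4 after two pools).
        self.deconv = nn.ConvTranspose2d(256, num_classes, 8, stride=4, padding=2)

    def forward(self, x):
        return self.deconv(self.features(x))  # per-pixel class scores

net = FCNSeg()

# Initialize from pre-trained VGG16 wherever the shapes line up (schematic).
vgg = torchvision.models.vgg16(weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1)
vgg_convs = [m for m in vgg.features if isinstance(m, nn.Conv2d)]
for src, dst in zip(vgg_convs, [m for m in net.features if isinstance(m, nn.Conv2d)]):
    if src.weight.shape == dst.weight.shape:
        dst.load_state_dict(src.state_dict())

# Batch size 20, Adam with lr 1e-4; "momentum 0.9" is mapped onto Adam's beta1
# and "decay rate 1e-4" onto weight decay (both mappings are assumptions).
optimizer = torch.optim.Adam(net.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
criterion = nn.CrossEntropyLoss()  # per-pixel loss standing in for the
                                   # patent's "sum of deviations"
```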
Step 3, constructing a data set for training the super-resolution reconstruction network, comprising the semantic layouts and semantic feature maps of low-resolution pictures together with the corresponding high-resolution pictures. Each low-resolution picture is input into the semantic segmentation network obtained in step 2 to obtain its semantic segmentation result, the semantic layout. In addition, the intermediate results generated during segmentation, the semantic feature maps, can be collected. The semantic layouts, the corresponding feature maps, and the corresponding high-resolution pictures form a new data set for training the super-resolution reconstruction network. After an image is input into the semantic segmentation network, the final output of the network is the semantic layout, while the semantic feature maps must be extracted from different layers of the network. In this embodiment, the selected semantic feature maps are the features of the convolutional layers immediately before the pooling layers of the fully convolutional network.
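One way to collect those feature maps is with forward hooks on the layers just before each pooling layer, as in this sketch; it assumes the FCNSeg module sketched above.

```python
import torch
import torch.nn as nn

def extract_layout_and_features(net, lr_image):
    """Run the segmentation network once, returning the semantic layout and the
    feature maps of the layers that directly precede each pooling layer."""
    feats, hooks = [], []
    modules = list(net.features)
    for i, m in enumerate(modules):
        if isinstance(m, nn.MaxPool2d):
            # Hook the output of the layer right before this pooling layer.
            hooks.append(modules[i - 1].register_forward_hook(
                lambda mod, inp, out: feats.append(out.detach())))
    with torch.no_grad():
        logits = net(lr_image)
    for h in hooks:
        h.remove()
    layout = logits.argmax(dim=1, keepdim=True)  # semantic layout map
    return layout, feats
```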
Step 4, taking the semantic layout map and the semantic feature map as input, taking a high-resolution image corresponding to the semantic layout map as a true value, and training a super-resolution reconstruction network to output a corresponding high-resolution reconstruction result according to the input semantic layout map;
in this embodiment, a cascaded reconstruction network composed of a series of cascaded reconstruction modules is selected as the super-resolution reconstruction network, and the structure of the cascaded reconstruction network is shown in fig. 2. Each reconstruction module operates at a different resolution, the resolution of the first module is set to 8 x 16, the resolution of the following modules is doubled in turn, and after 5 reconstruction modules, the final output resolution is 256 x 512. The first reconstruction module takes the semantic layout and the feature map which are down-sampled to the current resolution as input, and outputs a result of the current resolution, wherein the result can be regarded as the feature map after combination and convolution. The later reconstruction module takes the result of the former module, the semantic layout map after down sampling and the feature map as input and outputs a new result. After a plurality of processes, the final result output by the reconstruction module is the super-resolution reconstruction result. The mathematical description of this process is as follows:
O_1 = F(L_1 ⊕ f_1)

O_i = F(O_{i-1} ⊕ L_i ⊕ f_i), i > 1

wherein O_i represents the output of the i-th reconstruction module, F represents the convolution and other operations inside the module, L_i and f_i represent the semantic layout map and the semantic feature map down-sampled to the resolution of the i-th module, and ⊕ represents feature fusion.
Each reconstruction module consists of 3 network layers: the first layer is a feature fusion layer for fusing the input semantic layout map and semantic feature map with the output of the previous module; the latter two layers are convolutional layers with 3 × 3 kernels, layer normalization, and leaky rectified linear units (LReLU), which serve to reconstruct the fused features. Except for the last one, every reconstruction module has the same structure, but each module emphasizes a different aspect of the reconstruction, because its input feature maps contain information at a different level.
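A minimal PyTorch sketch of one reconstruction module and the cascade described above follows. The channel widths, fusion by channel-wise concatenation, the bilinear/nearest resampling choices, and the final 1 × 1 output convolution are illustrative assumptions, not details fixed by the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ReconModule(nn.Module):
    """Feature fusion layer + two 3x3 conv layers with layer norm and LReLU."""
    def __init__(self, in_ch, out_ch, resolution):
        super().__init__()
        h, w = resolution
        self.conv1 = nn.Conv2d(in_ch, out_ch, 3, padding=1)
        self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1)
        self.norm1 = nn.LayerNorm([out_ch, h, w])
        self.norm2 = nn.LayerNorm([out_ch, h, w])

    def forward(self, layout, feat, prev=None):
        size = layout.shape[-2:]
        parts = [layout, feat]
        if prev is not None:  # fuse the previous module's (upsampled) output
            parts.append(F.interpolate(prev, size=size, mode='bilinear'))
        x = torch.cat(parts, dim=1)  # feature fusion by concatenation
        x = F.leaky_relu(self.norm1(self.conv1(x)), 0.2)
        return F.leaky_relu(self.norm2(self.conv2(x)), 0.2)

class CascadedReconNet(nn.Module):
    def __init__(self, layout_ch=1, feat_ch=64, width=64, base=(8, 16), n_modules=5):
        super().__init__()
        self.resolutions = [(base[0] * 2 ** i, base[1] * 2 ** i) for i in range(n_modules)]
        self.blocks = nn.ModuleList(
            ReconModule(layout_ch + feat_ch + (0 if i == 0 else width), width, res)
            for i, res in enumerate(self.resolutions))
        self.to_rgb = nn.Conv2d(width, 3, 1)  # assumed output head

    def forward(self, layout, feat):
        out = None
        for res, block in zip(self.resolutions, self.blocks):
            l = F.interpolate(layout.float(), size=res, mode='nearest')
            f = F.interpolate(feat, size=res, mode='bilinear')
            out = block(l, f, out)
        return self.to_rgb(out)
```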
The cascaded reconstruction network takes the semantic layout as a frame and reconstructs the details of the image using the various kinds of information contained in the feature maps, so the loss function used in training differs from that of general super-resolution reconstruction methods. Unlike a conventional mean-squared-error loss function, which compares the reconstruction result with the original high-definition image pixel by pixel, the cascaded reconstruction network uses a so-called perceptual loss, whose aim is to compare the feature differences between the reconstruction result and the true value inside a visual perception network. It is defined as:

Loss(θ) = Σ_l λ_l ‖Φ_l(I) - Φ_l(f(L; θ))‖

wherein I is the high-resolution image representing the true value, f is the cascaded reconstruction network to be trained, θ is the set of parameters of f, L is the input semantic layout, and λ_l are hyper-parameters controlling the weights, whose values can be adjusted as training proceeds; Φ is a trained visual perception network and Φ_l denotes one of its convolutional layers. The visual perception network is an image classification network trained on a large amount of data and able to classify the objects in an input image correctly; the publicly released VGG series networks, available from the official website, are commonly used. Trained with this perceptual loss function, the cascaded reconstruction network can produce more realistic reconstruction results.
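A sketch of such a perceptual loss built on torchvision's pre-trained VGG16 follows; the selected layers, the L1 feature distance, and the equal layer weights λ_l are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torchvision

class PerceptualLoss(nn.Module):
    """Compare reconstruction and ground truth in VGG feature space."""
    def __init__(self, layer_ids=(3, 8, 15, 22), weights=None):
        super().__init__()
        vgg = torchvision.models.vgg16(
            weights=torchvision.models.VGG16_Weights.IMAGENET1K_V1).features.eval()
        for p in vgg.parameters():
            p.requires_grad_(False)  # the perception network stays fixed
        self.vgg = vgg
        self.layer_ids = set(layer_ids)
        self.max_id = max(layer_ids)
        self.weights = weights or {i: 1.0 for i in layer_ids}  # the lambda_l

    def forward(self, reconstruction, target):
        loss, x, y = 0.0, reconstruction, target
        for i, layer in enumerate(self.vgg):
            x, y = layer(x), layer(y)
            if i in self.layer_ids:
                loss = loss + self.weights[i] * nn.functional.l1_loss(x, y)
            if i >= self.max_id:
                break
        return loss
```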
The specific implementation scheme for training the super-resolution reconstruction network is as follows:
1. and constructing a network. The super-resolution reconstruction network in this embodiment is formed by cascading a series of reconstruction modules, and the structure of each cascaded module is consistent. The built reconstruction module consists of three layers of networks, wherein the first layer of network fuses input features, the second layer of network is a convolution layer, the size of a convolution kernel of the convolution layer is 3 multiplied by 3, and the convolution layer is provided with layer regularization and LRELU activation functions.
2. Initializing the weights of the network: the weights are initialized randomly.
3. Training the network. The function optimized during training is the perceptual loss. The specific training settings are: the total number of iterations is 200 epochs; the learning rate of the model is 10⁻⁴ and is halved every 100 epochs; and optimization uses the Adam algorithm with a momentum of 0.9 and a decay rate of 10⁻⁴. (A sketch of this schedule follows.)
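These settings map naturally onto an Adam optimizer with a step scheduler, as in the brief sketch below; mapping "momentum 0.9" onto Adam's beta1 is an assumption, and CascadedReconNet is the hypothetical module sketched earlier.

```python
import torch

recon_net = CascadedReconNet(layout_ch=1, feat_ch=64)  # from the sketch above
optimizer = torch.optim.Adam(recon_net.parameters(), lr=1e-4,
                             betas=(0.9, 0.999), weight_decay=1e-4)
# Halve the learning rate every 100 epochs, for 200 epochs in total.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=100, gamma=0.5)

for epoch in range(200):
    # ... iterate over (layout, feature map, HR image) triples, minimizing
    #     the perceptual loss between recon_net(layout, feat) and the HR image ...
    scheduler.step()
```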
Step 5, performing super-resolution reconstruction with the trained networks. The specific implementation is as follows: the low-resolution picture to be reconstructed is input into the semantic segmentation network obtained in step 2 to obtain its semantic layout map and semantic feature map; these are then input into the super-resolution reconstruction network trained in step 4, finally yielding the reconstructed high-resolution image.
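Putting the pieces together, inference reduces to two forward passes, as sketched below with the hypothetical helpers defined earlier (extract_layout_and_features and the two network sketches).

```python
import torch

@torch.no_grad()
def super_resolve(seg_net, recon_net, lr_image):
    """lr_image: a (1, 3, H, W) tensor. Returns the reconstructed HR image."""
    seg_net.eval()
    recon_net.eval()
    # Step 2 network: semantic layout plus intermediate semantic feature maps.
    layout, feats = extract_layout_and_features(seg_net, lr_image)
    # Step 4 network: reconstruction guided by the layout and (here) the first feature map.
    return recon_net(layout, feats[0])
```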
To verify the technical effect of the invention, the Cityscapes urban scene data set is used for verification. The Cityscapes data set has 2975 high-resolution images with corresponding fine semantic maps. Of the 2975 images, 1000 are used to train the semantic segmentation network and the remaining 1975 to train the super-resolution reconstruction network. The methods used for comparison include bicubic interpolation (Bicubic); the super-resolution convolutional neural network SRCNN (C. Dong, C. C. Loy, K. He, and X. Tang. Image super-resolution using deep convolutional networks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 38(2):295-307, 2016); the dense super-resolution network SRDenseNet (T. Tong, G. Li, X. Liu, and Q. Gao. Image super-resolution using dense skip connections. In 2017 IEEE International Conference on Computer Vision (ICCV). IEEE, 2017, pp. 4809-4817); and the generative adversarial network SRGAN (C. Ledig, L. Theis, F. Huszár, J. Caballero, et al. Photo-realistic single image super-resolution using a generative adversarial network. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017).
Table 1 shows the objective and subjective evaluation indexes of each method at a scaling factor of 4, namely PSNR (peak signal-to-noise ratio), SSIM (structural similarity), and MOS (mean opinion score). As can be seen from Table 1, the method of the present invention achieves a stable improvement in the subjective quality of the restored images.
TABLE 1 Objective and subjective Scoring for each method
(Table 1 is reproduced as an image in the original publication; its PSNR, SSIM, and MOS values are not recoverable from the text.)
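For reference, the objective metrics in Table 1 can be computed with scikit-image as sketched below; the random arrays are placeholders for a ground-truth image and a reconstruction, and MOS, being a human rating, has no closed-form implementation.

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Placeholder arrays standing in for a ground-truth HR image and a reconstruction.
rng = np.random.default_rng(0)
hr = rng.integers(0, 256, (256, 512, 3), dtype=np.uint8)
sr = rng.integers(0, 256, (256, 512, 3), dtype=np.uint8)

psnr = peak_signal_noise_ratio(hr, sr, data_range=255)
ssim = structural_similarity(hr, sr, channel_axis=-1, data_range=255)
print(f"PSNR: {psnr:.2f} dB, SSIM: {ssim:.4f}")
```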
As shown in FIG. 4, the comparison shows that the details reconstructed by the method of the invention are more vivid and concrete than those of the other methods, that the results as a whole have a stronger sense of realism and visual persuasiveness, and that the subjective evaluation index improves considerably while the objective evaluation indexes remain essentially unchanged.
It should be understood that parts of the specification not set forth in detail are well within the prior art.
It should be understood that the above description of the embodiments is somewhat detailed and is not to be taken as limiting the scope of patent protection of the invention. Those of ordinary skill in the art may make substitutions and modifications under the teaching of the invention without departing from the scope of the appended claims, and such substitutions and modifications all fall within the protection scope of the invention.

Claims (8)

1. A super-resolution reconstruction method combining semantic segmentation is characterized by comprising the following steps:
step 1, constructing a low-resolution semantic segmentation data set, wherein the low-resolution semantic segmentation data set comprises a low-resolution image and a corresponding semantic layout;
step 2, training a semantic segmentation network by using a low-resolution semantic segmentation data set;
step 3, constructing a data set for training the super-resolution reconstruction network, wherein the data set for training the super-resolution reconstruction network comprises a semantic layout chart and a semantic feature chart of a low-resolution image and a corresponding high-resolution image, and the semantic layout chart and the semantic feature chart of the low-resolution image are obtained by inputting the low-resolution image into the semantic segmentation network trained in the step 2;
step 4, taking the semantic layout map and the semantic feature map as input, taking a high-resolution image corresponding to the semantic layout map as a true value, and training a super-resolution reconstruction network to output a corresponding high-resolution reconstruction result according to the input semantic layout map;
and step 5, inputting a low-resolution picture to be reconstructed into the semantic segmentation network trained in step 2 to obtain its semantic layout map and semantic feature map, inputting the semantic layout map and semantic feature map into the super-resolution reconstruction network trained in step 4, and finally obtaining the reconstructed high-resolution image.
2. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 1, wherein: the low-resolution semantic segmentation data set in the step 1 is obtained by down-sampling a high-resolution image and a semantic layout in a common semantic segmentation data set by the same scaling factor, and the obtained low-resolution image and the semantic layout form the low-resolution semantic segmentation data set.
3. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 1, wherein: the semantic segmentation network in step 2 is a fully convolutional network obtained by changing the fully connected layers in VGG16 into convolutional layers, and the specific network structure is: convolutional layer × 2 + pooling layer + convolutional layer × 3 + pooling layer + convolutional layer × 2 + deconvolution layer, wherein the convolution kernel size of the convolutional layers is 3 × 3 and the pooling layers use max pooling.
4. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 3, wherein: the weights of the fully convolutional network are initialized to the weights of a pre-trained VGG16; the loss function optimized during training is the sum of the deviations of the pixel-wise predictions of the last network layer; the specific training parameters are: a batch size of 20, optimization with the Adam algorithm using a momentum of 0.9 and a decay rate of 10⁻⁴, and a network learning rate of 10⁻⁴.
5. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 1, wherein: the super-resolution reconstruction network in step 4 is a cascaded reconstruction network composed of a series of cascaded reconstruction modules operating at increasing resolutions, wherein each reconstruction module consists of 3 network layers: the first layer is a feature fusion layer for fusing the input semantic layout map and semantic feature map with the output of the previous module; the latter two layers are convolutional layers with 3 × 3 kernels, layer normalization, and leaky rectified linear units (LReLU), which serve to reconstruct the fused features.
6. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 5, wherein: the specific operational relationship between reconstruction modules in a super-resolution reconstruction network is as follows,
the first reconstruction module takes the semantic layout map and the semantic feature map down-sampled to its resolution as input and outputs a result at that resolution, which can be regarded as a feature map obtained after fusion and convolution; each later reconstruction module takes the result of the previous module together with the correspondingly down-sampled semantic layout map and semantic feature map as input and outputs a new result; after several such stages, the final result output by the last reconstruction module is the super-resolution reconstruction result; the mathematical description of this process is:

O_1 = F(L_1 ⊕ f_1)

O_i = F(O_{i-1} ⊕ L_i ⊕ f_i), i > 1

wherein O_i represents the output of the i-th reconstruction module, F represents the convolution and other operations inside the module, L_i and f_i represent the semantic layout map and the semantic feature map down-sampled to the resolution of the i-th module, and ⊕ represents feature fusion.
7. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 5, wherein: the loss function used when training the super-resolution reconstruction network in step 4 is

Loss(θ) = Σ_l λ_l ‖Φ_l(I) - Φ_l(f(L; θ))‖

wherein I is the high-resolution image representing the true value, f is the cascaded reconstruction network to be trained, θ is the set of parameters of f, L is the input semantic layout, Φ is a trained visual perception network (a VGG network), Φ_l denotes a convolutional layer of the visual perception network, and λ_l are hyper-parameters controlling the weights, whose values are adjusted as training proceeds.
8. The super-resolution reconstruction method based on semantic segmentation as claimed in claim 5, wherein: when training the super-resolution reconstruction network, the specific settings are: the total number of iterations is 200 epochs; the learning rate of the model is 10⁻⁴ and is halved every 100 epochs; and optimization uses the Adam algorithm with a momentum of 0.9 and a decay rate of 10⁻⁴.
CN201910389111.9A 2019-05-10 2019-05-10 Super-resolution reconstruction method combining semantic segmentation Expired - Fee Related CN110136062B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910389111.9A CN110136062B (en) 2019-05-10 2019-05-10 Super-resolution reconstruction method combining semantic segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910389111.9A CN110136062B (en) 2019-05-10 2019-05-10 Super-resolution reconstruction method combining semantic segmentation

Publications (2)

Publication Number Publication Date
CN110136062A CN110136062A (en) 2019-08-16
CN110136062B true CN110136062B (en) 2020-11-03

Family

ID=67573253

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910389111.9A Expired - Fee Related CN110136062B (en) 2019-05-10 2019-05-10 Super-resolution reconstruction method combining semantic segmentation

Country Status (1)

Country Link
CN (1) CN110136062B (en)

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110570355B (en) * 2019-09-12 2020-09-01 杭州海睿博研科技有限公司 Multi-scale automatic focusing super-resolution processing system and method
CN110991485B (en) * 2019-11-07 2023-04-14 成都傅立叶电子科技有限公司 Performance evaluation method and system of target detection algorithm
CN111145202B (en) * 2019-12-31 2024-03-08 北京奇艺世纪科技有限公司 Model generation method, image processing method, device, equipment and storage medium
CN111461990B (en) * 2020-04-03 2022-03-18 华中科技大学 Method for realizing super-resolution imaging step by step based on deep learning
CN112288627B (en) * 2020-10-23 2022-07-05 武汉大学 Recognition-oriented low-resolution face image super-resolution method
CN112634160A (en) * 2020-12-25 2021-04-09 北京小米松果电子有限公司 Photographing method and device, terminal and storage medium
CN112546463B (en) * 2021-02-25 2021-06-01 四川大学 Radiotherapy dose automatic prediction method based on deep neural network
CN113160234B (en) * 2021-05-14 2021-12-14 太原理工大学 Unsupervised remote sensing image semantic segmentation method based on super-resolution and domain self-adaptation
CN113657388B (en) * 2021-07-09 2023-10-31 北京科技大学 Image semantic segmentation method for super-resolution reconstruction of fused image
CN113781488A (en) * 2021-08-02 2021-12-10 横琴鲸准智慧医疗科技有限公司 Tongue picture image segmentation method, apparatus and medium
CN114782255B (en) * 2022-06-16 2022-09-02 武汉大学 Semantic-based noctilucent remote sensing image high-resolution reconstruction method

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335306A (en) * 2018-02-28 2018-07-27 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108830855A (en) * 2018-04-02 2018-11-16 华南理工大学 A kind of full convolutional network semantic segmentation method based on the fusion of multiple dimensioned low-level feature
CN109064399A (en) * 2018-07-20 2018-12-21 广州视源电子科技股份有限公司 Image super-resolution rebuilding method and system, computer equipment and its storage medium
CN109191392A (en) * 2018-08-09 2019-01-11 复旦大学 A kind of image super-resolution reconstructing method of semantic segmentation driving

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10803378B2 (en) * 2017-03-15 2020-10-13 Samsung Electronics Co., Ltd System and method for designing efficient super resolution deep convolutional neural networks by cascade network training, cascade network trimming, and dilated convolutions
CN109087274B (en) * 2018-08-10 2020-11-06 哈尔滨工业大学 Electronic device defect detection method and device based on multi-dimensional fusion and semantic segmentation
CN109544555B (en) * 2018-11-26 2021-09-03 陕西师范大学 Tiny crack segmentation method based on generation type countermeasure network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335306A (en) * 2018-02-28 2018-07-27 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN108830855A (en) * 2018-04-02 2018-11-16 华南理工大学 A kind of full convolutional network semantic segmentation method based on the fusion of multiple dimensioned low-level feature
CN109064399A (en) * 2018-07-20 2018-12-21 广州视源电子科技股份有限公司 Image super-resolution rebuilding method and system, computer equipment and its storage medium
CN109191392A (en) * 2018-08-09 2019-01-11 复旦大学 A kind of image super-resolution reconstructing method of semantic segmentation driving

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fully Convolutional Networks for Semantic Segmentation; Jonathan Long et al.; CVPR 2015; 2015-12-31; pp. 3431-3440 *

Also Published As

Publication number Publication date
CN110136062A (en) 2019-08-16

Similar Documents

Publication Publication Date Title
CN110136062B (en) Super-resolution reconstruction method combining semantic segmentation
Wang et al. Deep video super-resolution using HR optical flow estimation
CN109741260B (en) Efficient super-resolution method based on depth back projection network
Wang et al. Multi-memory convolutional neural network for video super-resolution
CN112396607B (en) Deformable convolution fusion enhanced street view image semantic segmentation method
CN111709895A (en) Image blind deblurring method and system based on attention mechanism
CN111242846B (en) Fine-grained scale image super-resolution method based on non-local enhancement network
CN109035146B (en) Low-quality image super-resolution method based on deep learning
Chen et al. Cross parallax attention network for stereo image super-resolution
Chadha et al. iSeeBetter: Spatio-temporal video super-resolution using recurrent generative back-projection networks
CN113837946B (en) Lightweight image super-resolution reconstruction method based on progressive distillation network
CN111861886B (en) Image super-resolution reconstruction method based on multi-scale feedback network
CN114898284B (en) Crowd counting method based on feature pyramid local difference attention mechanism
Singla et al. A review on Single Image Super Resolution techniques using generative adversarial network
CN112837224A (en) Super-resolution image reconstruction method based on convolutional neural network
US20230153946A1 (en) System and Method for Image Super-Resolution
Guan et al. Srdgan: learning the noise prior for super resolution with dual generative adversarial networks
Li et al. DLGSANet: lightweight dynamic local and global self-attention networks for image super-resolution
CN116563100A (en) Blind super-resolution reconstruction method based on kernel guided network
Esmaeilzehi et al. UPDResNN: A deep light-weight image upsampling and deblurring residual neural network
CN113962905A (en) Single image rain removing method based on multi-stage feature complementary network
CN113592715A (en) Super-resolution image reconstruction method for small sample image set
CN112598604A (en) Blind face restoration method and system
Yu et al. MagConv: Mask-guided convolution for image inpainting
Albluwi et al. Super-resolution on degraded low-resolution images using convolutional neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20201103

Termination date: 20210510