US20170046816A1 - Super resolution image enhancement technique - Google Patents
- Publication number
- US20170046816A1 (application US14/827,030)
- Authority
- US
- United States
- Prior art keywords
- patches
- regression coefficients
- image
- cluster
- clusters
- Prior art date
- Legal status
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4053—Scaling of whole images or parts thereof, e.g. expanding or contracting based on super-resolution, i.e. the output image resolution being higher than the sensor resolution
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G06K9/342—
-
- G06K9/46—
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/30—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/762—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using clustering, e.g. of similar faces in social networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/7715—Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/10—Image acquisition
- G06V10/16—Image acquisition using multiple overlapping images; Image stitching
Definitions
- Super-resolution techniques generate high-resolution (HR) images from one or more low-resolution (LR) images.
- Although there are capture and display devices that can produce high-resolution images and videos, many existing low-resolution images and videos can be found in surveillance video, mobile devices, and broadcast content. To improve the user experience while watching such content on a higher resolution display device, such as a high definition, 4K, or 8K display device, the input video should be increased in resolution to match that of the display. Accordingly, the super-resolution technique often predicts thousands of unknown pixel values from a small fraction of input pixels. This is inherently an ill-posed problem, and the ambiguity increases as the scaling factor increases.
- the amount of information that is “missing” in a low resolution image relative to the target high resolution image is very large, in terms of fine detail and high frequency information that contributes to the perception of a high quality rendered image on a high resolution display.
- Existing techniques have a limited ability to restore and enhance fine image detail. It is desirable to reconstruct and enhance image detail with a high quality result even at increased upscaling factors.
- Existing techniques for super-resolution image and video upscaling often require very high computational cost. Some techniques combine images of a scene or multiple frames of a video to enhance resolution, which may incur high memory or data transfer costs. Some techniques utilize iterative optimization approaches to enhance resolution, which may incur high computational cost. It is also desirable to perform super-resolution image generation at a limited computational cost. It is desirable for a super resolution image enhancement system that uses a single low resolution input image to generate a high resolution output image.
- FIG. 1 illustrates an exemplary training technique.
- FIG. 2 illustrates a graph of visualization of energy versus dimensionality.
- FIG. 3 illustrates an exemplary high resolution image generation technique.
- FIG. 4 illustrates another exemplary high resolution image generation technique.
- FIG. 5 illustrates a post super-resolution processing technique.
- FIG. 6 illustrates a de-ringing processing technique.
- FIG. 7 illustrates a jaggyness reduction technique.
- a super-resolution technique includes a training phase 100 to create a model that is used for a subsequent resolution enhancement technique.
- the system uses a database of high resolution (HR) training images 110 .
- the HR training images 110 are representative of natural images with fine detail, such as scenery and/or items and/or people, rather than synthetic computer generated graphics.
- For each HR training image 110 I_h, the system may obtain a corresponding low resolution (LR) image 112 I_l. For example, the corresponding LR image 112 may be computed as I_l = (I_h * G)↓, where * denotes convolution, G is a Gaussian kernel, and ↓ is a down-sampling operator. Other filter kernels and other degradation operations may be used as appropriate.
- A database of LR images 112 corresponding to the HR images 110 may be used. Using a suitable technique, the system may use any set of HR images and determine a corresponding LR image for each.
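The degradation model above can be sketched in a few lines of Python; the Gaussian width and scale factor below are illustrative assumptions, not values specified by the patent:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def make_lr(hr, scale=2, sigma=1.0):
    """Simulate I_l = (I_h * G) downsampled: Gaussian blur, then decimate."""
    blurred = gaussian_filter(hr.astype(np.float64), sigma=sigma)
    return blurred[::scale, ::scale]

hr = np.random.default_rng(0).random((14, 14))
lr = make_lr(hr, scale=2)   # 14x14 HR block -> 7x7 LR block
```

As the text notes, any other blur kernel or degradation operation may stand in for G and the down-sampling operator.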
- the LR images 112 may be processed to perform feature extraction and dimensionality reduction 114 based upon patches of the LR images 112 .
- Using raw pixel values during subsequent clustering provides limited generalization properties.
- Rather than raw pixel values, it is preferable to use a feature, such as a gradient feature.
- first and second order gradients may be used to characterize features of the low resolution patches of the low resolution images 112 .
- Four 1-D filters may be used to extract the first and second order derivatives or gradients in the horizontal and vertical directions as follows:
- f_1 = [-1, 0, 1], f_2 = f_1^T
- f_3 = [1, -2, 1], f_4 = f_3^T
- the system processes image data on a patch-by-patch basis, where a patch includes a small block of image pixels.
- a patch may correspond to a 7 ⁇ 7 block of pixels in the LR image.
- each LR image patch may include 45 pixels in a small neighborhood or image area. The computation of the gradients in the manner above increases the dimensionality of the LR patch from 45 to 180, thus increasing the computational complexity of the system.
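As a rough sketch of the gradient feature extraction, the four 1-D filters can be applied along the rows and columns of each patch and the responses concatenated, so the dimensionality grows by a factor of four. The 7×7 patch here follows the example above and yields 49 → 196 values; the text's 45 → 180 example assumes a 45-sample neighborhood:

```python
import numpy as np
from scipy.ndimage import convolve1d

# First- and second-order derivative taps; f2 = f1^T and f4 = f3^T are
# the same taps applied along the vertical axis.
f1 = np.array([-1.0, 0.0, 1.0])
f3 = np.array([1.0, -2.0, 1.0])

def gradient_features(patch):
    """Concatenate 1st/2nd-order gradients in both directions (dim -> 4x)."""
    responses = [
        convolve1d(patch, f1, axis=1),  # f1: horizontal 1st order
        convolve1d(patch, f1, axis=0),  # f2 = f1^T: vertical 1st order
        convolve1d(patch, f3, axis=1),  # f3: horizontal 2nd order
        convolve1d(patch, f3, axis=0),  # f4 = f3^T: vertical 2nd order
    ]
    return np.concatenate([r.ravel() for r in responses])

patch = np.random.default_rng(1).random((7, 7))
feat = gradient_features(patch)   # 49 pixels -> 196 feature values
```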
- the system may apply a principal component analysis dimensionality reduction.
- the principal component analysis projects the features to a lower-dimensional space.
- The principal component analysis, either linear or non-linear, may be used to reduce the dimensionality from 180 dimensions to 36 dimensions, thus reducing the dimensionality of the features by 80%.
- Referring to FIG. 2, a graph of the visualization of energy versus dimensionality may be observed.
- the principal component analysis results in information primarily along horizontal, vertical, and diagonal edges together with a representation of texture. Further, the use of the principal component analysis results in a reduction in jaggy artifacts during reconstruction.
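A minimal NumPy sketch of the 180 → 36 principal component projection via the singular value decomposition; the feature counts are arbitrary stand-ins for the training data:

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project feature vectors onto the top principal components."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of Vt are the principal directions, ordered by variance explained.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    basis = Vt[:n_components]
    return Xc @ basis.T, basis, mean

features = np.random.default_rng(2).standard_normal((1000, 180))
reduced, basis, mean = pca_reduce(features, 36)   # 180 -> 36 dimensions
```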
- Any technique may be used to characterize features of the images, such as on a patch basis, and any technique may be used for dimensionality reduction, if desired.
- a suitable technique that extracts compact features directly from the image in a single step may be used instead of the 2-step feature extraction and dimensionality reduction process.
- suitable normalization techniques may be applied to the features, such as thresholding, clipping and normalizing by their vector norm.
- a first step towards determining optimized patch feature clusters may include K-means clustering 116 .
- K-means clustering is a well-known vector quantization technique that performs cluster analysis by partitioning N observations into K clusters, where each observation belongs to the cluster with the nearest mean (cluster center), which serves as the prototype of the cluster. This may be performed, for example, using iterative refinement similar to the expectation-maximization technique for mixtures of Gaussian distributions.
- Each of the cluster centers may be considered to be representative of the feature space of the natural image patches.
- the system may collect a fixed number of exemplar training patches, which reduces the computational complexity of the system. Other clustering techniques may likewise be used, if desired.
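The clustering step 116 can be sketched as a plain Lloyd's-algorithm k-means; the cluster count and iteration budget below are illustrative, not values from the patent:

```python
import numpy as np

def kmeans(X, k, iters=20, seed=0):
    """Alternate nearest-center assignment and center (mean) updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(iters):
        # Assign each feature vector to its nearest cluster center.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute each center as the mean of its members.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return centers, labels

X = np.random.default_rng(3).standard_normal((500, 36))  # reduced features
centers, labels = kmeans(X, k=16)
```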
- The collection of a fixed number of exemplar training patches for each cluster is used to train a mapping function. It has been observed that some clusters in the feature space have very few corresponding exemplar training patches. Although a very limited number of exemplar training patches for some clusters may be useful in efficiently determining the feature, using such a limited set of training patches results in poor subsequent reconstruction of the high resolution image, as well as undesirable artifacts in the reconstruction. Accordingly, for cluster centers having fewer corresponding low resolution patches than a threshold, it is preferable to include additional low resolution training patches.
- The additional training patches for such a cluster may be drawn from its M nearest neighbor clusters, which may be determined using a distance metric.
- Selecting and grouping the additional training patches from the nearest neighboring clusters increases the probability that those training patches are close in appearance to one another. Also, for clusters that are close to one another and do not have sufficient exemplar training patches, it increases the robustness of the subsequent regression coefficients, described later, since the same samples can be shared among neighboring clusters.
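A hedged sketch of padding under-populated clusters from their M nearest neighbor clusters; the member threshold and M below are assumptions, not values given in the text:

```python
import numpy as np

def augment_sparse_clusters(centers, labels, min_count=50, m_neighbors=3):
    """Clusters with fewer than min_count members borrow the members of
    their M nearest clusters (center-to-center Euclidean distance)."""
    k = len(centers)
    members = [np.flatnonzero(labels == j) for j in range(k)]
    # Pairwise distances between cluster centers.
    dists = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    groups = []
    for j in range(k):
        idx = members[j]
        if len(idx) < min_count:
            nearest = np.argsort(dists[j])[1:m_neighbors + 1]  # skip self
            idx = np.concatenate([idx] + [members[n] for n in nearest])
        groups.append(idx)
    return groups

rng = np.random.default_rng(4)
centers = rng.standard_normal((8, 36))
labels = rng.integers(0, 8, size=200)
groups = augment_sparse_clusters(centers, labels, min_count=40)
```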
- the exemplar training patches may be used to train the mapping function based on the K-means clustering 116 .
- the system may apply multiple different rounds of K-means clustering 116 A- 116 C.
- the different rounds of K-means clustering may be initialized with different randomized seeds so that different clustering outcomes are obtained.
- the different rounds of K-means clustering may be based upon different clustering techniques for the data.
- One of the different K-means clustering 116 A- 116 C may be selected as the best clustering result 118 , as described below.
- the system may use “ground truth” HR information 120 to validate the clustering process based on a reconstruction error to select the best K-means clustering result 118 .
- This reconstruction error may be a residual sum of squared errors (RSS) aggregated over all training patches.
- the residual sum of squared errors is evaluated between the ground truth HR image data and predicted high resolution image data that is generated by applying regression coefficients, where the regression coefficients are determined as described below.
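The selection of the best clustering round by reconstruction error can be sketched as follows; the prediction arrays stand in for HR patches reconstructed with each round's regression coefficients:

```python
import numpy as np

def rss(hr_true, hr_pred):
    """Residual sum of squared errors aggregated over all training patches."""
    return float(np.sum((hr_true - hr_pred) ** 2))

def select_best_round(hr_true, predictions_per_round):
    """Pick the clustering round whose predictions minimize the RSS."""
    errors = [rss(hr_true, p) for p in predictions_per_round]
    return int(np.argmin(errors)), errors

rng = np.random.default_rng(5)
hr_true = rng.standard_normal((100, 49))           # ground-truth HR patches
rounds = [hr_true + s * rng.standard_normal((100, 49)) for s in (0.5, 0.1, 0.9)]
best, errors = select_best_round(hr_true, rounds)  # round 1 has least noise
```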
- Each of the cluster centers 122 is representative of a clustering of a set of LR patches from the low resolution training images 112 .
- The cluster centers 122 may be associated with a database of the LR patches corresponding to each of the cluster centers 122. It is noted that in some cases, one patch may correspond to multiple different cluster centers.
- the cluster centers 122 may be provided to the high resolution image generation process 300 . In particular, the cluster centers 122 may be used to characterize a low resolution input patch of an input image provided during the high resolution image generation process.
- While the cluster centers 122 may be used to characterize a low resolution input patch of the input images of the high resolution image generation phase, a function must also be provided to that phase that characterizes the corresponding unknown high resolution patch for the resolution upsampling.
- A set of exemplar patches is identified 150 based upon the cluster centers 122. This may be provided by way of a known relationship between the cluster centers 122 and the corresponding low resolution input patches. In this manner, the cluster centers identify the groups of patches 150 of the low resolution images corresponding to each of the cluster centers.
- The exemplar patches 150 of the low resolution images are provided, together with the corresponding patches of the high resolution images 110, to a regression coefficients calculation process 152.
- a set of regression coefficients may be determined 152 to characterize a corresponding high resolution patch based upon a low resolution patch. Other techniques may be used to determine a high resolution patch based upon a low resolution patch.
- the output of the regression coefficients calculation process 152 may be a set of regression coefficients 310 for each corresponding cluster center 122 .
- the system may learn a mapping function based upon a least squares approximation.
- The regression coefficients of the mapping function may be determined by linear least-squares minimization as follows:
- C_i = argmin_C || W_i − C [X_i ; 1^T] ||^2
- C i are the regression coefficients for each cluster i
- W i are the samples of the group of HR patches associated with cluster i collected in a matrix
- X i are the samples of the LR patches associated with cluster i collected in a matrix
- “1” is a vector with the same number of elements as the number of training patches in X i filled entirely with ones.
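A NumPy sketch of the least-squares fit, with the row of ones appended to X_i to carry the bias term; the synthetic data below simply verifies that an exact linear mapping is recovered, and the patch dimensions are illustrative:

```python
import numpy as np

def fit_regression(X_lr, W_hr):
    """Solve C = argmin || W - C [X ; 1^T] ||^2 for one cluster.
    X_lr: (d_lr, n) LR patch samples; W_hr: (d_hr, n) HR patch samples."""
    n = X_lr.shape[1]
    X_aug = np.vstack([X_lr, np.ones((1, n))])   # append the ones row
    # lstsq solves A x = b; transpose so that patches are rows.
    C_T, *_ = np.linalg.lstsq(X_aug.T, W_hr.T, rcond=None)
    return C_T.T                                  # shape (d_hr, d_lr + 1)

def apply_regression(C, x_lr):
    """Predict the HR samples for a single LR patch vector."""
    return C @ np.concatenate([x_lr, [1.0]])

rng = np.random.default_rng(6)
X = rng.standard_normal((49, 400))                # LR patches of one cluster
C_true = rng.standard_normal((196, 50))
W = C_true @ np.vstack([X, np.ones((1, 400))])    # exact synthetic HR targets
C = fit_regression(X, W)
pred = apply_regression(C, X[:, 0])
```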
- These regression coefficients differ for each cluster, and storing them increases the computational efficiency of the high resolution image generation process.
- The system first computes the mean of each LR patch and determines the LR samples as the intensity samples minus the mean of that patch.
- the system may subtract the mean of the corresponding LR patch from the intensity samples of the HR patch.
- the system may also use a filtered version of the LR patch to emphasize fine detail in the LR and HR samples used for regression.
- the system may use other forms of normalization of the LR and HR patch samples before calculating regression coefficients.
- the system may include an additional cluster center optimization stage 160 . It is the goal of the cluster center optimization stage to further improve the visual quality of the super-resolution image output. This optimization stage performs further minimization of the reconstruction error during the training phase 100 .
- the reconstruction error may be a residual sum of squared errors (RSS) aggregated over all training patches. The residual sum of squared errors is evaluated between the ground truth HR image data and predicted high resolution image data that is generated by applying regression coefficients.
- the reconstruction error may be minimized during the training phase in an iterative manner, using known nonlinear optimization algorithms. For example, a simplex algorithm may be used for minimization.
- the reconstruction error minimization process 160 may start with the cluster centers that are determined as described above and compute the reconstruction error as described above.
- the process may then determine new candidate cluster center locations, and determine the corresponding regression coefficients as described above, and again compute the reconstruction error (for example, RSS) as described above.
- the system may iteratively minimize the reconstruction error and achieve improved visual quality of the high resolution output images.
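The iterative refinement of stage 160 can be sketched with SciPy's Nelder-Mead method (a simplex algorithm). The objective below is a simplified stand-in: it scores candidate centers by a nearest-center reconstruction error rather than re-fitting regression coefficients at every step, as the full system would:

```python
import numpy as np
from scipy.optimize import minimize

def reconstruction_rss(flat_centers, X, targets, shape):
    """Assign samples to the nearest candidate center and use that center
    as a stand-in prediction; return the residual sum of squared errors."""
    centers = flat_centers.reshape(shape)
    d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
    pred = centers[d.argmin(axis=1)]
    return float(np.sum((targets - pred) ** 2))

rng = np.random.default_rng(7)
X = rng.standard_normal((200, 4))
targets = X + 0.01 * rng.standard_normal((200, 4))
init = rng.standard_normal((6, 4))                # centers from k-means

res = minimize(reconstruction_rss, init.ravel(),
               args=(X, targets, init.shape),
               method="Nelder-Mead", options={"maxiter": 200})
refined = res.x.reshape(init.shape)               # refined cluster centers
```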
- a low-resolution (LR) image 322 is received.
- the LR image 322 is processed in a patch-by-patch manner.
- the low-resolution image 322 may be processed using a feature extraction and dimensionality reduction 324 .
- the feature extraction and dimensionality reduction that is applied to each patch in the LR input image 324 preferably matches the feature extraction and dimensionality reduction 114 so that the feature extraction and dimensionality reduction outputs mirror one another. If desired, the feature extraction and/or dimensionality reduction 324 and 114 may be different from one another.
- A fast search for the approximate closest cluster 326, using the output of the feature extraction and dimensionality reduction 324, may be performed based upon the output 300 of the cluster centers 122. While the search may be performed in a linear and exhaustive fashion, that tends to be a computationally intensive step. Instead of looking for the exact nearest neighbor cluster center, it is preferable to use a KD-tree to perform a non-exhaustive, approximate search for the nearest neighbor cluster center.
- the KD-Tree is a generalization of a binary search tree that stores k-dimensional points. The KD-Tree reduces the computational time needed to find a suitable cluster center given the input LR features.
- the KD-Tree data-structure is preferably computed off-line during the training stage, and is subsequently used during the high resolution image generation stage.
- Other approximate search techniques may be likewise used, as desired.
- Another known approximate search technique is based on hash tables.
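The approximate nearest-cluster search can be sketched with SciPy's KD-tree; eps > 0 permits an approximate, non-exhaustive search, and k > 1 retrieves the multiple neighbors used by the multi-cluster variant described below. The cluster count and feature dimension are illustrative:

```python
import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(8)
centers = rng.standard_normal((1024, 36))   # cluster centers from training

tree = cKDTree(centers)      # built once, off-line, during the training stage

query = rng.standard_normal(36)             # features of one LR input patch
dist, idx = tree.query(query, k=5, eps=0.1) # approximate 5 nearest centers
```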
- the system may apply regression coefficients 328 to the LR input patch 330 based upon the regression coefficients 310 associated with the closest cluster center, provided as a result of the training stage.
- The regression coefficients of the mapping function may be obtained by linear least-squares minimization as follows:
- C_i = argmin_C || W_i − C [X_i ; 1^T] ||^2
- Ci are the regression coefficients for each cluster i
- W i are the samples of the group of HR patches associated with cluster i collected in a matrix
- X i are the samples of the LR patches associated with cluster i collected in a matrix
- “1” is a vector with the same number of elements as the number of training patches in X i filled entirely with ones.
- the system may use the KD-tree to search for multiple approximate nearest neighbors 350 . This results in an improvement in the searching with limited additional computational complexity.
- the system may perform an application of regression coefficients 352 to the LR input patch 330 based upon the corresponding regression coefficients 310 for each of the multiple selected (L) cluster centers.
- The high resolution image patches resulting from the multiple applications of regression coefficients 352 may be combined in any manner, such as a weighted sum of image samples 354, which then results in the high resolution image 332. This may include combining the pixel values of generated high resolution image patches that partially overlap by a weighted averaging technique.
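Combining overlapping HR patches by weighted averaging can be sketched as an overlap-add with per-pixel weight normalization; uniform weights are used here, and any per-patch weighting could be substituted:

```python
import numpy as np

def assemble_hr(patches, positions, out_shape, weights=None):
    """Accumulate weighted patches, then divide by the summed weights."""
    acc = np.zeros(out_shape)
    wsum = np.zeros(out_shape)
    if weights is None:
        weights = np.ones(len(patches))
    for patch, (r, c), w in zip(patches, positions, weights):
        h, wd = patch.shape
        acc[r:r + h, c:c + wd] += w * patch
        wsum[r:r + h, c:c + wd] += w
    return acc / np.maximum(wsum, 1e-12)   # avoid divide-by-zero off-patch

# Two overlapping 4x4 patches on an 8x8 grid; the overlap averages to 2.0.
patches = [np.full((4, 4), 1.0), np.full((4, 4), 3.0)]
img = assemble_hr(patches, [(0, 0), (0, 2)], (8, 8))
```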
- the high resolution output image 332 may be further processed with a de-ringing process 500 , and a jaggyness reduction process 510 .
- the de-ringing process 500 may include a local weighted averaging filter, such as a bilateral filter or an adaptive bilateral filter 610 based on the HR image 332 .
- the bilateral filter reduces ringing artifacts near edges by smoothing. However, the bilateral filter may also undesirably smooth fine detail away from edges.
- the de-ringing process 500 may use an edge distance map 620 to prevent smoothing detail that is not near an edge.
- the de-ringing process 500 may determine an edge distance map 620 based upon the HR image 332 .
- the de-ringing process 500 may blend 630 the HR image 640 with the output of the bilateral filter/adaptive bilateral filter 610 based upon a soft threshold on the edge distance map 620 .
- the soft threshold may be controlled by the edge distance map 620 .
- the final output is the weighted sum of the output of the bilateral filter and the original input image, where the weights are locally adapted based on the edge distance map.
- Near an edge, a higher weight is given to the bilateral filtered pixel data.
- Away from edges, a higher weight is given to the unfiltered HR pixel data 640, and a lower weight is given to the bilateral filtered pixel data.
- the output of the blending 630 is a blended image 650 .
- The process may include further edge enhancement by using the known adaptive bilateral filter instead of the bilateral filter.
- The adaptive bilateral filter switches from smoothing to sharpening close to a significant edge.
- The edge map can be obtained from various edge detection techniques, for instance, Canny edge detection or Sobel edge detection.
- I_out, I_in, and I_filt are the output image, the input image, and the filtered image, respectively, in the blend I_out = w*I_filt + (1 − w)*I_in, with the weight w locally adapted based on the edge distance map.
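A sketch of the edge-distance-controlled blend described above; the bilateral output is faked as a constant image, and the logistic soft threshold and its parameters are assumptions standing in for whatever soft threshold the system uses:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def dering_blend(hr, filtered, edge_mask, dist_thresh=3.0, softness=1.0):
    """Near edges favor the (bilateral-)filtered data; away from edges
    keep the unfiltered HR data, per the weighting described above."""
    dist = distance_transform_edt(~edge_mask)   # distance to nearest edge
    # Soft threshold on the edge distance: w -> 1 near edges, 0 far away.
    w = 1.0 / (1.0 + np.exp((dist - dist_thresh) / softness))
    return w * filtered + (1.0 - w) * hr

hr = np.random.default_rng(9).random((16, 16))
filtered = np.zeros_like(hr)                 # stand-in for bilateral output
edges = np.zeros((16, 16), dtype=bool)
edges[8, :] = True                           # a single horizontal edge
out = dering_blend(hr, filtered, edges)
```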
- the edge jaggyness reduction process 510 may include an adaptive kernel regression filter 710 based upon the blended image 650 .
- the jaggyness reduction process 510 may include the determination of local gradients and local image derivatives 720 based upon the blended image 650 .
- the adaptive kernel regression 710 may be based upon the local derivatives and gradients 720 which are used to control the kernel regression and differentiate jaggy edge artifacts from texture, junctions, and corners. Discriminating strong edges from fine texture detail and other image features is important to avoid undesirable reduction of such fine detail by the jaggyness reduction filter.
Abstract
Description
- None.
- Super-resolution techniques generate high-resolution (HR) images from one or more low-resolution (LR) images. With the improvement in the resolution of image capture technology, even though there are capture and display devices that can produce high-resolution images and videos, there are many existing low-resolution images and videos that can be found in surveillance videos, mobile devices, and broadcast content. In order to improve the user experience while watching such content on higher resolution display devices, such as high definition display device, 4K display device, or 8K display device, the input video should be increased in resolution to match that of the display. Accordingly, often the super-resolution technique predicts thousands of unknown pixel values from a small fraction of input pixels. This is inherently an ill-posed problem and the ambiguity increases as the scaling factor increases. The amount of information that is “missing” in a low resolution image relative to the target high resolution image is very large, in terms of fine detail and high frequency information that contributes to the perception of a high quality rendered image on a high resolution display. Existing techniques have a limited ability to restore and enhance fine image detail. It is desirable to reconstruct and enhance image detail with a high quality result even at increased upscaling factors. Existing techniques for super-resolution image and video upscaling often require very high computational cost. Some techniques combine images of a scene or multiple frames of a video to enhance resolution, which may incur high memory or data transfer costs. Some techniques utilize iterative optimization approaches to enhance resolution, which may incur high computational cost. It is also desirable to perform super-resolution image generation at a limited computational cost. 
It is desirable for a super resolution image enhancement system that uses a single low resolution input image to generate a high resolution output image.
- The foregoing and other objectives, features, and advantages of the invention may be more readily understood upon consideration of the following detailed description of the invention, taken in conjunction with the accompanying drawings.
-
FIG. 1 illustrates an exemplary training technique. -
FIG. 2 illustrates a graph of visualization of energy versus dimensionality. -
FIG. 3 illustrates an exemplary high resolution image generation technique. -
FIG. 4 illustrates another exemplary high resolution image generation technique. -
FIG. 5 illustrates a post super-resolution processing technique. -
FIG. 6 illustrates a de-ringing processing technique. -
FIG. 7 illustrates a jaggyness reduction technique. - Referring to
FIG. 1 , a super-resolution technique includes atraining phase 100 to create a model that is used for a subsequent resolution enhancement technique. During the training phase, the system uses a database of high resolution (HR)training images 110. Preferably, theHR training images 110 are representative of natural images with fine detail, such as scenery and/or items and/or people, rather than synthetic computer generated graphics. For each HR training image 110 Ih, the system may obtain a corresponding low resolution (LR) image 112 Il. For example, thecorresponding LR image 112 may be computed as follows: Il=(Ih*G)↓. * denotes convolution, G is a Gaussian Kernel, and ↓ is a down-sampling operator. Other filter kernels and other degradation operations may be used as appropriate. A database ofLR images 112 corresponding to theHR images 110 may be used. Using a suitable technique, the system may use any set of HR images and determine a corresponding LR image for each. - It is desirable to convert the
LR images 112 to a different space, such as a feature space, to further characterize the image content. The LR images may be processed to perform feature extraction anddimensionality reduction 114 based upon patches of theLR images 112. In particular, using raw pixel values during subsequent clustering provides limited generalization properties. Rather than using raw pixel values during subsequent clustering, it is preferable to use a feature, such as a gradient feature. For example, first and second order gradients may be used to characterize features of the low resolution patches of thelow resolution images 112. Four 1-D filters may be used to extract the first and second order derivatives or gradients in horizontal and vertical direction as follows: -
f 1=[−1,0,1],f 2 =f 1 T -
f 3=[1,−2,1],f 4 =f 3 T - The system processes image data on a patch-by-patch basis, where a patch includes a small block of image pixels. For example a patch may correspond to a 7×7 block of pixels in the LR image. As another example, each LR image patch may include 45 pixels in a small neighborhood or image area. The computation of the gradients in the manner above increases the dimensionality of the LR patch from 45 to 180, thus increasing the computational complexity of the system.
- To both reduce the computational complexity and increase the discriminative property amongst the features, the system may apply a principal component analysis dimensionality reduction. The principal component analysis projects the features to a lower-dimensional space. For example, the principal component analysis, either linear or non-linear, may be used to reduce the dimensionality from 180 dimensions to 36 dimensions, thus reducing the dimensionality of the features by 80%. Referring to
FIG. 2 , a graph of the visualization of energy versus dimensionality may be observed. The principal component analysis results in information primarily along horizontal, vertical, and diagonal edges together with a representation of texture. Further, the use of the principal component analysis results in a reduction in jaggy artifacts during reconstruction. Any technique may be used to characterize features of the images, such as on a patch basis, and any technique may be used for dimensionality reduction, if desired. In addition, a suitable technique that extracts compact features directly from the image in a single step may be used instead of the 2-step feature extraction and dimensionality reduction process. In addition, suitable normalization techniques may be applied to the features, such as thresholding, clipping and normalizing by their vector norm. - The system may then cluster all, or a selected set of, the dimensionally reduced extracted
features 114 of the LR patches in a manner that optimizes the visual quality of the super-resolution image output. A first step towards determining optimized patch feature clusters may include K-means clustering 116. K-means clustering is a well-known technique of vector quantization of the features that performs cluster analysis in the data by partitioning N observations into K clusters in which each observation belongs to the cluster with the nearest mean or cluster center, serving as the prototype of the cluster. This may be performed, for example, using a technique similar to an expected-maximization technique for mixtures of Gaussian distributions via an iterative refinement. Each of the cluster centers may be considered to be representative of the feature space of the natural image patches. For each cluster, the system may collect a fixed number of exemplar training patches, which reduces the computational complexity of the system. Other clustering techniques may likewise be used, if desired. - As described above using the K-means clustering, the collection of a fixed number of exemplar training patches for each cluster is used to train a mapping function. It has been observed that some clusters in the feature space have very few corresponding exemplar training patches. While having a very limited number of exemplary training patches for some clusters may be useful in efficiently determining the feature, it turns out that using such a limited set of corresponding training patches results in poor subsequent reconstruction of a high resolution image and also results in undesirable artifacts in the reconstruction of the high resolution image. Accordingly, in the case of cluster centers having a fewer number of corresponding low resolution patches than a threshold, it is preferable to include additional low resolution training patches for those cluster centers. 
The additional training patches may be drawn from the cluster's M nearest neighbor clusters, which may be determined using a distance metric. Selecting and grouping the additional training patches from the nearest neighboring clusters in this manner increases the probability that those training patches are close in appearance to one another. Also, for clusters that are close to one another and do not have sufficient exemplar training patches, sharing the same samples with neighboring clusters increases the robustness of the subsequent regression coefficients, described later.
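The clustering and small-cluster augmentation steps above can be sketched in code. The patent does not provide an implementation, so the following is a minimal numpy sketch under stated assumptions: a plain Lloyd-style K-means over patch feature vectors, and an augmentation step where any cluster with fewer than a hypothetical `min_count` exemplars borrows the samples of its M nearest neighboring clusters (Euclidean distance between centers). Function names and parameters are illustrative, not from the patent.

```python
import numpy as np

def kmeans(features, k, iters=20, seed=0):
    """Plain K-means over patch feature vectors (one sketch of step 116)."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), k, replace=False)]
    for _ in range(iters):
        # Assign each feature vector to its nearest cluster center.
        d = np.linalg.norm(features[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned members.
        for j in range(k):
            members = features[labels == j]
            if len(members):
                centers[j] = members.mean(axis=0)
    return centers, labels

def augment_small_clusters(features, labels, centers, min_count, m_neighbors=2):
    """For clusters with fewer than min_count exemplars, borrow the training
    samples of the M nearest neighboring clusters, as described above."""
    groups = {}
    center_d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    for j in range(len(centers)):
        idx = np.flatnonzero(labels == j)
        if len(idx) < min_count:
            # Nearest neighbor clusters by center distance, excluding j itself.
            neighbors = center_d[j].argsort()[1:m_neighbors + 1]
            extra = np.flatnonzero(np.isin(labels, neighbors))
            idx = np.concatenate([idx, extra])
        groups[j] = idx
    return groups
```

In this sketch, `groups[j]` is the exemplar index set later used to fit the regression coefficients for cluster j; the shared samples are what make neighboring small clusters more robust.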
- As previously described, the exemplar training patches may be used to train the mapping function based on the K-means clustering 116. The system may apply multiple different rounds of K-means clustering 116A-116C. The different rounds of K-means clustering may be initialized with different randomized seeds so that different clustering outcomes are obtained. The different rounds of K-means clustering may also be based upon different clustering techniques for the data. One of the different K-means clustering results 116A-116C may be selected as the best clustering result 118, as described below. - To determine which of the K-means clustering results is preferable, or otherwise more representative of the HR image content, the system may use “ground truth”
HR information 120 to validate the clustering process based on a reconstruction error to select the best K-means clustering result 118. This reconstruction error may be a residual sum of squared errors (RSS) aggregated over all training patches. The residual sum of squared errors is evaluated between the ground truth HR image data and predicted high resolution image data that is generated by applying regression coefficients, where the regression coefficients are determined as described below. - Selecting the best K-means clustering outcome 118 results in a set of cluster centers 122. Each of the cluster centers 122 is representative of a clustering of a set of LR patches from the low resolution training images 112. Also, the cluster centers 122 may be associated with a database of the LR patches corresponding to each of the cluster centers 122. It is noted that in some cases, one patch may correspond to multiple different cluster centers. The cluster centers 122 may be provided to the high resolution image generation process 300. In particular, the cluster centers 122 may be used to characterize a low resolution input patch of an input image provided during the high resolution image generation process. However, while the cluster centers 122 may be used to characterize a low resolution input patch of the input images of the high resolution image generation phase, there also needs to be a function provided to the high resolution image generation phase that characterizes the corresponding unknown high resolution patch for the resolution upsampling. - A set of exemplar patches is identified 150 based upon the cluster centers 122. This may be provided by way of a known relationship between the cluster centers 122 and the corresponding low resolution input patches. In this manner, the cluster centers identify the groups of
patches 150 of the low resolution images corresponding with each of the cluster centers. The exemplar patches 150 of the low resolution images are provided together with the corresponding patches of the high resolution images 110 to a regression coefficients calculation process 152. A set of regression coefficients may be determined 152 to characterize a corresponding high resolution patch based upon a low resolution patch. Other techniques may be used to determine a high resolution patch based upon a low resolution patch. The output of the regression coefficients calculation process 152 may be a set of regression coefficients 310 for each corresponding cluster center 122. - For example, for each cluster using information from the corresponding exemplar patches, the system may learn a mapping function based upon a least squares approximation. The regression coefficients of the mapping function may be determined by linear least-squares minimization as follows:
- Ci = argmin over C of || Wi − C [Xi; 1] ||²
- where Ci are the regression coefficients for each cluster i, Wi are the samples of the group of HR patches associated with cluster i collected in a matrix, Xi are the samples of the LR patches associated with cluster i collected in a matrix, and “1” is a row vector with the same number of elements as the number of training patches in Xi, filled entirely with ones. These regression coefficients differ for each cluster, and storing them increases the computational efficiency of the high resolution image generation process. Preferably, during the computation of the regression coefficients, the system first computes the mean of each LR patch and determines the LR samples as the intensity samples minus the mean of that patch. For the HR samples, the system may subtract the mean of the corresponding LR patch from the intensity samples of the HR patch. Instead of using the mean, the system may also use a filtered version of the LR patch to emphasize fine detail in the LR and HR samples used for regression. In addition, the system may use other forms of normalization of the LR and HR patch samples before calculating regression coefficients.
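The per-cluster least-squares fit described above has a standard closed form, and the following numpy sketch shows one way to compute it under stated assumptions. The small ridge term `eps` is not from the patent; it is a common hypothetical safeguard against an ill-conditioned normal matrix when a cluster has few exemplars. Mean subtraction/normalization of the patch samples is assumed to have been done beforehand, as the text describes.

```python
import numpy as np

def cluster_regression_coefficients(X_lr, W_hr, eps=1e-3):
    """Least-squares regression coefficients for one cluster (a sketch of
    process 152). Columns of X_lr are LR patch sample vectors; columns of
    W_hr are the corresponding HR sample vectors.

    Solves  min_C || W - C [X; 1] ||^2  in closed form via the normal
    equations:  C = W Z^T (Z Z^T + eps*I)^(-1),  with Z = [X; 1].
    """
    n = X_lr.shape[1]
    Z = np.vstack([X_lr, np.ones((1, n))])   # append the row of ones ("1")
    G = Z @ Z.T + eps * np.eye(Z.shape[0])   # regularized Gram matrix
    return W_hr @ Z.T @ np.linalg.inv(G)

def apply_coefficients(C, x_lr):
    """Predict an HR patch vector from one LR patch vector (step 328)."""
    z = np.concatenate([x_lr, [1.0]])
    return C @ z
```

Because the coefficients absorb the ones row, applying them at generation time is a single matrix-vector product per patch, which is the computational-efficiency point made above.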
- Referring to
FIG. 1, the system may include an additional cluster center optimization stage 160. It is the goal of the cluster center optimization stage to further improve the visual quality of the super-resolution image output. This optimization stage performs further minimization of the reconstruction error during the training phase 100. The reconstruction error may be a residual sum of squared errors (RSS) aggregated over all training patches. The residual sum of squared errors is evaluated between the ground truth HR image data and predicted high resolution image data that is generated by applying regression coefficients. The reconstruction error may be minimized during the training phase in an iterative manner, using known nonlinear optimization algorithms. For example, a simplex algorithm may be used for minimization. The reconstruction error minimization process 160 may start with the cluster centers determined as described above and compute the reconstruction error. The process may then determine new candidate cluster center locations, determine the corresponding regression coefficients, and again compute the reconstruction error (for example, RSS). In this manner, the system may iteratively minimize the reconstruction error and achieve improved visual quality of the high resolution output images. - Referring to
FIG. 3, during the high resolution image generation process 320 a low-resolution (LR) image 322 is received. The LR image 322 is processed in a patch-by-patch manner. The low-resolution image 322 may be processed using a feature extraction and dimensionality reduction 324. The feature extraction and dimensionality reduction that is applied to each patch in the LR input image 324 preferably matches the feature extraction and dimensionality reduction 114 so that the feature extraction and dimensionality reduction outputs mirror one another. If desired, the feature extraction and/or dimensionality reduction - A fast search for approximate
closest cluster 326 using the output of the feature extraction and dimensionality reduction 324 may be performed based upon the output 300 of the cluster centers 122. While the search may be performed in a linear and exhaustive fashion, it tends to be a computationally intensive step. Instead of looking for the exact nearest neighbor cluster center, it is preferable to use a KD-Tree to perform a non-exhaustive, approximate search for the nearest neighbor cluster center. The KD-Tree is a generalization of a binary search tree that stores k-dimensional points. The KD-Tree reduces the computational time needed to find a suitable cluster center given the input LR features. The KD-Tree data structure is preferably computed off-line during the training stage, and is subsequently used during the high resolution image generation stage. Other approximate search techniques may likewise be used, as desired. As an example, another known technique is based on using hashing tables. - With the
closest cluster 326 identified for the patch of the LR input image 322, the system may apply regression coefficients 328 to the LR input patch 330 based upon the regression coefficients 310 associated with the closest cluster center, provided as a result of the training stage. For example, the regression coefficients of the mapping function may be obtained by linear least-squares minimization as follows:
- Ci = argmin over C of || Wi − C [Xi; 1] ||², where Ci are the regression coefficients for each cluster i, Wi are the samples of the group of HR patches associated with cluster i collected in a matrix, Xi are the samples of the LR patches associated with cluster i collected in a matrix, and “1” is a row vector with the same number of elements as the number of training patches in Xi, filled entirely with ones. In this manner, the corresponding regression coefficients that were determined during the training stage are applied to input LR patches during the high resolution image generation stage in order to determine an appropriate
high resolution image 332. - Referring to
FIG. 4, in another embodiment, during high resolution image generation the system may use the KD-Tree to search for multiple approximate nearest neighbors 350. This improves the search with limited additional computational complexity. Preferably, the system may look for the L=3 closest clusters, although any number of nearest clusters may be used. Also, the system may perform an application of regression coefficients 352 to the LR input patch 330 based upon the corresponding regression coefficients 310 for each of the multiple selected (L) cluster centers. The high resolution image patches resulting from the multiple applications of regression coefficients 352 may be combined in any manner, such as a weighted sum of image samples 354, which then results in the high resolution image 332. This may include combining the pixel values of generated high resolution image patches that partially overlap by a weighted average technique. - While the regression-based technique provides a high quality image, it tends to introduce artifacts near edges such as ringing and jaggyness. Referring to
FIG. 5, to decrease the artifacts near the edges, the high resolution output image 332 may be further processed with a de-ringing process 500 and a jaggyness reduction process 510. - Referring to
FIG. 6, the de-ringing process 500 may include a local weighted averaging filter, such as a bilateral filter or an adaptive bilateral filter 610, based on the HR image 332. The bilateral filter reduces ringing artifacts near edges by smoothing. However, the bilateral filter may also undesirably smooth fine detail away from edges. Hence, the de-ringing process 500 may use an edge distance map 620 to prevent smoothing detail that is not near an edge. The de-ringing process 500 may determine the edge distance map 620 based upon the HR image 332. The de-ringing process 500 may blend 630 the HR image 640 with the output of the bilateral filter/adaptive bilateral filter 610 based upon a soft threshold controlled by the edge distance map 620. The final output is the weighted sum of the output of the bilateral filter and the original input image, where the weights are locally adapted based on the edge distance map. When a pixel is close to the major edges, a higher weight is given to the bilateral filtered pixel data; when a pixel is far away from the major edges, a higher weight is given to the unfiltered HR pixel data 640. The output of the blending 630 is a blended image 650. The process may include further edge enhancement by using the known adaptive bilateral filter instead of the bilateral filter. The adaptive bilateral filter switches from smoothing to sharpening close to a significant edge. - In one embodiment, the blended
image 650 is calculated as: Iout = (w/dth) × Iin + (1 − w/dth) × Ibil, where dth is a constant that clips the edge distance map: namely, if the edge distance is larger than dth, it is clipped to dth, and otherwise the edge distance itself is used; the resulting value is recorded as w. The edge map can be obtained from various edge detection techniques, for instance, Canny edge detection or Sobel edge detection. Iout, Iin, and Ibil are the output image, the input image, and the bilateral filtered image, respectively. - Referring to
FIG. 7, the edge jaggyness reduction process 510 may include an adaptive kernel regression filter 710 based upon the blended image 650. The jaggyness reduction process 510 may include the determination of local gradients and local image derivatives 720 based upon the blended image 650. The adaptive kernel regression 710 may be based upon the local derivatives and gradients 720, which are used to control the kernel regression and differentiate jaggy edge artifacts from texture, junctions, and corners. Discriminating strong edges from fine texture detail and other image features is important to avoid undesirable reduction of such fine detail by the jaggyness reduction filter. - The terms and expressions which have been employed in the foregoing specification are used as terms of description and not of limitation, and there is no intention, in the use of such terms and expressions, of excluding equivalents of the features shown and described or portions thereof, it being recognized that the scope of the invention is defined and limited only by the claims which follow.
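The de-ringing blend of FIG. 6 (blending step 630 and the Iout expression above) can be sketched as a few numpy lines. This is an illustrative sketch, not the patent's implementation: `dth` is the clipping constant from the text, `I_in` is the unfiltered HR image, and `I_bil` is assumed to be a bilateral-filtered version produced by some separate filter.

```python
import numpy as np

def dering_blend(I_in, I_bil, edge_distance, dth=8.0):
    """Blend the unfiltered HR image with its bilateral-filtered version
    (step 630): near major edges (small edge distance) the filtered pixels
    dominate to suppress ringing; far from edges the original fine detail
    is kept. dth is a hypothetical value for the clipping constant."""
    w = np.minimum(edge_distance, dth)   # clip the edge distance map at dth
    alpha = w / dth                      # 0 at an edge, 1 far from edges
    return alpha * I_in + (1.0 - alpha) * I_bil
```

At distance 0 the output equals the bilateral-filtered pixel, at distance dth or more it equals the input pixel, and in between the two are mixed linearly, matching the weighted-sum description above.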
Claims (24)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/827,030 US9589323B1 (en) | 2015-08-14 | 2015-08-14 | Super resolution image enhancement technique |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/827,030 US9589323B1 (en) | 2015-08-14 | 2015-08-14 | Super resolution image enhancement technique |
Publications (2)
Publication Number | Publication Date |
---|---|
US20170046816A1 true US20170046816A1 (en) | 2017-02-16 |
US9589323B1 US9589323B1 (en) | 2017-03-07 |
Family
ID=57994686
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US14/827,030 Expired - Fee Related US9589323B1 (en) | 2015-08-14 | 2015-08-14 | Super resolution image enhancement technique |
Country Status (1)
Country | Link |
---|---|
US (1) | US9589323B1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108335265B (en) * | 2018-02-06 | 2021-05-07 | 上海通途半导体科技有限公司 | Rapid image super-resolution reconstruction method and device based on sample learning |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7218796B2 (en) * | 2003-04-30 | 2007-05-15 | Microsoft Corporation | Patch-based video super-resolution |
US7715658B2 (en) * | 2005-08-03 | 2010-05-11 | Samsung Electronics Co., Ltd. | Apparatus and method for super-resolution enhancement processing |
US8335403B2 (en) * | 2006-11-27 | 2012-12-18 | Nec Laboratories America, Inc. | Soft edge smoothness prior and application on alpha channel super resolution |
EP2179589A4 (en) * | 2007-07-20 | 2010-12-01 | Fujifilm Corp | Image processing apparatus, image processing method and program |
US8538203B2 (en) * | 2007-07-24 | 2013-09-17 | Sharp Laboratories Of America, Inc. | Image upscaling technique |
CN101414925B (en) * | 2007-10-17 | 2011-04-06 | 华为技术有限公司 | Method, system and apparatus for configuring optical network terminal |
US9064476B2 (en) * | 2008-10-04 | 2015-06-23 | Microsoft Technology Licensing, Llc | Image super-resolution using gradient profile prior |
US8630464B2 (en) * | 2009-06-15 | 2014-01-14 | Honeywell International Inc. | Adaptive iris matching using database indexing |
JP5506274B2 (en) * | 2009-07-31 | 2014-05-28 | 富士フイルム株式会社 | Image processing apparatus and method, data processing apparatus and method, and program |
US8687923B2 (en) | 2011-08-05 | 2014-04-01 | Adobe Systems Incorporated | Robust patch regression based on in-place self-similarity for image upscaling |
US9324133B2 (en) * | 2012-01-04 | 2016-04-26 | Sharp Laboratories Of America, Inc. | Image content enhancement using a dictionary technique |
-
2015
- 2015-08-14 US US14/827,030 patent/US9589323B1/en not_active Expired - Fee Related
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107194872A (en) * | 2017-05-02 | 2017-09-22 | 武汉大学 | Remote sensed image super-resolution reconstruction method based on perception of content deep learning network |
CN107194893A (en) * | 2017-05-22 | 2017-09-22 | 西安电子科技大学 | Depth image ultra-resolution method based on convolutional neural networks |
CN107239783A (en) * | 2017-06-13 | 2017-10-10 | 中国矿业大学(北京) | Coal-rock identification method based on extension local binary patterns and regression analysis |
CN110555800A (en) * | 2018-05-30 | 2019-12-10 | 北京三星通信技术研究有限公司 | image processing apparatus and method |
US11967045B2 (en) | 2018-05-30 | 2024-04-23 | Samsung Electronics Co., Ltd | Image processing device and method |
US20210342496A1 (en) * | 2018-11-26 | 2021-11-04 | Hewlett-Packard Development Company, L.P. | Geometry-aware interactive design |
CN109712099A (en) * | 2018-12-04 | 2019-05-03 | 山东大学 | Method is equalized based on the sonar image of SLIC and adaptive-filtering |
CN109636727A (en) * | 2018-12-17 | 2019-04-16 | 辽宁工程技术大学 | A kind of super-resolution rebuilding image spatial resolution evaluation method |
CN112150360A (en) * | 2020-09-16 | 2020-12-29 | 北京工业大学 | IVUS image super-resolution reconstruction method based on dense residual error network |
CN115880157A (en) * | 2023-01-06 | 2023-03-31 | 中国海洋大学 | Stereo image super-resolution reconstruction method based on K space pyramid feature fusion |
Also Published As
Publication number | Publication date |
---|---|
US9589323B1 (en) | 2017-03-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9589323B1 (en) | Super resolution image enhancement technique | |
CN111047516B (en) | Image processing method, image processing device, computer equipment and storage medium | |
US10339643B2 (en) | Algorithm and device for image processing | |
Kim et al. | Joint patch clustering-based dictionary learning for multimodal image fusion | |
Fan et al. | Homomorphic filtering based illumination normalization method for face recognition | |
Ren et al. | Single image super-resolution using local geometric duality and non-local similarity | |
US20160027148A1 (en) | Image enhancement using a patch based technique | |
WO2017080196A1 (en) | Video classification method and device based on human face image | |
Huang et al. | Selective wavelet attention learning for single image deraining | |
US8463050B2 (en) | Method for measuring the dissimilarity between a first and a second images and a first and second video sequences | |
US9449395B2 (en) | Methods and systems for image matting and foreground estimation based on hierarchical graphs | |
Fang et al. | Rapid image completion system using multiresolution patch-based directional and nondirectional approaches | |
CN103049897A (en) | Adaptive training library-based block domain face super-resolution reconstruction method | |
Zhao et al. | Image super-resolution via adaptive sparse representation | |
Zhang et al. | Self-supervised low light image enhancement and denoising | |
US20160241884A1 (en) | Selective perceptual masking via scale separation in the spatial and temporal domains for use in data compression with motion compensation | |
Zong et al. | Key frame extraction based on dynamic color histogram and fast wavelet histogram | |
Andrushia et al. | An efficient visual saliency detection model based on Ripplet transform | |
Anwar et al. | Combined internal and external category-specific image denoising. | |
US8897378B2 (en) | Selective perceptual masking via scale separation in the spatial and temporal domains using intrinsic images for use in data compression | |
Banerjee et al. | Bacterial foraging-fuzzy synergism based image Dehazing | |
Georgiadis et al. | Texture representations for image and video synthesis | |
Suryanarayana et al. | Deep Learned Singular Residual Network for Super Resolution Reconstruction. | |
Yan et al. | Wavelet decomposition applied to image fusion | |
Yuan et al. | A generic video coding framework based on anisotropic diffusion and spatio-temporal completion |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: SHARP LABORATORIES OF AMERICA, INC., WASHINGTON Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:CHOUDHURY, ANUSTUP KUMAR;CHEN, XU;VAN BEEK, PETRUS J.L.;SIGNING DATES FROM 20150813 TO 20150814;REEL/FRAME:036332/0354 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
AS | Assignment |
Owner name: SHARP KABUSHIKI KAISHA, JAPAN Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNOR:SHARP LABORATORIES OF AMERICA, INC.;REEL/FRAME:041667/0991 Effective date: 20170321 |
|
CC | Certificate of correction | ||
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210307 |