CN114170084A - Image super-resolution processing method, device and equipment - Google Patents
- Publication number: CN114170084A (application CN202111485136.2A)
- Authority: CN (China)
- Prior art keywords: network model, light field, SISR, light-field image, super-resolution
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06T3/4053 — Super resolution, i.e. output image resolution higher than sensor resolution (G—PHYSICS; G06T—Image data processing or generation, in general; G06T3/40—Scaling the whole image or part thereof)
- G06N3/045 — Combinations of networks (G06N—Computing arrangements based on specific computational models; G06N3/02—Neural networks; G06N3/04—Architecture, e.g. interconnection topology)
- G06N3/08 — Learning methods (G06N3/02—Neural networks)
- G06T2207/20081 — Training; Learning (G06T2207/20—Special algorithmic details)
- G06T2207/20084 — Artificial neural networks [ANN] (G06T2207/20—Special algorithmic details)
Abstract
The invention discloses an image super-resolution processing method, device and equipment. The method comprises the following steps: acquiring an image to be processed; and performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image samples comprise light-field image samples containing four-dimensional information. The technical scheme of the invention solves the problem that, because the training data is a single-picture data set, only the high-to-low-resolution mapping of a single viewing angle of one scene can be learned, which is a significant limitation. It also solves the problem that, when pictures of different viewing angles of the same scene are used for super-resolution, the input is limited to multiple pictures and the method is not applicable to super-resolution of a single picture. The super-resolution capability of the network is thereby enhanced.
Description
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method, a device and equipment for processing image super-resolution.
Background
With the development of image-based computer vision, the spatial resolution of images has become a fundamental and important research topic. Current mainstream image super-resolution methods are divided into traditional non-learning methods and learning methods. Generally speaking, a learning method learns a complex mapping function from a low dimension to a high dimension, and its super-resolution effect is superior to that of the traditional methods in terms of both visual effect and quantitative metrics. Among the currently mainstream learning-based super-resolution methods for a single picture, single image super-resolution (SISR) aims to predict and recover high-resolution information from a low-resolution picture. SISR has been studied for decades as a fundamental field of computer vision research, and its results have been widely applied in many computer vision applications, such as medical imaging, satellite imaging, and security.
Recently, research on deep learning methods has brought new ideas to SISR. Various deep-learning-based SISR methods have been proposed to improve the super-resolution effect. However, due to the limitation of the training data, these learning methods only learn the high-to-low-resolution mapping of a single view angle in a scene, which is a significant limitation and restricts further performance improvement. In order to make full use of training data, some work utilizes the angular information of fused input pictures (angle information is usually generated only when multiple pictures are input), thereby obtaining a better spatial super-resolution effect. However, these efforts rely on multi-picture input and do not address the problem of single-picture input, i.e., SISR.
Disclosure of Invention
Embodiments of the invention provide an image super-resolution processing method, device and equipment, which enable a network to learn better super-resolution capability through the mapping relation of four-dimensional information. Meanwhile, the network architecture is also suitable for super-resolution of two-dimensional pictures; that is, the super-resolution capability of a SISR network is enhanced by training on multi-view pictures. The method is applicable to all current learning-based SISR networks and therefore has high network portability.
In a first aspect, an embodiment of the present invention provides an image super-resolution processing method, including:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample comprises a light-field image sample containing four-dimensional information.
Further, iteratively training the SISR network model through the target image sample set includes:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
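The iterative steps above can be sketched as follows. This is a hypothetical minimal illustration in which the SISR network is replaced by a stand-in nearest-neighbour upsampler; names such as `sisr_upscale` and `predict_light_field` are illustrative, not from the patent:

```python
import numpy as np

def sisr_upscale(view, scale=2):
    # Stand-in for the SISR network: nearest-neighbour upsampling.
    # In the described method this would be a learned model (e.g. ESPCN, VDSR, RCAN).
    return view.repeat(scale, axis=0).repeat(scale, axis=1)

def predict_light_field(light_field_sample, scale=2):
    """One forward pass of the loop described above.

    light_field_sample: array of shape (K, H, W) holding the K sub-view
    samples determined from a light-field image sample.
    """
    K = light_field_sample.shape[0]
    # Pass each of the K sub-view samples through the SAME SISR model
    # to obtain K first predicted images.
    predictions = [sisr_upscale(light_field_sample[k], scale) for k in range(K)]
    # Recombine the K predictions into the predicted light-field image.
    return np.stack(predictions, axis=0)

lf = np.random.rand(9, 16, 16)      # e.g. K = 9 sub-views of 16x16 pixels
pred = predict_light_field(lf, scale=2)
print(pred.shape)                   # (9, 32, 32)
```

In the actual method, the predicted light field would then be compared against the light-field image sample via the objective function, and the operation repeated until the target neural network model is obtained.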
Further, the content loss function is:

L_c = ‖ L^SR − L^HR ‖₂² + a · (1 − SSIM(L^SR, L^HR))

wherein a is the proportion (weight) of the structural similarity loss SSIM, L^HR is the light-field image sample, L^SR is the predicted light-field image assembled from the K super-resolved sub-views, l_k is the kth sub-view corresponding to the light-field image sample, and U is the upsampling matrix of the super-resolution model.
Further, the objective function further includes: at least one of a variance map loss function and a disparity map loss function.
Further, training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a first super pixel corresponding to the light field image sample and a sub-pixel in the first super pixel;
acquiring a second super pixel corresponding to the predicted light field image and a sub-pixel in the second super pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super pixel, the sub-pixels in the first super pixel, the second super pixel and the sub-pixels in the second super pixel;

wherein the variance map loss function is:

L_vm = ‖ VM^SR − VM^HR ‖₂²

wherein VM^SR is the variance map of the predicted light-field image and VM^HR is the variance map of the light-field image sample.
Further, training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
Further, the target image sample further includes: a two-dimensional image sample.
In a second aspect, an embodiment of the present invention further provides an image super-resolution processing apparatus, including:
the acquisition module is used for acquiring an image to be processed;
and the processing module is used for performing super-resolution processing on the image to be processed through a target neural network model, the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the image super-resolution processing method according to any one of the embodiments of the present invention when executing the program.
According to the method and the device, an image to be processed is acquired and super-resolution processing is performed on it through a target neural network model, the target neural network model being obtained by iteratively training a SISR network model with a target image sample set, where the target image sample comprises a light-field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high-to-low-resolution mapping of a single view angle of one scene can be learned, which is a significant limitation. It also solves the problem that, when pictures of different view angles of the same scene are used for super-resolution, the input is limited to multiple pictures and the method is not applicable to super-resolution of a single picture. The network learns the mapping relation of the four-dimensional information, enhancing its super-resolution capability; meanwhile, the network architecture is also suitable for super-resolution of two-dimensional images, i.e., the super-resolution capability of the SISR network is enhanced by multi-view image training. The method is applicable to all current learning-based SISR networks and has high network portability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of a super-resolution processing method in an embodiment of the present invention;
FIG. 1a is a schematic view of a microarray light field camera configuration in an embodiment of the present invention;
FIG. 1b is a schematic diagram of an enlarged view of a four-dimensional light field in an embodiment of the present invention;
FIG. 1c is a diagram of a framework in an embodiment of the invention;
FIG. 1d is a diagram showing the result of image super-resolution processing in the embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image super-resolution processing apparatus in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be rearranged. A process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of an image super-resolution processing method provided by an embodiment of the present invention, where the present embodiment is applicable to the case of image super-resolution processing, and the method can be executed by an image super-resolution processing apparatus in an embodiment of the present invention, and the apparatus can be implemented by software and/or hardware, as shown in fig. 1, the method specifically includes the following steps:
and S110, acquiring an image to be processed.
The image to be processed may be a light field image or a two-dimensional image, which is not limited in this embodiment of the present invention.
And S120, performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample comprises a light-field image sample containing four-dimensional information.
Wherein the target image sample set comprises: a light-field image sample containing four-dimensional information. A light-field image is a special kind of multi-view picture with four-dimensional information combining angle and space. The light-field image sample is acquired by a light-field camera, which differs from a common monocular camera; its structure is shown in fig. 1a. The image on the sensor is converted into sub-views of different viewing angles: the information recorded by the light-field camera can be converted by an algorithm into a set of multiple sub-views with slightly different viewpoints. Each sub-view carries spatial information, while the subtle changes of viewpoint carry angular information. These images from different viewpoints are referred to as the sub-views of the light-field image, and can be regarded as views with unknown disparity relative to each other. Information that is sparsely sampled and missing in one sub-view may be captured by one or more other sub-views; this information is called supplementary information. In the embodiment of the invention, the supplementary information is fully utilized in the training process, so that a better super-resolution effect can be obtained.
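The decomposition described here, from the sensor (microlens-array) image to a set of sub-views, can be sketched as a simple reshape, assuming an idealised lenslet image in which each super-pixel is a U×V block of angular samples (the function name and layout are illustrative assumptions, not the patent's implementation):

```python
import numpy as np

def lenslet_to_subviews(lenslet, U, V):
    # Idealised extraction of sub-views from a microlens-array image.
    # Gathering the same (u, v) offset from every super-pixel yields one
    # sub-view; subtle changes in (u, v) correspond to viewpoint changes.
    H, W = lenslet.shape[0] // U, lenslet.shape[1] // V
    four_d = lenslet.reshape(H, U, W, V)     # indices (x, u, y, v)
    return four_d.transpose(1, 3, 0, 2)      # shape (U, V, H, W): one HxW image per angle

lenslet = np.arange(18 * 18, dtype=float).reshape(18, 18)
views = lenslet_to_subviews(lenslet, U=3, V=3)
print(views.shape)                           # (3, 3, 6, 6)
```

Each `views[u, v]` is then one sub-view image; real light-field decoding additionally handles lens distortion and hexagonal lenslet packing, which this sketch ignores.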
Wherein the target image sample comprises: a light-field image sample containing four-dimensional information, the target image sample further comprising: a two-dimensional image sample. The embodiments of the present invention are not limited in this regard.
Specifically, the target neural network model may be trained as follows: establishing a SISR network model; determining K sub-view samples from the light-field image samples, wherein K is a positive integer greater than or equal to 1; inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images; determining a predicted light-field image from the K first predicted images; training the parameters of the SISR network model according to an objective function formed by the predicted light-field image and the light-field image sample, wherein the objective function comprises the content loss function L_c; and returning to the operation of inputting the K sub-view samples into the SISR network model to obtain predicted images, until the target neural network model is obtained.

Alternatively, the objective function may comprise the content loss function L_c together with the variance map loss function L_vm, or together with the disparity map loss function L_dp, or with both. When all three loss functions are used, the final loss function takes the form

L = λ1 · L_c + λ2 · L_vm + λ3 · L_dp

wherein λ1, λ2 and λ3 are respectively the proportions of the three loss functions. In each case the training procedure is otherwise identical: the K sub-view samples are repeatedly input into the SISR network model until the target neural network model is obtained.
Optionally, the iteratively training the SISR network model through the target image sample set includes:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
Besides ESPCN, VDSR and RCAN, the SISR network model may also be any other learning-based SISR super-resolution network.
Wherein, the light field image sample is light field data containing four-dimensional information (spatial dimension and angular dimension).
Specifically, the light-field image sample is split into a plurality of sub-view samples at multiple viewing angles, and each sub-view sample is passed through the same SISR network to obtain a plurality of super-resolved first predicted images at those viewing angles. These first predicted images are then combined into a four-dimensional light-field image, i.e., the predicted light-field image.
Optionally, the content loss function is:

L_c = ‖ L^SR − L^HR ‖₂² + a · (1 − SSIM(L^SR, L^HR))

wherein a is the proportion (weight) of the structural similarity loss SSIM, L^HR is the light-field image sample, L^SR is the predicted light-field image assembled from the K super-resolved sub-views, l_k is the kth sub-view corresponding to the light-field image sample, and U is the upsampling matrix of the super-resolution model.
Optionally, the objective function further includes: at least one of a variance map loss function and a disparity map loss function.
Specifically, the objective function may include the content loss function L_c and the variance map loss function L_vm, and the final loss function may be of the form L = λ1 · L_c + λ2 · L_vm, wherein λ1 is the proportion of L_c and λ2 is the proportion of L_vm. The objective function may instead include the content loss function L_c and the disparity map loss function L_dp, and the final loss function may be of the form L = λ1 · L_c + λ3 · L_dp, wherein λ1 is the proportion of L_c and λ3 is the proportion of L_dp. The objective function may further include the content loss function L_c, the variance map loss function L_vm and the disparity map loss function L_dp, and the final loss function may be of the form L = λ1 · L_c + λ2 · L_vm + λ3 · L_dp, wherein λ1, λ2 and λ3 are respectively the proportions of the three loss functions.
Optionally, the training of parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image sample includes:
acquiring a first super pixel corresponding to the light field image sample and a sub-pixel in the first super pixel;
acquiring a second super pixel corresponding to the predicted light field image and a sub-pixel in the second super pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super pixel, the sub-pixels in the first super pixel, the second super pixel and the sub-pixels in the second super pixel;

wherein the variance map loss function is:

L_vm = ‖ VM^SR − VM^HR ‖₂²

wherein VM^SR is the variance map of the predicted light-field image and VM^HR is the variance map of the light-field image sample.
Optionally, the training of parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image sample includes:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
Wherein the disparity map loss function may be:

L_dp = ‖ D^SR − D^HR ‖₂²

wherein D^SR is the disparity map corresponding to the predicted light-field image and D^HR is the disparity map corresponding to the light-field image sample.
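As a sketch, the disparity-map comparison can be written as a plain L2 distance between the two maps; the disparity-estimation step itself, which the text says is performed by a deep neural network, is assumed to happen elsewhere (function name is illustrative):

```python
import numpy as np

def disparity_map_loss(disp_pred, disp_gt):
    # L2 distance between the disparity map of the predicted light field
    # and the disparity map of the light-field image sample.
    return float(np.mean((disp_pred - disp_gt) ** 2))

d1 = np.full((6, 6), 1.0)
d2 = np.full((6, 6), 1.5)
print(disparity_map_loss(d1, d2))   # 0.25
```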
Optionally, the target image sample further includes: a two-dimensional image sample.
In a specific example, for the conventional SISR method, the super-resolution model is:

I^SR = U · I^LR + n

wherein I^LR is the low-resolution image, I^SR is the super-resolution image, U is the upsampling matrix, and n is noise. In most SISR methods, in order to find the best fitting function F, the loss function is defined as

l(θ) = ‖ F_θ(I^LR) − I^HR ‖₂²

wherein I^HR is the corresponding high-resolution reference image.
In the embodiment of the invention, in order to find a better fitting function F, new constraint terms are designed on this basis. After each sub-view passes through the SISR network, the super-resolved sub-images are combined into a light-field image, and the embodiment of the invention provides three loss functions in order to achieve the desired effect.
The final loss function is of the form:

L = λ1 · L_c + λ2 · L_vm + λ3 · L_dp

wherein λ1, λ2 and λ3 are respectively the proportions of the three loss functions.
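This weighted combination can be sketched generically; the three callables stand in for the content, variance map and disparity map losses, and the numeric weights below are illustrative assumptions, since the text does not disclose the actual proportions:

```python
def total_loss(pred_lf, gt_lf, loss_fns, lambdas):
    # Final loss L = sum_i lambda_i * L_i over the loss terms.
    return sum(lam * fn(pred_lf, gt_lf) for fn, lam in zip(loss_fns, lambdas))

# Toy stand-in losses returning fixed values, just to show the weighting.
fns = [lambda p, g: 1.0, lambda p, g: 2.0, lambda p, g: 3.0]
print(total_loss(None, None, fns, [0.5, 0.25, 0.25]))   # 1.75
```

Setting a lambda to zero recovers the two-term variants of the objective function described earlier.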
The content loss function makes the distribution of the super-resolution result similar to that of the real result while maintaining the structure of the four-dimensional light field. The variance map loss function allows object edge positions to be well preserved. The disparity map loss function further improves the super-resolution effect in the disparity domain.
Content loss function: one of the most straightforward methods is to compute the l2 loss between the super-resolved and real reference light-field images. Unlike the l2 loss function in the general SISR method, the l2 loss function of the four-dimensional light field uses not only the spatial two-dimensional information but also the angular two-dimensional information; the loss function is sensitive to the whole four-dimensional information, so a better spatial super-resolution effect can be obtained. In addition, a structural similarity loss function SSIM is also used, which further improves the structure of the light field. Therefore, the content loss function provided by the embodiment of the invention is expressed as follows:

L_c = ‖ L^SR − L^HR ‖₂² + a · (1 − SSIM(L^SR, L^HR))

wherein a is the proportion (weight) of the structural similarity loss SSIM, L^HR is the light-field image sample, L^SR is the predicted light-field image assembled from the K super-resolved sub-views, l_k is the kth sub-view corresponding to the light-field image sample, and U is the upsampling matrix of the super-resolution model.
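A minimal sketch of this content loss, assuming a simplified global-statistics SSIM (standard SSIM averages a local-window computation, and the weight `a` below is an assumed value):

```python
import numpy as np

def ssim_global(x, y, c1=1e-4, c2=9e-4):
    # Simplified SSIM computed from global statistics (illustrative only;
    # real implementations slide a local window over the images).
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def content_loss(pred_lf, gt_lf, a=0.1):
    # L2 over all four light-field dimensions plus the weighted SSIM term.
    l2 = np.mean((pred_lf - gt_lf) ** 2)
    return float(l2 + a * (1.0 - ssim_global(pred_lf, gt_lf)))

lf = np.random.rand(3, 3, 8, 8)
print(content_loss(lf, lf))         # ~0.0 for identical light fields
```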
Variance map loss function: the microlens-array ("lenslet") image is a special image format designed specifically for light-field images, which can represent a four-dimensional light field on a two-dimensional image. One important characteristic is the super-pixel and its sub-pixels, as shown in fig. 1b. In the embodiment of the invention, a super-pixel comprises a plurality of sub-pixels; for each super-pixel, the variance of the sub-pixels it contains is computed to form a variance map, whose length and width are respectively equal to the length and width of a sub-view. Specifically, the variance of each super-pixel can be expressed as:

σ²(u, v) = (1/N) Σ_{(x, y)} ( L(x, y, u, v) − μ(u, v) )²,   VM(u, v) = σ²(u, v)

wherein VM^SR is the variance map of the predicted light-field image, VM^HR is the variance map of the light-field image sample, N is the number of sub-pixels in a super-pixel (N = 81 in the embodiment of the invention), VM is the variance map, σ² is the variance of the sub-pixels in a super-pixel, μ(u, v) is the mean of those sub-pixels, (x, y) are the coordinates of a sub-pixel, (u, v) are the coordinates of a super-pixel, and L is the light-field image.
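The variance map and its loss can be sketched directly from this definition; the (angular, angular, spatial, spatial) array layout and function names are illustrative assumptions:

```python
import numpy as np

def variance_map(light_field):
    # light_field: shape (A1, A2, H, W) with angular dimensions first.
    # Each spatial position corresponds to one super-pixel whose N = A1*A2
    # sub-pixels are its angular samples; their variance gives one value of
    # the map, so the map has the height and width of a single sub-view.
    A1, A2, H, W = light_field.shape
    return light_field.reshape(A1 * A2, H, W).var(axis=0)

def variance_map_loss(pred_lf, gt_lf):
    # L2 distance between predicted and sample variance maps.
    return float(np.mean((variance_map(pred_lf) - variance_map(gt_lf)) ** 2))

lf = np.random.rand(9, 9, 8, 8)     # N = 81 sub-pixels per super-pixel
print(variance_map(lf).shape)       # (8, 8)
print(variance_map_loss(lf, lf))    # 0.0
```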
Disparity map loss function:
an accurate disparity map can be obtained from a light field image with a good structure through an algorithm, and conversely, the disparity map obtained from the light field image is more accurate, and the structure of the light field image can be better recovered. Therefore, in the embodiment of the invention, the disparity map of the recovered super-resolution light field is predicted through the deep neural network, and the disparity map is compared with the disparity map generated by the real light field, so that the difference between the two disparity maps is continuously reduced.
Finally, in view of the limited light-field data and the large amount of two-dimensional picture data, mixed data is used for training. Specifically, a light-field image sample is input to the training network with a certain probability P, and a single two-dimensional image sample is input with probability (1 − P); this helps prevent overfitting. In the present embodiment, P = 0.2.
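The mixed-data sampling can be sketched as follows (function and variable names are illustrative):

```python
import random

def pick_training_sample(light_field_samples, two_d_samples, p=0.2, rng=None):
    # With probability P a light-field image sample is fed to the training
    # network; with probability (1 - P) a single two-dimensional image
    # sample is used instead (P = 0.2 in the described embodiment).
    rng = rng or random
    if rng.random() < p:
        return ('light_field', rng.choice(light_field_samples))
    return ('2d_image', rng.choice(two_d_samples))

rng = random.Random(0)
kinds = [pick_training_sample(['lf0'], ['im0'], p=0.2, rng=rng)[0]
         for _ in range(10000)]
print(abs(kinds.count('light_field') / 10000 - 0.2) < 0.05)   # True
```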
In the prior art, for single-picture spatial super-resolution algorithms: various learning-based super-resolution algorithms have been proposed in recent years. These algorithms can be divided into two main types. One type pursues computational efficiency, i.e., the network structure is simple, but the network cannot fit the super-resolution mapping function well; conversely, the other type pursues super-resolution effect, with a relatively complex network structure whose real-time performance is difficult to guarantee. In either case, these learning-based SISR methods use a single-picture training set (such as ImageNet, BSD500, etc.) to search for a mapping relation between two-dimensional pictures. In reality, however, objects are three-dimensional, and the plenoptic function describing natural light has up to seven dimensions, so the SISR methods described above have no way of using higher-dimensional information to obtain a fundamentally better mapping-fitting network.

For multi-picture spatial super-resolution algorithms: the multi-view picture spatial super-resolution field is vast; learning-based methods can be roughly divided into video super-resolution and multi-view super-resolution. Video super-resolution explores the relation between the time domain and the spatial domain by means of supplementary information between adjacent video frames, whereas the training data of multi-view super-resolution are objects shot from different angles at the same time, exploring the relation between the angular domain and the spatial domain. Within the subdivided field of multi-view super-resolution, light-field super-resolution is a hot research direction: thanks to the regular spatial and angular arrangement of the light field, super-resolution using the light field is easier to achieve.
Although a better super-resolution effect can be achieved for a scene by using multiple pictures, algorithms that perform spatial super-resolution from multiple pictures can only take multiple pictures as input in the test stage, and therefore do not have single-picture super-resolution capability. The embodiment of the invention aims to learn better super-resolution capability by making the network learn the mapping relation of four-dimensional information. Meanwhile, the network architecture is suitable for super-resolution of two-dimensional images, i.e., the super-resolution capability of the SISR network is enhanced by multi-view image training. The method is also applicable to all current learning-based SISR networks, i.e., it has high network portability.
In one specific example, as shown in fig. 1c, the training process is as follows: a light field image sample is split into a plurality of sub-view samples at multiple viewing angles, and each sub-view sample is passed through the same SISR network to obtain a plurality of super-resolved first predicted images at the multiple viewing angles. These first predicted images are then combined into a four-dimensional light field image, i.e., a predicted light field image. The parameters of the SISR network model are trained according to an objective function formed from the predicted light field image and the light field image sample, until a target neural network model is obtained. In the testing stage, super-resolution processing is performed on the image to be processed by the target neural network model. As shown in fig. 1d, the image following LF in fig. 1d is the super-resolution processing result.
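The training step just described can be sketched as follows. This is an illustrative sketch, not the patented implementation: `sisr_net`, the `(U, V, C, H, W)` tensor layout, and the bicubic downsampling used to simulate low-resolution sub-views are all assumptions, and any differentiable objective can stand in for the patent's content loss.

```python
import torch

def train_step(sisr_net, optimizer, lf_sample, loss_fn, scale=2):
    """One training iteration on a single light field sample.

    lf_sample: high-resolution light field, shape (U, V, C, H, W),
    where U x V angular positions give K = U * V sub-views.
    """
    U, V, C, H, W = lf_sample.shape
    # Split the light field into K sub-view samples.
    sub_views = lf_sample.reshape(U * V, C, H, W)
    # Simulate low-resolution inputs (assumed degradation model).
    low_res = torch.nn.functional.interpolate(
        sub_views, scale_factor=1.0 / scale, mode="bicubic", align_corners=False)
    # Pass every sub-view through the SAME SISR network -> K first predicted images.
    preds = sisr_net(low_res)
    # Recombine the predictions into a 4D predicted light field.
    pred_lf = preds.reshape(U, V, C, H, W)
    # Objective formed from the predicted light field and the light field sample.
    loss = loss_fn(pred_lf, lf_sample)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

Because the SISR network itself only ever sees single images, it can still be used on its own for single-picture super-resolution at test time.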
The embodiment of the invention provides a single-picture spatial super-resolution algorithm that uses multiple pictures as training data. Firstly, SISR and multi-picture spatial super-resolution (MISR) are combined, so that even a SISR network can be trained with the MISR method, improving its effect. Because the prior information of multiple pictures is used, the SISR network framework can be replaced flexibly, and the performance of all existing SISR networks can be improved. Secondly, light field images are used as the MISR data: a light field image provides abundant spatial and angular information, and this information is regularly arranged together without additional calibration or rectification.
According to the technical scheme, an image to be processed is obtained, and super-resolution processing is performed on it through a target neural network model, where the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample includes a light field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high-resolution to low-resolution mapping of one scene under a single view angle can be learned, which is a significant limitation. It also solves the problem that, when pictures of the same scene from different view angles are used for super-resolution, the input is limited to multiple pictures, making the method unsuitable for single-picture super-resolution. The network learns a better super-resolution capability by learning the mapping relation of four-dimensional information; meanwhile, the network architecture is also suitable for two-dimensional picture super-resolution, i.e., the super-resolution capability of the SISR network is enhanced by multi-view picture training. The method applies to all current learning-based SISR networks and has high network portability.
Fig. 2 is a schematic structural diagram of an image super-resolution processing apparatus according to an embodiment of the present invention. The present embodiment may be applied to the case of image super-resolution processing, and the apparatus may be implemented in software and/or hardware, and may be integrated in any device providing an image super-resolution processing function, as shown in fig. 2, where the apparatus specifically includes: an acquisition module 210 and a processing module 220.
The acquiring module 210 is configured to acquire an image to be processed;
the processing module 220 is configured to perform super-resolution processing on the image to be processed through a target neural network model, where the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample includes a light field image sample containing four-dimensional information.
Optionally, the processing module is specifically configured to:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content loss function;
and returning to the operation of inputting the K sub-view samples into the SISR network model to obtain predicted images, until the target neural network model is obtained.
Optionally, the content loss function is as follows (the formula itself is rendered as an image in the original publication):

where a is the proportion (weight) of the structural similarity (SSIM) loss term, L is the light field image sample, L̂ is the predicted light field image, l_k is the k-th sub-view corresponding to the light field image sample, and the remaining symbol denotes an upsampling matrix.
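One plausible form of such a combined loss (the exact formula appears only as an image in the original publication) weights an SSIM term by a and an L1 content term by 1 − a. The single-window SSIM below is a simplification of the usual windowed SSIM, and all function names and the particular combination are assumptions for illustration.

```python
import torch

def ssim_global(x, y, c1=0.01 ** 2, c2=0.03 ** 2):
    # Simplified single-window SSIM over the whole tensor (inputs assumed in [0, 1]).
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(unbiased=False), y.var(unbiased=False)
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def content_and_ssim_loss(pred_lf, gt_lf, a=0.2):
    # a: proportion of the SSIM loss; (1 - a): proportion of the L1 content loss.
    l1 = torch.mean(torch.abs(pred_lf - gt_lf))
    return a * (1.0 - ssim_global(pred_lf, gt_lf)) + (1.0 - a) * l1
```

For identical predicted and ground-truth light fields the SSIM term is 1 and the L1 term is 0, so the loss vanishes, as a content loss should.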
Optionally, the objective function further includes: at least one of a variogram loss function and a disparity map loss function.
Optionally, the processing module is specifically configured to:
acquiring a first super pixel corresponding to the light field image sample and a sub-pixel in the first super pixel;
acquiring a second super pixel corresponding to the predicted light field image and a sub-pixel in the second super pixel;
training parameters of the SISR network model according to a variogram loss function formed by the first super pixel, the sub-pixels in the first super pixel, the second super pixel and the sub-pixels in the second super pixel;
wherein the variogram loss function is as follows (the formula itself is rendered as an image in the original publication), where one term is the variogram of the predicted light field image and the other is the variogram of the light field image sample.
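A minimal sketch of one plausible reading of the variogram terms, assuming the variogram is the per-pixel variance taken across the U × V sub-views of the light field; this interpretation and all names are assumptions, since the formula appears only as an image in the original.

```python
import torch

def variance_map(lf):
    # lf: (U, V, C, H, W) light field; per-pixel variance across the U*V sub-views.
    U, V, C, H, W = lf.shape
    return lf.reshape(U * V, C, H, W).var(dim=0, unbiased=False)

def variance_map_loss(pred_lf, gt_lf):
    # L1 distance between the variance maps of predicted and sample light fields.
    return torch.mean(torch.abs(variance_map(pred_lf) - variance_map(gt_lf)))
```

Under this reading, matching the variance maps encourages the predicted light field to preserve the angular consistency of the sample: pixels that agree across sub-views in the sample should also agree in the prediction.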
Optionally, the processing module is specifically configured to:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
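The patent does not specify a disparity estimator, so the sketch below simply assumes some differentiable light-field-to-disparity function `disp_fn` is available and penalizes the L1 disagreement between the two disparity maps; everything here is illustrative.

```python
import torch

def disparity_loss(disp_fn, pred_lf, gt_lf):
    # disp_fn: any differentiable estimator mapping a (U, V, C, H, W) light field
    # to a disparity map (assumed given; e.g. an EPI-based or learned estimator).
    return torch.mean(torch.abs(disp_fn(pred_lf) - disp_fn(gt_lf)))
```

Because the loss only compares the two disparity maps, gradients flow back into the SISR network through `disp_fn`, steering the predicted light field toward the geometric structure of the sample.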
Optionally, the target image sample further includes: a two-dimensional image sample.
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
According to the technical scheme, an image to be processed is obtained, and super-resolution processing is performed on it through a target neural network model, where the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample includes a light field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high-resolution to low-resolution mapping of one scene under a single view angle can be learned, which is a significant limitation. It also solves the problem that, when pictures of the same scene from different view angles are used for super-resolution, the input is limited to multiple pictures, making the method unsuitable for single-picture super-resolution. The network learns a better super-resolution capability by learning the mapping relation of four-dimensional information; meanwhile, the network architecture is also suitable for two-dimensional picture super-resolution, i.e., the super-resolution capability of the SISR network is enhanced by multi-view picture training. The method applies to all current learning-based SISR networks and has high network portability.
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention. FIG. 3 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 3 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in FIG. 3, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The electronic device 12 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in FIG. 3 and commonly referred to as a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable, non-volatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, non-volatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing an image super-resolution processing method provided by an embodiment of the present invention:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
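A minimal sketch of the inference path described by these two steps, assuming a trained PyTorch model `target_net`; the names and the `(C, H, W)` image layout are illustrative.

```python
import torch

def super_resolve(target_net, image):
    # image: (C, H, W) low-resolution image to be processed.
    # target_net: the trained target neural network model (a SISR network).
    target_net.eval()
    with torch.no_grad():
        # Add and remove a batch dimension around the forward pass.
        return target_net(image.unsqueeze(0)).squeeze(0)
```

At test time only this single-image path is used: the multi-view light field data is needed during training alone.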
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. An image super-resolution processing method is characterized by comprising the following steps:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model with a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
2. The method of claim 1, wherein iteratively training a SISR network model through a target image sample set comprises:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content loss function;
and returning to the operation of inputting the K sub-view samples into the SISR network model to obtain predicted images, until the target neural network model is obtained.
3. The method of claim 2, wherein the content loss function is: (formula rendered as an image in the original publication)
4. The method of claim 2, wherein the objective function further comprises: at least one of a variogram loss function and a disparity map loss function.
5. The method of claim 4, wherein training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a first super pixel corresponding to the light field image sample and a sub-pixel in the first super pixel;
acquiring a second super pixel corresponding to the predicted light field image and a sub-pixel in the second super pixel;
training parameters of the SISR network model according to a variogram loss function formed by the first super pixel, the sub-pixels in the first super pixel, the second super pixel and the sub-pixels in the second super pixel;
wherein the variogram loss function is: (formula rendered as an image in the original publication)
6. The method of claim 4, wherein training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
7. The method of claim 1, wherein the target image sample further comprises: a two-dimensional image sample.
8. An image super-resolution processing apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
and the processing module is used for performing super-resolution processing on the image to be processed through a target neural network model, the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
9. The apparatus of claim 8, wherein the processing module is specifically configured to:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content loss function;
and returning to the operation of inputting the K sub-view samples into the SISR network model to obtain predicted images, until the target neural network model is obtained.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the processors to implement the method of any of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111485136.2A CN114170084A (en) | 2021-12-07 | 2021-12-07 | Image super-resolution processing method, device and equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114170084A true CN114170084A (en) | 2022-03-11 |
Family
ID=80483888
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111485136.2A Pending CN114170084A (en) | 2021-12-07 | 2021-12-07 | Image super-resolution processing method, device and equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114170084A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116782041A (en) * | 2023-05-29 | 2023-09-19 | 武汉工程大学 | Image quality improvement method and system based on liquid crystal microlens array |
CN116782041B (en) * | 2023-05-29 | 2024-01-30 | 武汉工程大学 | Image quality improvement method and system based on liquid crystal microlens array |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||