CN114170084A - Image super-resolution processing method, device and equipment - Google Patents

Image super-resolution processing method, device and equipment Download PDF

Info

Publication number
CN114170084A
CN114170084A (application number CN202111485136.2A)
Authority
CN
China
Prior art keywords
network model
light field
sisr
field image
super
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111485136.2A
Other languages
Chinese (zh)
Inventor
Lu Fang (方璐)
Mengqi Ji (季梦奇)
Dingjian Jin (金鼎健)
Qionghai Dai (戴琼海)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN202111485136.2A priority Critical patent/CN114170084A/en
Publication of CN114170084A publication Critical patent/CN114170084A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformation in the plane of the image
    • G06T3/40Scaling the whole image or part thereof
    • G06T3/4053Super resolution, i.e. output image resolution higher than sensor resolution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Abstract

The invention discloses an image super-resolution processing method, device and equipment. The method comprises the following steps: acquiring an image to be processed; and performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model on a target image sample set, and the target image samples include light field image samples containing four-dimensional information. The technical scheme of the invention solves the problem that, because the training data is a single-picture data set, only the high/low-resolution mapping of one view angle in one scene can be learned, which is a significant limitation. It also solves the problem that methods which super-resolve using pictures from different view angles of the same scene are limited to multi-picture input and are therefore unsuitable for single-picture super-resolution, and it enhances the super-resolution capability of the network.

Description

Image super-resolution processing method, device and equipment
Technical Field
The embodiment of the invention relates to the technical field of computers, in particular to a method, a device and equipment for processing image super-resolution.
Background
With the development of image-based computer vision, improving the spatial resolution of images has become a fundamental and important research topic. Mainstream image super-resolution methods currently divide into traditional non-learning methods and learning methods. Generally speaking, a learning method learns a complex mapping function from a low dimension to a high dimension, and its super-resolution results are superior to those of traditional methods both in visual quality and in quantitative metrics. Among currently mainstream learning methods for a single picture, single image super-resolution (SISR) aims to predict and recover high-resolution information from a low-resolution picture. SISR has been studied for decades as a fundamental field of computer vision research, and its results have been widely applied in many computer vision applications, such as medical imaging, satellite imaging, and security.
Recently, research on deep learning has brought new ideas to SISR, and various deep-learning-based SISR methods have been proposed to improve the super-resolution effect. However, because of the limitation of the training data, such a learning method learns the high/low-resolution mapping of only one view angle in one scene, which is a significant limitation and caps further performance improvement. In order to make full use of the training data, some work fuses the angle information of the input pictures (angle information generally arises only when a plurality of pictures are input), thereby obtaining a better spatial super-resolution effect. However, these efforts rely on multi-picture input and do not address single-picture input, i.e., SISR.
Disclosure of Invention
The embodiment of the invention provides an image super-resolution processing method, device and equipment, which enable a network to learn better super-resolution capability through the mapping relation of four-dimensional information; meanwhile, the network architecture is also suitable for super-resolution of two-dimensional pictures, i.e., the super-resolution capability of a SISR network is enhanced by training on multi-view pictures. The method is applicable to all current learning-based SISR networks, and therefore has high portability.
In a first aspect, an embodiment of the present invention provides an image super-resolution processing method, including:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model on a target image sample set, and the target image samples include a light field image sample containing four-dimensional information.
Further, iteratively training the SISR network model through the target image sample set includes:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content and structure loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
Further, the content and structure loss function is:

$$\mathcal{L}_{cs} = \left\| \hat{L} - L \right\|_2^2 + a\left(1 - \mathrm{SSIM}(\hat{L}, L)\right)$$

wherein $\mathcal{L}_{cs}$ is the content and structure loss, $a$ is the weight of the structural similarity loss function SSIM, $L$ is the light field image sample, $\hat{L}$ is the predicted light field image assembled from the super-resolved sub-views $S\,l_k$, $l_k$ is the kth sub-view corresponding to the light field image sample, and $S$ is the upsampling matrix.
Further, the objective function further includes: at least one of a variance map loss function and a disparity map loss function.
Further, training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a first super-pixel corresponding to the light field image sample and the sub-pixels in the first super-pixel;
acquiring a second super-pixel corresponding to the predicted light field image and the sub-pixels in the second super-pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super-pixel, the sub-pixels in the first super-pixel, the second super-pixel and the sub-pixels in the second super-pixel;
wherein the variance map loss function is:

$$\mathcal{L}_{vm} = \left\| \widehat{VM} - VM \right\|_2^2$$

wherein $\widehat{VM}$ is the variance map of the predicted light field image and $VM$ is the variance map of the light field image sample.
Further, training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
Further, the target image sample further includes: a two-dimensional image sample.
In a second aspect, an embodiment of the present invention further provides an image super-resolution processing apparatus, including:
the acquisition module is used for acquiring an image to be processed;
and the processing module is used for performing super-resolution processing on the image to be processed through a target neural network model, the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
In a third aspect, an embodiment of the present invention further provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor implements the image super-resolution processing method according to any one of the embodiments of the present invention when executing the program.
According to the method and the device, an image to be processed is acquired and super-resolution processing is performed on it through a target neural network model, the target neural network model being obtained by iteratively training a SISR network model on a target image sample set, wherein the target image samples include a light field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high/low-resolution mapping of one view angle in one scene can be learned, which is a significant limitation. It also solves the problem that super-resolution using pictures from different view angles of the same scene is limited to multi-picture input and is therefore unsuitable for single-picture super-resolution. By learning the mapping relation of the four-dimensional information, the super-resolution capability of the network is enhanced; meanwhile, the network architecture is also suitable for super-resolution of two-dimensional pictures, i.e., the super-resolution capability of the SISR network is enhanced by training on multi-view pictures. The method is applicable to all current learning-based SISR networks and has high portability.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present invention and therefore should not be considered as limiting the scope, and for those skilled in the art, other related drawings can be obtained according to the drawings without inventive efforts.
FIG. 1 is a flowchart of a super-resolution processing method in an embodiment of the present invention;
FIG. 1a is a schematic view of a microarray light field camera configuration in an embodiment of the present invention;
FIG. 1b is a schematic diagram of an enlarged view of a four-dimensional light field in an embodiment of the present invention;
FIG. 1c is a diagram of a framework in an embodiment of the invention;
FIG. 1d is a diagram showing the result of image super-resolution processing in the embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an image super-resolution processing apparatus in an embodiment of the present invention;
fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures. In addition, the embodiments and features of the embodiments in the present invention may be combined with each other without conflict.
Before discussing exemplary embodiments in more detail, it should be noted that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart may describe the operations (or steps) as a sequential process, many of the operations can be performed in parallel, concurrently or simultaneously. In addition, the order of the operations may be re-arranged. The process may be terminated when its operations are completed, but may have additional steps not included in the figure. The processes may correspond to methods, functions, procedures, subroutines, and the like.
The term "include" and variations thereof as used herein are intended to be open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment".
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.
Fig. 1 is a flowchart of an image super-resolution processing method provided by an embodiment of the present invention, where the present embodiment is applicable to the case of image super-resolution processing, and the method can be executed by an image super-resolution processing apparatus in an embodiment of the present invention, and the apparatus can be implemented by software and/or hardware, as shown in fig. 1, the method specifically includes the following steps:
and S110, acquiring an image to be processed.
The image to be processed may be a light field image or a two-dimensional image, which is not limited in this embodiment of the present invention.
And S120, performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model on a target image sample set, and the target image samples include a light field image sample containing four-dimensional information.
Wherein the target image sample set comprises a light field image sample containing four-dimensional information. A light field image is a special kind of multi-view picture that carries combined four-dimensional angular and spatial information. The light field image sample is acquired by a light field camera, which differs from a common monocular camera; its structure is shown in fig. 1a. The image recorded on the sensor can be converted by an algorithm into a set of sub-views with slightly different viewpoints. Each sub-view carries spatial information, while the subtle changes between viewpoints carry angular information. These images from different viewpoints are referred to as the sub-views of the light field image, and they may be viewed as views with unknown disparity relative to each other. Information that is sparsely sampled and missing in one sub-view may be captured by one or more other sub-views; this information is called supplementary information. In the embodiment of the invention, the supplementary information is fully utilized in the training process, so that a better super-resolution effect can be obtained. A sketch of how sub-views can be extracted follows.
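For illustration only (not part of the patent text), the following minimal sketch shows how the sub-views of a 4D light field can be extracted from a microlens-array (lenslet) image; the single-channel (H·U, W·V) layout and the 9×9 angular grid are assumptions chosen to match the N = 81 sub-pixels per super-pixel mentioned later in this description.

```python
import torch

def extract_sub_views(lenslet: torch.Tensor, U: int = 9, V: int = 9) -> torch.Tensor:
    """Split a (H*U, W*V) lenslet image into a (U, V, H, W) light field.

    Assumes each U x V block (super-pixel) holds one sample per viewpoint,
    so sub-view (u, v) is obtained by strided slicing. Real light field
    files additionally need demosaicing and calibration, omitted here.
    """
    HU, WV = lenslet.shape
    H, W = HU // U, WV // V
    views = torch.stack([lenslet[u::U, v::V] for u in range(U) for v in range(V)])
    return views.reshape(U, V, H, W)  # 4D light field L(u, v, x, y)

# Example: a 9x9 angular grid over a 64x64 spatial grid -> 81 sub-views.
lf = extract_sub_views(torch.rand(64 * 9, 64 * 9))
print(lf.shape)  # torch.Size([9, 9, 64, 64])
```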
Wherein the target image sample comprises: a light-field image sample containing four-dimensional information, the target image sample further comprising: a two-dimensional image sample. The embodiments of the present invention are not limited in this regard.
Specifically, the target neural network model may be trained as follows: establish a SISR network model; determine K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1; input the K sub-view samples into the SISR network model respectively to obtain K first predicted images; determine a predicted light field image according to the K first predicted images; train the parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample; and return to the operation of inputting the K sub-view samples into the SISR network model to obtain predicted images, until the target neural network model is obtained. The objective function may comprise only the content and structure loss function $\mathcal{L}_{cs}$; or the content and structure loss function $\mathcal{L}_{cs}$ and the variance map loss function $\mathcal{L}_{vm}$; or the content and structure loss function $\mathcal{L}_{cs}$ and the disparity map loss function $\mathcal{L}_{d}$; or all three, in which case the final loss function has the form

$$\mathcal{L} = \lambda_1 \mathcal{L}_{cs} + \lambda_2 \mathcal{L}_{vm} + \lambda_3 \mathcal{L}_{d}$$

wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are respectively the weights of the three loss functions.
Optionally, the iteratively training the SISR network model through the target image sample set includes:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content and structure loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
The SISR network model may be ESPCN, VDSR, RCAN, or a SISR super-resolution network based on any other learning method.
Wherein, the light field image sample is light field data containing four-dimensional information (spatial dimension and angular dimension).
Specifically, the light field image sample is split into a plurality of sub-view samples under multiple viewing angles, and each sub-view sample is passed through the same SISR network to obtain a plurality of super-resolved first predicted images under the multiple viewing angles. These first predicted images are then combined into a four-dimensional light field image, i.e. the predicted light field image. A sketch of one such training iteration follows.
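The following is a minimal, illustrative sketch of one training iteration under this scheme (not the patent's implementation): `sisr_net` stands for any learning-based SISR backbone, bicubic downsampling is an assumed way to obtain the low-resolution sub-views, and `content_structure_loss` is the loss sketched after the next formula.

```python
import torch
import torch.nn.functional as F

def train_step(sisr_net, optimizer, lf_sample: torch.Tensor, scale: int = 2):
    """One iteration: split -> per-view SISR -> reassemble -> loss.

    lf_sample: ground-truth light field of shape (K, C, H, W), the K
    sub-views being the flattened U x V angular grid.
    """
    # Assumed degradation: bicubic downsampling of every sub-view.
    low = F.interpolate(lf_sample, scale_factor=1 / scale,
                        mode="bicubic", align_corners=False)

    # The same SISR network super-resolves each sub-view independently.
    preds = torch.stack([sisr_net(v.unsqueeze(0)).squeeze(0) for v in low])

    # The stacked predictions form the predicted 4D light field.
    loss = content_structure_loss(preds, lf_sample)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```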
Optionally, the content and structure loss function is:

$$\mathcal{L}_{cs} = \left\| \hat{L} - L \right\|_2^2 + a\left(1 - \mathrm{SSIM}(\hat{L}, L)\right)$$

wherein $\mathcal{L}_{cs}$ is the content and structure loss, $a$ is the weight of the structural similarity loss function SSIM, $L$ is the light field image sample, $\hat{L}$ is the predicted light field image assembled from the super-resolved sub-views $S\,l_k$, $l_k$ is the kth sub-view corresponding to the light field image sample, and $S$ is the upsampling matrix.
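A minimal sketch of this loss (illustrative only): the global, single-window SSIM below is a simplification — a production implementation would use a windowed SSIM such as the pytorch-msssim package — and the default weight a = 0.1 is an assumption, as the patent leaves the ratio open.

```python
import torch

def global_ssim(x: torch.Tensor, y: torch.Tensor,
                c1: float = 0.01 ** 2, c2: float = 0.03 ** 2) -> torch.Tensor:
    """Single-window SSIM over the whole tensor (simplified)."""
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def content_structure_loss(pred_lf: torch.Tensor, gt_lf: torch.Tensor,
                           a: float = 0.1) -> torch.Tensor:
    """L_cs = ||L_hat - L||_2^2 + a * (1 - SSIM(L_hat, L))."""
    l2 = torch.mean((pred_lf - gt_lf) ** 2)
    return l2 + a * (1.0 - global_ssim(pred_lf, gt_lf))
```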
Optionally, the objective function further includes: at least one of a variance map loss function and a disparity map loss function.
Specifically, the objective function may include the content and structure loss function $\mathcal{L}_{cs}$ and the variance map loss function $\mathcal{L}_{vm}$, in which case the final loss function may have the form

$$\mathcal{L} = \lambda_1 \mathcal{L}_{cs} + \lambda_2 \mathcal{L}_{vm}$$

wherein $\lambda_1$ is the weight of $\mathcal{L}_{cs}$ and $\lambda_2$ is the weight of $\mathcal{L}_{vm}$. The objective function may instead include the content and structure loss function $\mathcal{L}_{cs}$ and the disparity map loss function $\mathcal{L}_{d}$, in which case the final loss function may have the form

$$\mathcal{L} = \lambda_1 \mathcal{L}_{cs} + \lambda_2 \mathcal{L}_{d}$$

wherein $\lambda_1$ is the weight of $\mathcal{L}_{cs}$ and $\lambda_2$ is the weight of $\mathcal{L}_{d}$. The objective function may further include the content and structure loss function $\mathcal{L}_{cs}$, the variance map loss function $\mathcal{L}_{vm}$ and the disparity map loss function $\mathcal{L}_{d}$, in which case the final loss function may have the form

$$\mathcal{L} = \lambda_1 \mathcal{L}_{cs} + \lambda_2 \mathcal{L}_{vm} + \lambda_3 \mathcal{L}_{d}$$

wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are respectively the weights of the three loss functions. A sketch of the three-term objective follows.
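Illustrative only — a direct transcription of the three-term objective using the helper losses sketched in this description; the weights are placeholders, since the patent does not fix their values.

```python
import torch

def total_loss(pred_lf: torch.Tensor, gt_lf: torch.Tensor,
               lams=(1.0, 0.5, 0.5)) -> torch.Tensor:
    """lam1 * L_cs + lam2 * L_vm + lam3 * L_d (weights are assumptions)."""
    return (lams[0] * content_structure_loss(pred_lf, gt_lf)
            + lams[1] * variance_map_loss(pred_lf, gt_lf)    # sketched below
            + lams[2] * disparity_map_loss(pred_lf, gt_lf))  # sketched below
```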
Optionally, the training of parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image sample includes:
acquiring a first super-pixel corresponding to the light field image sample and the sub-pixels in the first super-pixel;
acquiring a second super-pixel corresponding to the predicted light field image and the sub-pixels in the second super-pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super-pixel, the sub-pixels in the first super-pixel, the second super-pixel and the sub-pixels in the second super-pixel;
wherein the variance map loss function is:

$$\mathcal{L}_{vm} = \left\| \widehat{VM} - VM \right\|_2^2$$

wherein $\widehat{VM}$ is the variance map of the predicted light field image and $VM$ is the variance map of the light field image sample.
Optionally, the training of parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image sample includes:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
Wherein the disparity map loss function may be:

$$\mathcal{L}_{d} = \left\| \hat{D} - D \right\|_2^2$$

wherein $\hat{D}$ is the disparity map corresponding to the predicted light field image and $D$ is the disparity map corresponding to the light field image sample.
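Illustrative sketch of this loss; `estimate_disparity` is a placeholder for whatever differentiable light-field disparity estimator is used (the description only says a deep neural network predicts the disparity map, without naming one).

```python
import torch

def disparity_map_loss(pred_lf: torch.Tensor, gt_lf: torch.Tensor) -> torch.Tensor:
    """L_d = ||D_hat - D||_2^2 between estimated disparity maps.

    estimate_disparity is a hypothetical pretrained network; it is kept
    frozen so that gradients only shape the SISR network's output.
    """
    d_pred = estimate_disparity(pred_lf)
    with torch.no_grad():          # reference disparity needs no gradient
        d_gt = estimate_disparity(gt_lf)
    return torch.mean((d_pred - d_gt) ** 2)
```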
Optionally, the target image sample further includes: a two-dimensional image sample.
In a specific example, for the conventional SISR method, the super-resolution model is:

$$h = S\,l + n$$

wherein $l$ is a low-resolution image, $h$ is a super-resolution image, $S$ is an upsampling matrix, and $n$ is noise. In most SISR methods, in order to find the best fitting function, the loss function is defined as

$$\mathcal{L} = \left\| S\,l - h \right\|_2^2$$

In the embodiment of the invention, in order to find a better $S$, new restriction terms are designed on this basis. After each sub-image passes through the SISR network, the super-resolved sub-images are combined into a light field image, and the embodiment of the invention provides three loss functions to achieve the expected effect.
1. Content and structure loss function $\mathcal{L}_{cs}$
2. Variance map loss function $\mathcal{L}_{vm}$
3. Disparity map loss function $\mathcal{L}_{d}$
The final loss function is of the form:

$$\mathcal{L} = \lambda_1 \mathcal{L}_{cs} + \lambda_2 \mathcal{L}_{vm} + \lambda_3 \mathcal{L}_{d}$$

wherein $\lambda_1$, $\lambda_2$ and $\lambda_3$ are respectively the weights of the three loss functions.
The content and structure loss function makes the distribution of the super-resolution result similar to that of the real result while maintaining the structure of the four-dimensional light field. The variance map loss function allows object edge positions to be well preserved. The disparity map loss function further improves the super-resolution effect in the disparity domain.
Content and structure loss function: one of the most straightforward methods is to compute the l2 loss between the super-resolved and the real reference light field images. Unlike the l2 loss function in the general SISR method, the l2 loss function of the four-dimensional light field uses not only the spatial two-dimensional information but also the angular two-dimensional information; because the loss function is sensitive to the whole of the four-dimensional information, a better spatial super-resolution effect can be obtained. In addition, a structural similarity loss function SSIM is also used, which further improves the structure of the light field. Therefore, the content and structure loss function provided by the embodiment of the present invention is expressed as follows:

$$\mathcal{L}_{cs} = \left\| \hat{L} - L \right\|_2^2 + a\left(1 - \mathrm{SSIM}(\hat{L}, L)\right)$$

wherein $\mathcal{L}_{cs}$ is the content and structure loss, $a$ is the weight of the structural similarity loss function SSIM, $L$ is the light field image sample, $\hat{L}$ is the predicted light field image assembled from the super-resolved sub-views $S\,l_k$, $l_k$ is the kth sub-view corresponding to the light field image sample, and $S$ is the upsampling matrix.
Variogram loss function: microlens whole-column images are a special image format designed specifically for light-field images, which can represent a four-dimensional light field on a two-dimensional image. One important characteristic is the super-pixel and sub-pixels, as shown in fig. 1 b. In the embodiment of the invention, a super pixel comprises a plurality of sub-pixels, and the variance of the sub-pixels contained in the super pixel is obtained for each super pixel to form a variance map, wherein the length and the width of the variance map are respectively equal to the length and the width of a sub-view. Specifically, the variance of each superpixel can be expressed as:
Figure 357300DEST_PATH_IMAGE037
Figure 743282DEST_PATH_IMAGE038
wherein, among others,
Figure 523019DEST_PATH_IMAGE007
in order to predict the light-field image variogram,
Figure 511704DEST_PATH_IMAGE008
for the light-field image sample variance map, N is the number of sub-pixels in a super-pixel, and in the embodiment of the present invention, N = 81. The VM is a variance map, and the VM is a variance map,
Figure 794918DEST_PATH_IMAGE039
is the variance of the sub-pixels in a super-pixel,
Figure 617380DEST_PATH_IMAGE040
is the coordinates of the sub-pixel(s),
Figure 494200DEST_PATH_IMAGE041
is the coordinates of the super-pixel,
Figure 161942DEST_PATH_IMAGE042
is a light field image.
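For illustration (assuming the (U, V, H, W) light field layout from the earlier sketch, so that each spatial location (x, y) owns a U x V super-pixel of sub-pixels):

```python
import torch

def variance_map(lf: torch.Tensor) -> torch.Tensor:
    """lf: light field of shape (U, V, H, W); returns VM of shape (H, W).

    The variance at each spatial position is taken over the N = U*V
    sub-pixels (N = 81 for the 9x9 angular grid of the embodiment).
    """
    flat = lf.reshape(-1, lf.shape[-2], lf.shape[-1])  # (U*V, H, W)
    return flat.var(dim=0, unbiased=False)

def variance_map_loss(pred_lf: torch.Tensor, gt_lf: torch.Tensor) -> torch.Tensor:
    """L_vm = ||VM_hat - VM||_2^2."""
    return torch.mean((variance_map(pred_lf) - variance_map(gt_lf)) ** 2)
```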
Disparity map loss function:
An accurate disparity map can be obtained by an algorithm from a light field image with good structure; conversely, the more accurate the disparity map obtained from a light field image, the better the structure of the light field image can be recovered. Therefore, in the embodiment of the invention, the disparity map of the recovered super-resolution light field is predicted by a deep neural network and compared with the disparity map generated from the real light field, so that the difference between the two disparity maps is continuously reduced:

$$\mathcal{L}_{d} = \left\| \hat{D} - D \right\|_2^2$$
Finally, in view of the limited amount of light field data and the large amount of two-dimensional picture data, mixed data is used for training, as sketched below. Specifically, a light field image sample is input to the training network with a certain probability P, and a single two-dimensional image sample is input with probability (1 - P); this also prevents overfitting. In the present embodiment, P = 0.2.
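A minimal sketch of the mixed-data sampling (illustrative; `lf_dataset` and `img_dataset` are hypothetical dataset handles with an assumed `draw()` method):

```python
import random

def sample_training_input(lf_dataset, img_dataset, p: float = 0.2):
    """With probability P draw a 4D light field sample, otherwise draw a
    single two-dimensional image; the embodiment uses P = 0.2."""
    if random.random() < p:
        return lf_dataset.draw(), "light_field"
    return img_dataset.draw(), "2d_image"
```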
In the prior art, for single-picture spatial super-resolution algorithms: various learning-based super-resolution algorithms have been proposed in recent years. These algorithms can mainly be divided into two types. One type pursues computational efficiency: the network structure is simple, but the network cannot fit the super-resolution mapping function well. The other type, on the contrary, pursues super-resolution effect: the network structure is relatively complex, and real-time super-resolution is hard to guarantee. Whatever the network, these learning-based SISR methods use a single-picture training set (such as ImageNet, BSD500, etc.) to search for a mapping relation between two-dimensional pictures; but in reality objects are three-dimensional, and the plenoptic function describing natural light has up to seven dimensions, so the SISR methods described above fundamentally cannot obtain a better mapping-fitting network from higher-dimensional information. For multi-picture spatial super-resolution algorithms: multi-view picture spatial super-resolution is a huge field, and the learning-based methods can roughly be divided into video super-resolution and multi-view super-resolution. Video super-resolution explores the relation between the temporal and spatial domains by means of the supplementary information between adjacent video frames, while the training data of multi-view super-resolution consists of objects shot simultaneously from different angles, exploring the relation between the angular and spatial domains. Within multi-view super-resolution, light field super-resolution is a research hotspot: because of the regular spatial and angular arrangement of the light field, super-resolution using the light field is easier to achieve. Although a better super-resolution effect for a scene can be achieved using a plurality of pictures, algorithms that perform spatial super-resolution from multiple pictures also require multiple pictures as input at test time, and therefore lack single-picture super-resolution capability. The embodiment of the invention aims to learn better super-resolution capability by making the network learn the mapping relation of four-dimensional information. Meanwhile, the network architecture is suitable for two-dimensional picture super-resolution, i.e., the super-resolution capability of the SISR network is enhanced by training on multi-view pictures. The method is also applicable to all current learning-based SISR networks, i.e., it has high portability.
In one specific example, as shown in fig. 1c, the training process is as follows: the light field image sample is split into a plurality of sub-view samples under multiple viewing angles, and each sub-view sample is passed through the same SISR network to obtain a plurality of super-resolved first predicted images under the multiple viewing angles. These first predicted images are then combined into a four-dimensional light field image, i.e. the predicted light field image. The parameters of the SISR network model are trained according to an objective function formed by the predicted light field image and the light field image sample until the target neural network model is obtained. In the testing stage, super-resolution processing is performed on the image to be processed through the target neural network model. In fig. 1d, the images following "LF" are the super-resolution processing results.
The embodiment of the invention provides a single-picture spatial super-resolution algorithm that uses multiple pictures as training data. Firstly, it combines SISR with multi-picture spatial super-resolution (MISR), so that even a SISR network can be trained with the MISR method to improve its effect. Because the multi-picture information acts as a prior, the SISR network framework can be flexibly replaced, and the performance of all existing SISR networks can be improved. Secondly, light field images are used as the MISR data: a light field image provides abundant spatial and angular information, and this information is arranged together regularly, without additional calibration or correction.
According to the technical scheme of this embodiment, an image to be processed is acquired and super-resolution processing is performed on it through a target neural network model, the target neural network model being obtained by iteratively training a SISR network model on a target image sample set, wherein the target image samples include a light field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high/low-resolution mapping of one view angle in one scene can be learned, which is a significant limitation. It also solves the problem that super-resolution using pictures from different view angles of the same scene is limited to multi-picture input and is therefore unsuitable for single-picture super-resolution. By learning the mapping relation of the four-dimensional information, the network learns better super-resolution capability; meanwhile, the network architecture is also suitable for super-resolution of two-dimensional pictures, i.e., the super-resolution capability of the SISR network is enhanced by training on multi-view pictures. The method is applicable to all current learning-based SISR networks and has high portability.
Fig. 2 is a schematic structural diagram of an image super-resolution processing apparatus according to an embodiment of the present invention. The present embodiment may be applied to the case of image super-resolution processing, and the apparatus may be implemented in software and/or hardware, and may be integrated in any device providing an image super-resolution processing function, as shown in fig. 2, where the apparatus specifically includes: an acquisition module 210 and a processing module 220.
The acquiring module 210 is configured to acquire an image to be processed;
the processing module 220 is configured to perform super-resolution processing on the image to be processed through a target neural network model, where the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample includes a light field image sample containing four-dimensional information.
Optionally, the processing module is specifically configured to:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content and structure loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
Optionally, the content and structure loss function is:

$$\mathcal{L}_{cs} = \left\| \hat{L} - L \right\|_2^2 + a\left(1 - \mathrm{SSIM}(\hat{L}, L)\right)$$

wherein $\mathcal{L}_{cs}$ is the content and structure loss, $a$ is the weight of the structural similarity loss function SSIM, $L$ is the light field image sample, $\hat{L}$ is the predicted light field image assembled from the super-resolved sub-views $S\,l_k$, $l_k$ is the kth sub-view corresponding to the light field image sample, and $S$ is the upsampling matrix.
Optionally, the objective function further includes: at least one of a variance map loss function and a disparity map loss function.
Optionally, the processing module is specifically configured to:
acquiring a first super-pixel corresponding to the light field image sample and the sub-pixels in the first super-pixel;
acquiring a second super-pixel corresponding to the predicted light field image and the sub-pixels in the second super-pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super-pixel, the sub-pixels in the first super-pixel, the second super-pixel and the sub-pixels in the second super-pixel;
wherein the variance map loss function is:

$$\mathcal{L}_{vm} = \left\| \widehat{VM} - VM \right\|_2^2$$

wherein $\widehat{VM}$ is the variance map of the predicted light field image and $VM$ is the variance map of the light field image sample.
Optionally, the processing module is specifically configured to:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
Optionally, the target image sample further includes: a two-dimensional image sample.
The product can execute the method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the execution method.
According to the technical scheme of this embodiment, an image to be processed is acquired and super-resolution processing is performed on it through a target neural network model, the target neural network model being obtained by iteratively training a SISR network model on a target image sample set, wherein the target image samples include a light field image sample containing four-dimensional information. This solves the problem that, because the training data is a single-picture data set, only the high/low-resolution mapping of one view angle in one scene can be learned, which is a significant limitation. It also solves the problem that super-resolution using pictures from different view angles of the same scene is limited to multi-picture input and is therefore unsuitable for single-picture super-resolution. By learning the mapping relation of the four-dimensional information, the network learns better super-resolution capability; meanwhile, the network architecture is also suitable for super-resolution of two-dimensional pictures, i.e., the super-resolution capability of the SISR network is enhanced by training on multi-view pictures. The method is applicable to all current learning-based SISR networks and has high portability.
Fig. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention. FIG. 3 illustrates a block diagram of an exemplary electronic device 12 suitable for use in implementing embodiments of the present invention. The electronic device 12 shown in fig. 3 is only an example and should not bring any limitations to the function and scope of use of the embodiments of the present invention.
As shown in FIG. 3, electronic device 12 is embodied in the form of a general purpose computing device. The components of electronic device 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, and a bus 18 that couples various system components including the system memory 28 and the processing unit 16.
Bus 18 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, an enhanced ISA bus, a Video Electronics Standards Association (VESA) local bus, and a Peripheral Component Interconnect (PCI) bus.
Electronic device 12 typically includes a variety of computer system readable media. Such media may be any available media that is accessible by electronic device 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system Memory 28 may include computer system readable media in the form of volatile Memory, such as Random Access Memory (RAM) 30 and/or cache Memory 32. The electronic device 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 3, and commonly referred to as a "hard drive"). Although not shown in FIG. 3, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk (e.g., a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc (DVD-ROM), or other optical media) may be provided. In these cases, each drive may be connected to bus 18 by one or more data media interfaces. System memory 28 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored, for example, in system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each of which examples or some combination thereof may comprise an implementation of a network environment. Program modules 42 generally carry out the functions and/or methodologies of the described embodiments of the invention.
Electronic device 12 may also communicate with one or more external devices 14 (e.g., keyboard, pointing device, display 24, etc.), with one or more devices that enable a user to interact with electronic device 12, and/or with any devices (e.g., network card, modem, etc.) that enable electronic device 12 to communicate with one or more other computing devices. Such communication may be through an input/output (I/O) interface 22. In the electronic device 12 of the present embodiment, the display 24 is not provided as a separate body but is embedded in the mirror surface, and when the display surface of the display 24 is not displayed, the display surface of the display 24 and the mirror surface are visually integrated. Also, the electronic device 12 may communicate with one or more networks (e.g., a Local Area Network (LAN), Wide Area Network (WAN), and/or a public Network such as the internet) via the Network adapter 20. As shown, the network adapter 20 communicates with other modules of the electronic device 12 via the bus 18. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 12, including but not limited to: microcode, device drivers, Redundant processing units, external disk drive Arrays, disk array (RAID) systems, tape drives, and data backup storage systems, to name a few.
The processing unit 16 executes various functional applications and data processing by executing programs stored in the system memory 28, for example, implementing an image super-resolution processing method provided by an embodiment of the present invention:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model on a target image sample set, and the target image samples include a light field image sample containing four-dimensional information.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (10)

1. An image super-resolution processing method is characterized by comprising the following steps:
acquiring an image to be processed;
performing super-resolution processing on the image to be processed through a target neural network model, wherein the target neural network model is obtained by iteratively training a SISR network model on a target image sample set, and the target image samples include a light field image sample containing four-dimensional information.
2. The method of claim 1, wherein iteratively training a SISR network model through a target image sample set comprises:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content and structure loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
3. The method of claim 2, wherein the content and structure loss function is:

$$\mathcal{L}_{cs} = \left\| \hat{L} - L \right\|_2^2 + a\left(1 - \mathrm{SSIM}(\hat{L}, L)\right)$$

wherein $\mathcal{L}_{cs}$ is the content and structure loss, $a$ is the weight of the structural similarity loss function SSIM, $L$ is the light field image sample, $\hat{L}$ is the predicted light field image assembled from the super-resolved sub-views $S\,l_k$, $l_k$ is the kth sub-view corresponding to the light field image sample, and $S$ is the upsampling matrix.
4. The method of claim 2, wherein the objective function further comprises: at least one of a variance map loss function and a disparity map loss function.
5. The method of claim 4, wherein training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a first super-pixel corresponding to the light field image sample and the sub-pixels in the first super-pixel;
acquiring a second super-pixel corresponding to the predicted light field image and the sub-pixels in the second super-pixel;
training parameters of the SISR network model according to a variance map loss function formed by the first super-pixel, the sub-pixels in the first super-pixel, the second super-pixel and the sub-pixels in the second super-pixel;
wherein the variance map loss function is:

$$\mathcal{L}_{vm} = \left\| \widehat{VM} - VM \right\|_2^2$$

wherein $\widehat{VM}$ is the variance map of the predicted light field image and $VM$ is the variance map of the light field image sample.
6. The method of claim 4, wherein training the parameters of the SISR network model according to the objective function formed by the predicted light field image and the light field image samples comprises:
acquiring a disparity map corresponding to the predicted light field image;
and training parameters of the SISR network model according to a disparity map loss function formed by the disparity map corresponding to the predicted light field image and the disparity map corresponding to the light field image sample.
7. The method of claim 1, wherein the target image sample further comprises: a two-dimensional image sample.
8. An image super-resolution processing apparatus, comprising:
the acquisition module is used for acquiring an image to be processed;
and the processing module is used for performing super-resolution processing on the image to be processed through a target neural network model, the target neural network model is obtained by iteratively training a SISR network model through a target image sample set, and the target image sample comprises a light field image sample containing four-dimensional information.
9. The apparatus of claim 8, wherein the processing module is specifically configured to:
establishing a SISR network model;
determining K sub-view samples according to the light field image samples, wherein K is a positive integer greater than or equal to 1;
inputting the K sub-view samples into the SISR network model respectively to obtain K first predicted images;
determining a predicted light field image according to the K first predicted images;
training parameters of the SISR network model according to an objective function formed by the predicted light field image and the light field image sample, wherein the objective function comprises: a content and structure loss function;
and returning to execute the operation of inputting the K sub-view samples into the SISR network model to obtain a predicted image until a target neural network model is obtained.
10. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the processors to implement the method of any of claims 1-7.
CN202111485136.2A 2021-12-07 2021-12-07 Image super-resolution processing method, device and equipment Pending CN114170084A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111485136.2A CN114170084A (en) 2021-12-07 2021-12-07 Image super-resolution processing method, device and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111485136.2A CN114170084A (en) 2021-12-07 2021-12-07 Image super-resolution processing method, device and equipment

Publications (1)

Publication Number Publication Date
CN114170084A 2022-03-11

Family

ID=80483888

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111485136.2A Pending CN114170084A (en) 2021-12-07 2021-12-07 Image super-resolution processing method, device and equipment

Country Status (1)

Country Link
CN (1) CN114170084A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116782041A (en) * 2023-05-29 2023-09-19 武汉工程大学 Image quality improvement method and system based on liquid crystal microlens array
CN116782041B (en) * 2023-05-29 2024-01-30 武汉工程大学 Image quality improvement method and system based on liquid crystal microlens array

Similar Documents

Publication Publication Date Title
US11694305B2 (en) System and method for deep learning image super resolution
US10853916B2 (en) Convolution deconvolution neural network method and system
US9824486B2 (en) High resolution free-view interpolation of planar structure
WO2019011249A1 (en) Method, apparatus, and device for determining pose of object in image, and storage medium
CN104599258B (en) A kind of image split-joint method based on anisotropic character descriptor
US8619098B2 (en) Methods and apparatuses for generating co-salient thumbnails for digital images
US20210027526A1 (en) Lighting estimation
CN110637461B (en) Compact optical flow handling in computer vision systems
Kang et al. Two-view underwater 3D reconstruction for cameras with unknown poses under flat refractive interfaces
WO2021258579A1 (en) Image splicing method and apparatus, computer device, and storage medium
An et al. TR-MISR: Multiimage super-resolution based on feature fusion with transformers
Li et al. Symmetrical feature propagation network for hyperspectral image super-resolution
CN108876716B (en) Super-resolution reconstruction method and device
US20200034664A1 (en) Network Architecture for Generating a Labeled Overhead Image
WO2021017589A1 (en) Image fusion method based on gradient domain mapping
CN111861888A (en) Image processing method, image processing device, electronic equipment and storage medium
CN114170084A (en) Image super-resolution processing method, device and equipment
CN117115200A (en) Hierarchical data organization for compact optical streaming
WO2023279920A1 (en) Microscope-based super-resolution method and apparatus, device and medium
CN116912467A (en) Image stitching method, device, equipment and storage medium
Schlosser et al. Biologically inspired hexagonal deep learning for hexagonal image generation
CN115601820A (en) Face fake image detection method, device, terminal and storage medium
Li et al. Superresolution Image Reconstruction: Selective milestones and open problems
CN113537026A (en) Primitive detection method, device, equipment and medium in building plan
CN115482285A (en) Image alignment method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination