CN112634139B - Optical field super-resolution imaging method, device and equipment


Publication number
CN112634139B
Authority
CN
China
Prior art keywords
image
target
sample
loss value
network model
Prior art date
Legal status
Active
Application number
CN202110211014.8A
Other languages
Chinese (zh)
Other versions
CN112634139A (en)
Inventor
方璐
王滨
季梦奇
袁肖赟
王星
林克章
Current Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Original Assignee
Hangzhou Hikvision Digital Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Hangzhou Hikvision Digital Technology Co Ltd
Priority to CN202110211014.8A
Publication of CN112634139A
Application granted
Publication of CN112634139B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
        • G06T 3/00 Geometric image transformation in the plane of the image
        • G06T 3/40 Scaling the whole image or part thereof
        • G06T 3/4053 Super resolution, i.e. output image resolution higher than sensor resolution
        • G06T 5/00 Image enhancement or restoration
        • G06T 5/50 Image enhancement or restoration by the use of more than one image, e.g. averaging, subtraction
        • G06T 2207/00 Indexing scheme for image analysis or image enhancement
        • G06T 2207/10 Image acquisition modality
        • G06T 2207/10052 Images from lightfield camera
        • G06T 2207/20 Special algorithmic details
        • G06T 2207/20081 Training; Learning
        • G06T 2207/20084 Artificial neural networks [ANN]
        • G06T 2207/20212 Image combination
        • G06T 2207/20224 Image subtraction
        • G06T 2207/20228 Disparity calculation for image-based rendering
    • G06F ELECTRIC DIGITAL DATA PROCESSING
        • G06F 18/00 Pattern recognition
        • G06F 18/20 Analysing
        • G06F 18/22 Matching criteria, e.g. proximity measures
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
        • G06N 3/00 Computing arrangements based on biological models
        • G06N 3/02 Neural networks
        • G06N 3/04 Architecture, e.g. interconnection topology
        • G06N 3/045 Combinations of networks
        • G06N 3/08 Learning methods

Abstract

The application provides a light field super-resolution imaging method, device, and equipment. The method includes: inputting an image to be processed into a trained target network model to obtain a super-resolution image. The target network model is obtained by training in the following way: inputting an original image sequence into an initial network model to obtain a target image sequence; acquiring the content loss value and the structure loss value between the target images in the target image sequence and the sample images in the sample image sequence; and determining whether the initial network model has converged according to the content loss value and the structure loss value. If so, the initial network model is determined to be the trained target network model; if not, the initial network model is adjusted, the adjusted network model is taken as the initial network model, and the original image sequence is again input into the initial network model to obtain a target image sequence. By this technical scheme, the resolution of the image is effectively improved.

Description

Optical field super-resolution imaging method, device and equipment
Technical Field
The application relates to the field of image processing, and in particular to a light field super-resolution imaging method, device, and equipment.
Background
The resolution of an image refers to the amount of information stored in the image, i.e., how many pixels there are per inch of the image; its unit is PPI (Pixels Per Inch). Obviously, the larger the resolution, the higher the definition of the image, and the smaller the resolution, the lower the definition. In practical applications, the resolution of an image, together with its width and height, determines the size and quality of the image.
With the development of super-resolution imaging technology, more and more application scenarios, such as automatic driving, medical imaging, satellite imaging, movie and television, and AR (Augmented Reality)/VR (Virtual Reality), require low-resolution images to be reconstructed into high-resolution images.
To reconstruct a low-resolution image into a high-resolution image, the low-resolution image is acquired and then processed with algorithms such as image interpolation. However, the resolution improvement achieved by such algorithms is limited and cannot meet the demand for high resolution; in particular, in application scenarios such as automatic driving and AR/VR, the resulting image resolution cannot meet users' requirements.
Disclosure of Invention
The application provides a light field super-resolution imaging method, which comprises the following steps:
inputting an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
wherein the target network model is obtained by training in the following way:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring content loss values of a target image in the target image sequence and a sample image in the sample image sequence, and structure loss values of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frames of sample images at different view angles; determining whether the initial network model has converged according to the content loss value and the structure loss value;
if yes, determining the initial network model as a trained target network model; if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
In a possible implementation, the determining whether the initial network model has converged according to the content loss value and the structure loss value includes:
determining a target loss value according to the content loss value and the structure loss value; wherein the target loss value is a loss value used to determine whether an initial network model has converged;
determining whether the initial network model has converged according to the target loss value.
In a possible implementation, the determining a target loss value according to the content loss value and the structure loss value includes: acquiring a variance loss value of a target image in the target image sequence and a sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the variance loss value; or acquiring a parallax loss value of a target image in the target image sequence and a sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the parallax loss value; or, determining the target loss value according to the content loss value, the structure loss value, the variance loss value and the parallax loss value.
In a possible implementation, the obtaining a content loss value of a target image in the target image sequence and a sample image in the sample image sequence includes:
determining a first light field matrix based on each frame of a target image in the sequence of target images;
determining a second light field matrix based on each frame of sample images in the sequence of sample images;
and determining the content loss value based on the difference value between the pixel value of each pixel point in the first light field matrix and the pixel value of the corresponding pixel point in the second light field matrix.
In a possible embodiment, the obtaining the structural loss value of the target image in the target image sequence and the sample image in the sample image sequence includes:
determining a first light field matrix based on each frame of a target image in the sequence of target images;
determining a second light field matrix based on each frame of sample images in the sequence of sample images;
calculating the first light field matrix and the second light field matrix by adopting a configured structure similarity loss function to obtain a function value of the structure similarity loss function;
determining the structural loss value based on a function value of the structural similarity loss function.
In one possible embodiment, obtaining the variance loss value between the target image in the target image sequence and the sample image in the sample image sequence includes:
generating a target variance image based on each frame of target images in the sequence of target images; the pixel value of each super pixel point in the target variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of target image;
generating a sample variance image based on each frame of sample images in the sequence of sample images; the pixel value of each super pixel point in the sample variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of sample image;
and determining the variance loss value based on the difference between the pixel value of each super pixel point in the target variance image and the pixel value of the corresponding super pixel point in the sample variance image.
In a possible embodiment, the obtaining a parallax loss value between a target image in the target image sequence and a sample image in the sample image sequence includes:
generating a target parallax image based on each frame of target images in the target image sequence; the parallax value of each target pixel point in the target parallax image is determined based on the parallax information of the sub-pixel point corresponding to the target pixel point in each frame of target image;
generating a sample parallax image based on each frame sample image in the sample image sequence; the parallax value of each sample pixel point in the sample parallax image is determined based on the parallax information of the sub-pixel point corresponding to the sample pixel point in each frame of sample image;
and determining the parallax loss value based on the difference value between the parallax value of each target pixel point in the target parallax image and the parallax value of the corresponding sample pixel point in the sample parallax image.
In a possible implementation, before the inputting the original image sequence to the initial network model to obtain the target image sequence, the method further includes:
acquiring the sample image sequence of a target scene through a light field camera, and performing down-sampling processing on each frame of sample image in the sample image sequence to obtain an original image corresponding to the frame of sample image;
and combining the original images corresponding to each frame of sample image into the original image sequence.
The application provides a light field super-resolution imaging device, the device includes: the acquisition module is used for inputting the image to be processed to the trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
the training module is used for training to obtain the target network model in the following modes:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring content loss values of a target image in the target image sequence and a sample image in the sample image sequence, and structure loss values of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frames of sample images at different view angles; determining whether the initial network model has converged according to the content loss value and the structure loss value;
if yes, determining the initial network model as a trained target network model; if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
The application provides light field super-resolution imaging equipment, including: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; wherein the processor is configured to execute the machine-executable instructions to perform the following steps:
inputting an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
wherein the target network model is obtained by training in the following way:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring content loss values of a target image in the target image sequence and a sample image in the sample image sequence, and structure loss values of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frames of sample images at different view angles; determining whether the initial network model has converged according to the content loss value and the structure loss value;
if yes, determining the initial network model as a trained target network model; if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
According to the technical scheme, in the embodiment of the application, the initial network model can be trained based on K frames of original images at different viewing angles and K frames of sample images at different viewing angles to obtain a trained target network model. Because images at multiple viewing angles provide abundant spatial information and angle information, the super-resolution effect of the target network model can be significantly improved, the processing performance of the target network model is improved, and the target network model learns a better super-resolution capability. The initial network model can be trained according to the content loss value and the structure loss value to obtain the trained target network model; because the content loss value and the structure loss value better reflect the difference between images, the processing performance of the target network model can be improved and its super-resolution capability enhanced. In summary, because the processing performance of the target network model is good, a super-resolution image can be obtained after the image to be processed is processed by the target network model, and the resolution of the super-resolution image is greater than that of the image to be processed. The resolution of the image is thus effectively improved, the improvement effect is good, the demand for high resolution can be met, and even in scenarios such as automatic driving and AR/VR the image resolution can meet users' requirements.
Drawings
FIG. 1 is a schematic flow chart of a light field super-resolution imaging method according to an embodiment of the present application;
FIG. 2 is a schematic flow chart of a light field super-resolution imaging method according to an embodiment of the present application;
FIG. 3 is a schematic diagram of light field super-resolution imaging in one embodiment of the present application;
FIG. 4 is a schematic diagram of a light field camera acquiring a sequence of images in one embodiment of the present application;
FIG. 5 is a block diagram of a light field super-resolution imaging device according to an embodiment of the present application;
fig. 6 is a block diagram of a light field super-resolution imaging device in an embodiment of the present application.
Detailed Description
The terminology used in the embodiments of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein is meant to encompass any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in the embodiments of the present application to describe various information, the information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. Moreover, depending on the context, the word "if" as used herein may be interpreted as "upon", "when", or "in response to determining".
With the development of super-resolution imaging technology, more and more application scenarios require low-resolution images to be reconstructed into high-resolution images. With the development of deep learning technology, applying deep learning to super-resolution imaging has become one implementation; by adopting deep learning, a low-resolution image can be reconstructed better, and a high-resolution image with a better effect can be obtained.
SISR (Single Image Super Resolution) is a mode of realizing Super-Resolution imaging by adopting a deep learning technology, and high-Resolution information is predicted and recovered from a low-Resolution Image by training a SISR network model and using the SISR network model, so that the low-Resolution Image is reconstructed into a high-Resolution Image. In order to train the SISR network model, a monoscopic data set is created, which includes a large number of training images at the same viewing angle, and the SISR network model is trained by using these training images.
However, since the SISR network model is trained by using a large number of training images at the same viewing angle, the performance of the SISR network model is poor, the super-resolution effect cannot be improved, and the improvement of the performance is limited. For example, the training images are two-dimensional images, the SISR network model can only represent the mapping relationship between the two-dimensional images, but the real object is three-dimensional, and the SISR network model cannot represent the mapping relationship with higher dimensions.
In view of the above findings, the embodiment of the present application provides a light field super-resolution imaging method, which can train an SISR network model based on images at different viewing angles. Because images at multiple viewing angles provide relatively rich spatial information and angle information, the super-resolution effect of the SISR network model can be improved without increasing computational complexity, so that the super-resolution effect is significantly improved and the performance of the SISR network model is improved. Images at multiple viewing angles can be acquired by a light field camera; they provide relatively rich spatial information and angle information, the information is regularly arranged, no extra calibration or correction is needed, and four-dimensional information (such as horizontal and vertical spatial information and horizontal and vertical angle information) can be reflected, so that the SISR network model can learn the mapping relationship of the four-dimensional information and learn a better super-resolution capability. The SISR network model can be any type of SISR network, so the approach has high network portability.
The technical solutions of the embodiments of the present application are described below with reference to specific embodiments.
The embodiment of the application provides a light field super-resolution imaging method, which can be used for obtaining a target network model by training, and processing a low-resolution image by using the target network model after obtaining the target network model to obtain a high-resolution image. For example, the image to be processed may be input to a trained target network model, and the image to be processed is processed by the target network model to obtain a super-resolution image corresponding to the image to be processed, where a resolution of the super-resolution image may be greater than a resolution of the image to be processed.
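For illustration only, a minimal sketch of this inference step is given below, assuming a PyTorch-style trained model object; the names target_model and image_to_process are illustrative and not from the original text.

```python
import torch

def super_resolve(target_model, image_to_process):
    """Feed a low-resolution image through the trained target network model
    and return the corresponding super-resolution image (larger spatial size)."""
    target_model.eval()
    with torch.no_grad():  # inference only, no gradient tracking
        super_resolution_image = target_model(image_to_process)
    return super_resolution_image
```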
Referring to fig. 1, a schematic flow chart of a light field super-resolution imaging method is shown, where the light field super-resolution imaging method is used to train a target network model and then implement light field super-resolution imaging based on the target network model. In the process of training the target network model, the target network model can be trained by the following steps:
step 101, inputting an original image sequence to an initial network model to obtain a target image sequence, where the original image sequence includes K frames of original images at different viewing angles, the number of target images in the target image sequence is the same as the number of original images in the original image sequence, and K is a positive integer greater than 1.
For example, before step 101, an original image sequence is acquired, where the original image sequence includes K frames of original images at different viewing angles, and the original images in the original image sequence are images requiring resolution enhancement. In order to acquire the original image sequence, the following steps can be adopted:
step S11, a sample image sequence of the target scene is acquired by the light field camera, where the sample image sequence may include K frames of sample images at different viewing angles, and the acquisition time of the K frames of sample images is the same.
For example, for a target scene, videos of the target scene may be captured by a light field camera, such as a video at view 1 (comprising multiple frames at different capture moments), a video at view 2, …, and a video at view K. Images at different views captured at the same capture moment are combined into one image sequence; in this embodiment this image sequence is referred to as a sample image sequence, and the images in it are referred to as sample images. For example, the sample image of view 1 (taken from the video of view 1), the sample image of view 2, …, and the sample image of view K acquired at capture moment 1 may be combined into sample image sequence 1; the sample images of view 1, view 2, …, and view K acquired at capture moment 2 may be combined into sample image sequence 2; and so on, so that multiple sample image sequences can be obtained.
Obviously, in each sample image sequence, the sample image sequence includes K frame sample images at different views, i.e., sample image of view 1, sample image of view 2, …, sample image of view K.
Of course, the light field camera is only an example, and other types of cameras can be used to capture the sample image sequence of the target scene as long as the sample images from different viewing angles can be captured at the same capture time.
For example, when the light field camera images a target scene, sample images at different viewing angles at the same moment may be acquired through the lens array of the light field camera, that is, multiple frames of sample images are acquired, each with a different viewing angle, so that the light field information of the target scene is represented in the sample images. A light field is image information covering multiple viewing angles; it is a four-dimensional parametric representation, a four-dimensional radiance field containing both spatial information and angular information, covering the information carried by light during propagation. Compared with an ordinary image, the sample images acquired by the light field camera carry more spatial information and angle information; for example, all color perception information of the objects is obtained in the spatial dimension and presented in the form of multiple viewing angles, i.e., sample images at multiple viewing angles.
The spatial information can be horizontal spatial information and vertical spatial information, and the horizontal spatial information and the vertical spatial information can be embodied by the pixel value of each pixel point in the sample image. The angle information may be horizontal angle information and vertical angle information, and the viewing angle difference of different sample images is horizontal angle information (viewing angle difference in the horizontal direction) and vertical angle information (viewing angle difference in the vertical direction). Obviously, in the sample image sequence, the spatial information and the angular information are very regularly arranged together, and four-dimensional information can be reflected.
Step S12, performing downsampling on each frame of sample image in the sample image sequence to obtain original images corresponding to the frame of sample image, and combining the original images corresponding to each frame of sample image into an original image sequence, that is, the original image sequence may include K frames of original images at different viewing angles.
For example, for each frame of sample image in the sample image sequence, the sample image may be downsampled to obtain an original image. For example, a sample image of size M × N is downsampled by a factor of s to obtain an original image of size (M/s) × (N/s), where s is a common divisor of M and N. In the downsampling process, the sub-image within each s × s window of the sample image becomes one pixel, whose value may be the average of all pixel values within that s × s window; after all s × s windows have been processed, the original image is obtained. Of course, the above is only an example of the downsampling process, which is not limited thereto.
Because the sample image sequence comprises K frames of sample images under different visual angles, K frames of original images under different visual angles are obtained after each frame of sample image is subjected to down-sampling processing, and the K frames of original images under different visual angles are combined into an original image sequence. For example, the sample image sequence includes a sample image of view 1, a sample image of view 2, …, and a sample image of view k, which are arranged in the order of view 1, view 2, …, and view k. And performing downsampling processing on the sample image of the view angle 1 to obtain an original image of the view angle 1, performing downsampling processing on the sample image of the view angle 2 to obtain an original image of the view angle 2, and performing downsampling processing on the sample image of the view angle k in the same way to obtain an original image of the view angle k. In summary, an original image sequence can be obtained, where the original image sequence includes K frames of original images at different viewing angles, for example, an original image at viewing angle 1, an original image at viewing angle 2, …, and an original image at viewing angle K, and these original images are arranged in the order of viewing angle 1, viewing angle 2, …, and viewing angle K.
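As a minimal sketch of the average downsampling described above, assuming single-channel numpy arrays and that s divides both M and N (the function names are illustrative only):

```python
import numpy as np

def downsample(sample_image: np.ndarray, s: int) -> np.ndarray:
    """Average each s x s window of an (M, N) sample image into one pixel,
    yielding an (M/s, N/s) original image."""
    M, N = sample_image.shape
    assert M % s == 0 and N % s == 0, "s must divide both M and N"
    return sample_image.reshape(M // s, s, N // s, s).mean(axis=(1, 3))

def build_original_sequence(sample_sequence, s):
    """Downsample every sample image (one per view, in view order) to form
    the original image sequence."""
    return [downsample(sample_image, s) for sample_image in sample_sequence]
```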
Illustratively, before step 101, a preconfigured initial network model needs to be obtained (the network model whose training has not finished is referred to as the initial network model). The initial network model is a network model for increasing the resolution of an image, and it is not limited, as long as it can increase the resolution of an image.
Illustratively, in step 101, after obtaining the original image sequence and the initial network model, the original image sequence is input to the initial network model to obtain a target image sequence, where the number of target images in the target image sequence is the same as the number of original images in the original image sequence. For example, the original image sequence includes K frames of original images at different perspectives, and the target image sequence includes K frames of target images at different perspectives.
For example, an original image of a view angle 1 in an original image sequence may be first input to an initial network model, and the initial network model processes the original image of the view angle 1, which is not limited to this processing manner and is related to the structure and function of the initial network model, for example, if the initial network model is used to improve the image resolution, the processing manner is to improve the resolution of the original image of the view angle 1, and after the processing is completed, a target image of the view angle 1 may be obtained. Then, the original image of the view angle 2 in the original image sequence may be input to the initial network model, and the initial network model processes the original image of the view angle 2 to obtain the target image of the view angle 2. And by analogy, inputting the original image of the view angle k in the original image sequence to the initial network model, and processing the original image of the view angle k by the initial network model to obtain the target image of the view angle k.
In summary, a target image sequence can be obtained, where the target image sequence includes K frames of target images at different viewing angles, such as a target image at viewing angle 1, a target image at viewing angle 2, …, and a target image at viewing angle K, and the target images are arranged in order of viewing angle 1, viewing angle 2, …, and viewing angle K.
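A minimal sketch of this step, under the assumption that the model processes one view at a time (initial_model and the variable names are illustrative):

```python
def forward_sequence(initial_model, original_sequence):
    """Pass the original image of each view through the initial network model,
    preserving the view order, to obtain the target image sequence."""
    return [initial_model(original_image) for original_image in original_sequence]
```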
For example, in the training process, the initial network model may be trained through a plurality of original image sequences, and therefore, a training data set may be obtained first, where the training data set may include a plurality of original image sequences (each obtained by down-sampling a sample image sequence), and the initial network model may be trained using the original image sequences. Because the training process of each original image sequence is similar, in step 101, the training process of one original image sequence is taken as an example for description, and in practical application, a plurality of original image sequences need to be used to perform the training process, which is not repeated.
Step 102, obtaining a content loss value of a target image in the target image sequence and a sample image in the sample image sequence, and a structure loss value of the target image in the target image sequence and the sample image in the sample image sequence.
Illustratively, the content loss value is a loss value on the content between the target image and the sample image, and may be understood as a loss value between pixel values, such as the loss value between the pixel value of a pixel point (x, y) in the target image and the pixel value of the pixel point (x, y) in the sample image. The structure loss value is a loss value on the structure between the target image and the sample image, i.e., a loss value between image structures, which may be understood as a loss value between pixel blocks, such as the loss value between the pixel value of a pixel block P1 in the target image and the pixel value of a pixel block P2 in the sample image. The pixel value of the pixel block P1 may be determined based on the pixel values of M pixel points in the target image (for example, their average value), and the pixel value of the pixel block P2 may be determined based on the pixel values of M pixel points in the sample image. For example, if the value of M is 4, the M pixel points in the target image are (x1, y1), (x1, y2), (x2, y1), (x2, y2), and the M pixel points in the sample image are also (x1, y1), (x1, y2), (x2, y1), (x2, y2); that is, the pixel points in the target image correspond to those in the sample image.
In one possible implementation, the content loss value may be obtained by:
step S21, a first light field matrix is determined based on each frame of target image in the sequence of target images.
For example, K frames of target images at different viewing angles may be included in the target image sequence, and the K frames of target images may be grouped into the first light field matrix according to the arrangement order of the K frames of target images.
For example, if K is 4, 4 frames of target images may be combined into a first light field matrix of 2 × 2, where the first row and the first column are target images at view angle 1, the first row and the second column are target images at view angle 2, the second row and the first column are target images at view angle 3, and the second row and the second column are target images at view angle 4.
For another example, if K is 9, 9 frames of target images may be combined into a first light field matrix of 3 × 3, the first row and the first column are target images of view angle 1, the first row and the second column are target images of view angle 2, the first row and the third column are target images of view angle 3, the second row and the first column are target images of view angle 4, and so on.
For another example, if K is 6, 6 frames of target images may be combined into a first light field matrix of 2 × 3, or if K is 6, 6 frames of target images may be combined into a first light field matrix of 3 × 2.
For another example, if K is 81, 81 target images may be combined into a first light field matrix of 9 × 9.
Illustratively, in the first light field matrix, the first row and first column is one block, and the size of the block is the same as the size of the target image (all target images have the same size). Assuming the size of the target image is 400 × 300, the block in the first row and first column contains 400 × 300 pixel points, each taking the pixel value at the corresponding position of that target image; similarly, the block in the first row and second column also contains 400 × 300 pixel points, and so on.
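A minimal sketch of assembling such a light field matrix from K equally sized view images (the function name and grid parameters are illustrative; the same routine would be applied to the sample images to obtain the second light field matrix):

```python
import numpy as np

def build_light_field_matrix(view_images, rows, cols):
    """Tile K = rows * cols same-sized images, ordered view 1, view 2, ..., view K,
    into one light field matrix; the block at grid position (i, j) is the image
    of view i * cols + j + 1."""
    assert len(view_images) == rows * cols
    row_blocks = [np.concatenate(view_images[r * cols:(r + 1) * cols], axis=1)
                  for r in range(rows)]
    return np.concatenate(row_blocks, axis=0)

# e.g. K = 4 target images -> a 2 x 2 first light field matrix:
# first_lf = build_light_field_matrix(target_images, 2, 2)
```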
Step S22, a second light field matrix is determined based on each frame of sample images in the sequence of sample images.
For example, K frames of sample images at different viewing angles may be included in the sample image sequence, and the K frames of sample images may be grouped into the second light field matrix according to the arrangement order of the K frames of sample images.
For example, if K is 4, 4 frame sample images may be combined into a second light field matrix of 2 × 2, where the first row and the first column are sample images of view 1, the first row and the second column are sample images of view 2, the second row and the first column are sample images of view 3, and the second row and the second column are sample images of view 4.
For another example, if K is 9, 9 frames of sample images may be combined into a second light field matrix of 3 × 3, the first row and the first column are sample images of view angle 1, the first row and the second column are sample images of view angle 2, the first row and the third column are sample images of view angle 3, the second row and the first column are sample images of view angle 4, and so on.
Obviously, the arrangement order of the sample images in the second light field matrix is the same as the arrangement order of the target images in the first light field matrix, i.e. the positions of the images with the same viewing angle in the light field matrix are the same.
Illustratively, in the second light field matrix, the first row and the first column is a block, which is the same size as the sample image (all sample images are the same size), and so on.
Step S23, determining the content loss value based on a difference between a pixel value of each pixel in the first light field matrix and a pixel value of a corresponding pixel in the second light field matrix.
For example, a difference between a pixel value of a pixel a1 in the first light field matrix and a pixel value of a pixel B1 in the second light field matrix corresponding to the pixel a1 is determined, a difference between a pixel value of the pixel a2 in the first light field matrix and a pixel value of a pixel B2 in the second light field matrix corresponding to the pixel a2 is determined, and … is determined, and a difference between a pixel value of each pixel in the first light field matrix and a pixel value of a pixel in the second light field matrix corresponding to the pixel can be determined. Then, the content loss value may be determined based on all the difference values, for example, an average value of all the difference values may be determined, and the average value may be used as the content loss value.
Illustratively, the content loss value may be determined by the following formula:

L_content = Σ ( LF_HR − LF_SR )²

In the above formula, LF_HR represents the high-resolution 4D light field, i.e. the second light field matrix, and LF_SR represents the low-resolution 4D light field, i.e. the first light field matrix. The formula means that each element of the first light field matrix is subtracted from the corresponding element of the second light field matrix, the square of each difference is determined, and all the squared values are summed.
For example, a difference 1 between the pixel value of the pixel a1 and the pixel value of the pixel B1, and a difference 2 between the pixel value of the pixel a2 and the pixel value of the pixel B2 are determined, and …, and if 100 pixels exist in both the first light field matrix and the second light field matrix, a total of 100 differences are obtained. Then, the square of each difference is determined, such as the square of difference 1, the square of difference 2, …, i.e., a total of 100 differences are obtained. The squared values of all differences are then summed to obtain a content loss value.
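A minimal sketch of this content loss computation on two equally sized light field matrices (numpy arrays; the sum-of-squared-differences form above is used, the element-wise average mentioned earlier being an alternative):

```python
import numpy as np

def content_loss(lf_hr, lf_sr):
    """Sum of squared differences between corresponding pixel values of the
    second (high-resolution) and first light field matrices."""
    diff = lf_hr.astype(np.float64) - lf_sr.astype(np.float64)
    return float(np.sum(diff ** 2))
```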
In one possible embodiment, the structure loss value may be obtained by the following steps:
step S31, a first light field matrix is determined based on each frame of target image in the sequence of target images.
Step S32, a second light field matrix is determined based on each frame of sample images in the sequence of sample images.
Step S31 and step S32 can refer to step S21 and step S22, which are not described herein again.
And step S33, performing an operation on the first light field matrix and the second light field matrix by using the configured structure similarity loss function to obtain a function value of the structure similarity loss function.
For example, a structural similarity loss function may be configured in advance, and it may be SSIM (Structural SIMilarity); the type of the structural similarity loss function is not limited. Based on the configured structural similarity loss function, after the first light field matrix and the second light field matrix are obtained, the two matrices can be substituted into the structural similarity loss function to obtain its function value. For example, the pixel value of each pixel point in the first light field matrix and the pixel value of each pixel point in the second light field matrix are substituted into the structural similarity loss function, so as to obtain the function value of the structural similarity loss function. This embodiment does not limit how the structural similarity loss function performs the processing; the processing depends on the expression of the structural similarity loss function, and that expression involves the pixel value of each pixel point in the first light field matrix and the pixel value of each pixel point in the second light field matrix.
Illustratively, the structural similarity loss function is SSIM, which is used to determine the structural similarity between the first light field matrix and the second light field matrix. The structural similarity is an index for measuring the similarity between two images; its value ranges from -1 to 1, and it equals 1 when the two images are identical. As one implementation, structural similarity is defined as an attribute reflecting the structure of objects in a scene independently of brightness and contrast; the mean may be used as a measure of brightness, the standard deviation as a measure of contrast, and the covariance as a measure of structural similarity.
Illustratively, the function value of SSIM may be determined by the following formula (the standard SSIM form, stated here in terms of the statistics of the two matrices):

SSIM( LF_HR , LF_SR ) = [ (2 · μ_HR · μ_SR + c1) · (2 · σ_HR,SR + c2) ] / [ (μ_HR² + μ_SR² + c1) · (σ_HR² + σ_SR² + c2) ]

where LF_HR represents the second light field matrix, LF_SR represents the first light field matrix, μ_HR and μ_SR are their means, σ_HR² and σ_SR² their variances, σ_HR,SR their covariance, and c1 and c2 are small constants that stabilize the division. The first light field matrix and the second light field matrix are operated on by the SSIM, and the operation result is the function value of the SSIM.
In step S34, a structural loss value is determined based on the function value of the structural similarity loss function.
For example, the function value of the structure-similarity loss function may be directly determined as the structure loss value, or the structure loss value may be obtained after the function value of the structure-similarity loss function is adjusted, and of course, the structure loss value may be determined in other manners, and the determination manner is not limited.
Illustratively, the structure loss value may be determined by a formula of the following form:

L_structure = 1 − SSIM( LF_HR , LF_SR )

In the above formula, L_structure represents the structure loss value, and SSIM( LF_HR , LF_SR ) represents the function value of the structural similarity loss function.
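A minimal sketch of this computation, assuming the simplified single-window form of SSIM over the whole matrices (the usual windowed SSIM averages this statistic over local windows) and the 1 − SSIM adjustment shown above:

```python
import numpy as np

def ssim_value(lf_hr, lf_sr, c1=1e-4, c2=9e-4):
    """Simplified SSIM of the two light field matrices: the mean measures
    brightness, the standard deviation contrast, the covariance structure."""
    x = lf_hr.astype(np.float64).ravel()
    y = lf_sr.astype(np.float64).ravel()
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = np.mean((x - mu_x) * (y - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)) / \
           ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2))

def structure_loss(lf_hr, lf_sr):
    """Structure loss value derived from the SSIM function value."""
    return 1.0 - ssim_value(lf_hr, lf_sr)
```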
In summary, after the target image sequence and the sample image sequence are obtained, the content loss value and the structure loss value may be obtained based on the target image in the target image sequence and the sample image in the sample image sequence.
Step 103, determining whether the initial network model has converged according to the content loss value and the structure loss value.
If so, step 104 may be performed, and if not, step 105 may be performed.
For example, after an original image sequence is input to an initial network model to obtain a target image sequence corresponding to the original image sequence, and a content loss value and a structural loss value are obtained based on the target image sequence, it may be determined whether the initial network model has converged according to the content loss value and the structural loss value.
In one possible implementation, for step 103, a target loss value may be determined according to the content loss value and the structure loss value, and the target loss value may be a loss value used for determining whether the initial network model has converged. Then, it is determined whether the initial network model has converged according to the target loss value.
For example, whether the initial network model has converged may be determined according to a target loss value, for example, if the target loss value is not greater than a preset threshold (configured empirically, without limitation), the initial network model is determined to have converged, and if the target loss value is greater than the preset threshold, the initial network model is determined to have not converged.
For example, in each iteration process, the initial network model of the last iteration process is adjusted to obtain an adjusted initial network model, and the target loss value is determined based on the adjusted initial network model (see step 101-103), that is, one target loss value may be obtained in each iteration process.
Then, a change amplitude curve of the multiple target loss values is determined. If the change amplitude curve shows that the change amplitude of the target loss value is stable and the target loss value of the last iteration is not greater than the threshold, the initial network model of the last iteration is determined to have converged. Otherwise, the initial network model of the last iteration is determined not to have converged, the next iteration continues to obtain the target loss value of the next iteration, the change amplitude curve of the multiple target loss values is re-determined, and so on, until the change amplitude of the target loss value is stable and the target loss value of the last iteration is not greater than the threshold, at which point the initial network model of the last iteration is determined to have converged.
For example, assuming that the minimum number of target loss values required for the change amplitude curve is 10, 10 iterations are performed first to obtain target loss value 1 through target loss value 10, and the change amplitude curve of target loss value 1 through target loss value 10 is determined. If the change amplitude of the target loss value is not stable according to this curve, the 11th iteration is performed to obtain target loss value 11, and the change amplitude curve of target loss value 1 through target loss value 11 is determined. If the change amplitude is still not stable, the 12th iteration is performed to obtain target loss value 12, and the change amplitude curve of target loss value 1 through target loss value 12 is determined, and so on. Assuming that after the 15th iteration the change amplitude of the target loss value is determined to be stable based on the change amplitude curve of target loss value 1 through target loss value 15, and target loss value 15 is not greater than the threshold, the initial network model of the 15th iteration can be determined to have converged.
A stable change amplitude of the target loss value means that the target loss value does not change over multiple consecutive iterations, or that its change amplitude is small (e.g., smaller than a certain threshold); this is not limited.
In summary, based on the multiple target loss values in the multiple iteration processes, if it is determined that the variation amplitude of the target loss value is stable and the target loss value in the last iteration process is not greater than the threshold, it is determined that the initial network model in the last iteration process is converged, otherwise, it is determined that the initial network model is not converged.
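A minimal sketch of such a convergence check (the window size and stability tolerance are illustrative knobs, not values from the original text):

```python
def has_converged(loss_history, threshold, window=10, stable_eps=1e-3):
    """Heuristic convergence check: the change amplitude of the last `window`
    target loss values is stable (small spread) and the latest target loss
    value is not greater than the preset threshold."""
    if len(loss_history) < window:
        return False
    recent = loss_history[-window:]
    amplitude_stable = (max(recent) - min(recent)) < stable_eps
    return amplitude_stable and loss_history[-1] <= threshold
```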
In practical applications, it may also be determined whether the initial network model converges in other manners, which is not limited to this. For example, if the iteration number reaches a preset number threshold, it is determined that the initial network model has converged; for another example, if the iteration duration reaches the preset duration threshold, it is determined that the initial network model has converged.
In one possible embodiment, the target loss value may be determined as follows:
mode 1, a target loss value is determined from the content loss value and the structure loss value.
Illustratively, after obtaining the content loss value and the structure loss value, the target loss value may be determined based on the content loss value and the structure loss value, for example using the following formula:

L = w1 · L_content + w2 · L_structure

In the above formula, L represents the target loss value, L_content represents the content loss value, L_structure represents the structure loss value, w1 represents the weight coefficient of the content loss value, and w2 represents the weight coefficient of the structure loss value. The values of w1 and w2 can be configured empirically, and their sum is 1.
And 2, acquiring a variance loss value of a target image in the target image sequence and a sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the variance loss value.
Illustratively, the variance loss value is a loss value on the variance between the target images and the sample images, and may be understood as a loss value between variances of pixel values, such as between the variance of the pixel values of the pixel point (x, y) across all target images and the variance of the pixel values of the pixel point (x, y) across all sample images.
Illustratively, after obtaining the content loss value, the structure loss value, and the variance loss value, the target loss value may be determined based on these three values, for example using the following formula:

L = w1 · L_content + w2 · L_structure + w3 · L_variance

In the above formula, L represents the target loss value, L_content represents the content loss value, L_structure represents the structure loss value, L_variance represents the variance loss value, and w1, w2, and w3 represent the weight coefficients of the content loss value, the structure loss value, and the variance loss value, respectively. The values of the weight coefficients can be configured empirically without limitation, and the sum of the three weight coefficients is 1.
And 3, acquiring the parallax loss value of the target image in the target image sequence and the sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the parallax loss value.
Illustratively, the parallax loss value is a loss value on the parallax between the target images and the sample images, and may be understood as a loss value between parallaxes, such as between the parallax of the pixel point (x, y) across all target images and the parallax of the pixel point (x, y) across all sample images.
Illustratively, after obtaining the content loss value, the structure loss value, and the parallax loss value, the target loss value may be determined based on these three values, for example using the following formula:

L = w1 · L_content + w2 · L_structure + w3 · L_parallax

In the above formula, L represents the target loss value, L_content represents the content loss value, L_structure represents the structure loss value, L_parallax represents the parallax loss value, and w1, w2, and w3 represent the weight coefficients of the content loss value, the structure loss value, and the parallax loss value, respectively. The values of the weight coefficients can be configured empirically without limitation, and the sum of the three weight coefficients is 1.
Mode 4: acquiring a variance loss value of the target images in the target image sequence and the sample images in the sample image sequence, acquiring a parallax loss value of the target images in the target image sequence and the sample images in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value, the variance loss value and the parallax loss value.
For example, after obtaining the content loss value, the structural loss value, the variance loss value, and the parallax loss value, a target loss value may be determined according to the content loss value, the structural loss value, the variance loss value, and the parallax loss value. For example, the target loss value may be determined using the following equation:
L = w_con · L_con + w_str · L_str + w_var · L_var + w_dis · L_dis

In the above formula, L represents the target loss value, L_con represents the content loss value, L_str represents the structure loss value, L_var represents the variance loss value, and L_dis represents the parallax loss value; w_con, w_str, w_var and w_dis are the corresponding weight coefficients. For example, the values of the four weight coefficients may be configured empirically, without limitation, and the sum of the four weight coefficients is 1.
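The weighted combination described above can be illustrated with a short Python sketch; the function name and the default weight values below are placeholders chosen for illustration, not values taken from this embodiment:

```python
def combine_losses(content_loss, structure_loss, variance_loss, parallax_loss,
                   weights=(0.4, 0.3, 0.2, 0.1)):
    """Weighted sum of the four loss values (Mode 4).

    The weights are empirical and must sum to 1; the specific default
    values here are illustrative placeholders.
    """
    w_con, w_str, w_var, w_dis = weights
    assert abs(w_con + w_str + w_var + w_dis - 1.0) < 1e-6, \
        "weight coefficients must sum to 1"
    return (w_con * content_loss + w_str * structure_loss
            + w_var * variance_loss + w_dis * parallax_loss)
```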
For example, for Mode 2 and Mode 4, the following steps may be adopted to obtain the variance loss value:
step S41, a target variance image is generated based on each frame of target image in the target image sequence.
Illustratively, the pixel value of each super pixel point in the target variance image is determined based on the variance of the pixel values of the sub-pixel points corresponding to the super pixel point in each frame of the target image.
For example, the target image sequence may include K frames of target images at different viewing angles, the K frames of target images have the same size, and a target variance image may be generated based on the K frames of target images; the size of the target variance image is the same as that of the target images. In this embodiment, each pixel point in the target variance image may be referred to as a super pixel point, and each pixel point in the target images may be referred to as a sub-pixel point.
For example, for a super pixel (x, y) in the target variance image, a variance of pixel values of sub-pixels (x, y) in all target images (i.e., K frames of target images) may be determined, and the pixel value of the super pixel (x, y) in the target variance image is determined based on the variance, that is, the pixel value of the super pixel (x, y) is the variance. In summary, the pixel value of each super pixel in the target variance image can be obtained, and the pixel values of all the super pixels can form the target variance image.
Step S42, a sample variance image is generated based on each frame of sample image in the sample image sequence.
Illustratively, the pixel value of each super pixel point in the sample variance image is determined based on the variance of the pixel values of the sub-pixel points corresponding to the super pixel point in each frame of the sample image.
For example, the sample image sequence may include K frames of sample images at different viewing angles, the K frames of sample images have the same size, and a sample variance image may be generated based on the K frames of sample images; the size of the sample variance image is the same as that of the sample images. In this embodiment, each pixel point in the sample variance image may be referred to as a super pixel point, and each pixel point in the sample images may be referred to as a sub-pixel point.
For example, for a super pixel point (x, y) in the sample variance image, a variance of pixel values of sub-pixel points (x, y) in all sample images (i.e., K frame sample images) may be determined, and the pixel value of the super pixel point (x, y) in the sample variance image is determined based on the variance, that is, the pixel value of the super pixel point (x, y) is the variance. In summary, the pixel value of each super pixel in the sample variance image can be obtained, and the pixel values of all super pixels can form the sample variance image.
Step S43, determining a variance loss value based on a difference between a pixel value of each super pixel in the target variance image and a pixel value of a corresponding super pixel in the sample variance image.
For example, a difference between a pixel value of a super pixel point a1 in the target variance image and a pixel value of a super pixel point B1 in the sample variance image corresponding to the super pixel point a1 is determined (the difference herein refers to an absolute value of the difference, that is, a positive value), a difference between a pixel value of the super pixel point a2 in the target variance image and a pixel value of a super pixel point B2 in the sample variance image corresponding to the super pixel point a2 is determined, and …, and a difference between a pixel value of each super pixel point in the target variance image and a pixel value of a super pixel point in the sample variance image corresponding to the super pixel point can be determined. Then, the variance loss value is determined based on all the difference values, for example, an average value of all the difference values may be determined, and the average value is used as the variance loss value.
Illustratively, the variance loss value may be determined by the following equation:
L_var = Σ_{(x, y)} ( VM_target(x, y) − VM_sample(x, y) )²

In the above formula, VM_target represents the target variance image and VM_sample represents the sample variance image. The expression means that the two variance images are subtracted element by element, the square of each difference is determined, and all the squared values are summed.
For example, a difference 1 between the pixel value of super pixel point a1 and the pixel value of super pixel point B1 is determined, a difference 2 between the pixel value of super pixel point a2 and the pixel value of super pixel point B2 is determined, and so on; assuming that 100 super pixel points exist in both the target variance image and the sample variance image, 100 differences are obtained in total. The square of each difference is then determined, such as the square of difference 1, the square of difference 2, and so on, which results in 100 squared values. The squared values of all the differences are summed to obtain the variance loss value.
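A minimal numpy sketch of steps S41 to S43, following the squared-sum formula above, might look like the following; the function names are illustrative and the image sequences are assumed to be lists of equally sized grayscale arrays:

```python
import numpy as np

def variance_image(image_sequence):
    """Per-pixel variance across the K frames of an image sequence.
    The value at position (x, y) is the variance of the pixel values
    at (x, y) over all K frames, i.e. the super pixel value."""
    stack = np.stack(image_sequence, axis=0)   # shape: (K, H, W)
    return stack.var(axis=0)

def variance_loss(target_sequence, sample_sequence):
    """Sum of squared differences between target and sample variance images."""
    diff = variance_image(target_sequence) - variance_image(sample_sequence)
    return float(np.sum(diff ** 2))
```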
For example, for Mode 3 and Mode 4, the following steps may be adopted to obtain the parallax loss value:
Step S51, a target parallax image is generated based on each frame of target image in the target image sequence.
Illustratively, the disparity value of each target pixel point in the target disparity image is determined based on the disparity information of the sub-pixel point corresponding to the target pixel point in each frame of the target image.
For example, the target image sequence may include K frames of target images at different viewing angles, the K frames of target images have the same size, and a target parallax image may be generated based on the K frames of target images; the size of the target parallax image is the same as that of the target images. In this embodiment, each pixel point in the target parallax image may be referred to as a target pixel point, and each pixel point in the target images may be referred to as a sub-pixel point.
For example, for a target pixel (x, y) in a target parallax image, parallax information between sub-pixels (x, y) in all target images (i.e., K frames of target images) may be determined, and a parallax value of the target pixel (x, y) in the target parallax image is determined based on the parallax information, that is, the parallax value of the target pixel (x, y) is the parallax information. In summary, the parallax value of each target pixel in the target parallax image can be obtained, and the parallax values of all the target pixels form the target parallax image.
Step S52, a sample parallax image is generated based on each frame of sample image in the sample image sequence.
For example, the disparity value of each sample pixel point in the sample disparity image is determined based on the disparity information of the sub-pixel point corresponding to the sample pixel point in each frame of sample image.
For example, the sample image sequence may include K frames of sample images at different viewing angles, the K frames of sample images have the same size, and a sample parallax image may be generated based on the K frames of sample images; the size of the sample parallax image is the same as that of the sample images. In this embodiment, each pixel point in the sample parallax image may be referred to as a sample pixel point, and each pixel point in the sample images may be referred to as a sub-pixel point.
For example, for a sample pixel (x, y) in a sample parallax image, parallax information between sub-pixels (x, y) in all sample images (i.e., K frame sample images) may be determined, and a parallax value of the sample pixel (x, y) in the sample parallax image is determined based on the parallax information, that is, the parallax value of the sample pixel (x, y) is the parallax information. To sum up, the parallax value of each sample pixel in the sample parallax image can be obtained, and the parallax values of all the sample pixels form the sample parallax image.
In a possible implementation manner, a deep neural network may be trained in advance, the input of the deep neural network is a multi-frame image, the output of the deep neural network is a parallax image, the parallax image is used for reflecting parallax information of the multi-frame image, no limitation is imposed on the structure and the training process of the deep neural network, and the deep neural network only needs to meet the input and output requirements.
In summary, based on the trained deep neural network, all target images in the target image sequence may be input to the deep neural network, and the deep neural network outputs a target parallax image corresponding to the target image sequence, where the target parallax image is used to reflect parallax information of all target images. And all sample images in the sample image sequence can be input to the deep neural network, and the sample parallax images corresponding to the sample image sequence are output by the deep neural network and are used for reflecting the parallax information of all the sample images. Thus, a target parallax image and a sample parallax image can be obtained. Of course, the target parallax image and the sample parallax image may be obtained in other manners, which is not limited in this respect.
Step S53, determining a parallax loss value based on a difference between the parallax value of each target pixel in the target parallax image and the parallax value of the corresponding sample pixel in the sample parallax image.
For example, a difference between the parallax value of the target pixel point a1 in the target parallax image and the parallax value of the sample pixel point B1 corresponding to the target pixel point a1 in the sample parallax image is determined …, and a difference between the parallax value of each target pixel point in the target parallax image and the parallax value of the sample pixel point corresponding to the target pixel point in the sample parallax image can be determined. Then, the parallax loss value may be determined based on all the difference values, for example, an average value of all the difference values is determined, and the average value is taken as the parallax loss value. Of course, the parallax loss value may be determined in other manners, and the determination manner is not limited.
The parallax loss value can be determined by the following formula:
L_dis = Σ_{(x, y)} ( DM_target(x, y) − DM_sample(x, y) )²

In the above formula, DM_target represents the target parallax image and DM_sample represents the sample parallax image. The expression means that the two parallax images are subtracted element by element, the square of each difference is determined, and all the squared values are summed.
For example, a difference 1 between the parallax value of target pixel point a1 and the parallax value of sample pixel point B1 is determined, and so on; assuming that 100 pixel points exist in both the target parallax image and the sample parallax image, 100 differences are obtained in total. The square of each difference is determined, and the squared values of all the differences are summed to obtain the parallax loss value.
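A minimal sketch of steps S51 to S53, assuming a pre-trained disparity-estimation network is available as a callable (its name and interface are illustrative, not defined by this embodiment):

```python
import numpy as np

def parallax_loss(target_sequence, sample_sequence, disparity_net):
    """Sum of squared differences between the parallax images estimated
    from the target image sequence and the sample image sequence.

    `disparity_net` stands for a pre-trained deep neural network that maps
    a multi-frame image sequence to a parallax (disparity) image."""
    target_parallax = np.asarray(disparity_net(target_sequence))   # H x W map
    sample_parallax = np.asarray(disparity_net(sample_sequence))
    diff = target_parallax - sample_parallax
    return float(np.sum(diff ** 2))
```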
Step 104, determining the initial network model as a trained target network model.
For example, if the initial network model has converged, the initial network model may be determined as a trained target network model, and thus, the training process of the network model is completed. In subsequent processes, light field super-resolution imaging can be achieved based on the trained target network model. For example, an image to be processed (i.e., an image whose resolution needs to be improved, which may be a frame of image, rather than an image sequence including multiple frames of images with different viewing angles) may be input to the trained target network model, and the image to be processed is processed by the target network model, so as to obtain a super-resolution image corresponding to the image to be processed. The processing method of the target network model is not limited in this embodiment, and is related to the structure and function of the target network model. Obviously, since the target network model is used to improve the resolution of the image, the target network model can improve the resolution of the image to be processed, that is, the resolution of the super-resolution image can be greater than the resolution of the image to be processed.
Step 105, adjusting the initial network model, determining the adjusted network model as the initial network model, and returning to step 101. When step 101 is executed again, the original image sequence is input to the adjusted initial network model, i.e., the target image sequence is output by the adjusted initial network model.
Obviously, by adjusting the initial network model, a new target image sequence can be output by using the adjusted initial network model, whether the adjusted initial network model is converged is determined again based on the new target image sequence, and by analogy, the initial network model is continuously adjusted, so that the target loss value between the target image sequence and the sample image sequence is smaller and smaller, namely the performance of the initial network model is better and better, until the initial network model is converged, the performance of the initial network model meets the requirement, and the initial network model with the performance meeting the requirement can be determined as the trained target network model, so that the model training process is completed.
For example, when the initial network model is adjusted, the network parameters used for improving the image resolution in the initial network model may be adjusted, or other network parameters in the initial network model may be adjusted, and the adjustment process is not limited as long as the network parameters of the initial network model can be adjusted. By adjusting the network parameters of the initial network model, the performance of the initial network model can be improved, namely, the effect of the initial network model on improving the image resolution is better and better. For example, the network parameters of the initial network model may be adjusted by using a back propagation algorithm, which is not limited to this.
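One adjustment iteration might be sketched as follows, assuming PyTorch modules and tensors; the optimizer choice and the names `loss_fn`, `initial_model` are illustrative, not mandated by this embodiment:

```python
import torch

def train_step(initial_model, original_sequence, sample_sequence, optimizer, loss_fn):
    """One adjustment of the network parameters by back propagation.

    `loss_fn` stands for the target-loss computation described above and
    must return a scalar tensor."""
    optimizer.zero_grad()
    target_sequence = initial_model(original_sequence)     # forward pass
    target_loss = loss_fn(target_sequence, sample_sequence)
    target_loss.backward()                                  # back propagation
    optimizer.step()                                        # adjust network parameters
    return target_loss.item()

# Example setup (illustrative):
# optimizer = torch.optim.Adam(initial_model.parameters(), lr=1e-4)
```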
In one possible embodiment, the target network model and the initial network model may be SISR network models, or may be other types of network models, without limitation. For example, the SISR network model may be ESPCN (Efficient Sub-Pixel Convolutional Neural Network), VDSR (Very Deep Super-Resolution), RCAN (Residual Channel Attention Network), or another SISR network model, which is not limited herein.
In a possible embodiment, in addition to training the initial network model by using the image sequence (i.e., the original image sequence, the target image sequence, and the sample image sequence) (see steps 101-105 in the training process), the initial network model may also be trained by using a common two-dimensional image (i.e., not an image sequence including multiple frames of images at different viewing angles, but one frame of image), and the training process is similar to steps 101-105, except that the input data is a two-dimensional image instead of an image sequence, and thus, the description is not repeated here.
For example, a hybrid training data set may be constructed, which may include a plurality of common two-dimensional images and a plurality of image sequences, and the initial network model may be trained by the hybrid training data set. For example, a sequence of images is input to the initial network model with a probability P, so that the initial network model is trained by the sequence of images, and a normal two-dimensional image is input to the initial network model with a probability (1-P), so that the initial network model is trained by the normal two-dimensional image. The initial network model is trained by using the mixed training data set, so that overfitting of the initial network model can be prevented, the training effect of the network model is improved, the processing performance of the network model is improved, and the training effect of the network model is better.
The value of P may be configured empirically, such as P =0.2, P =0.3, and the like, without limitation.
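Sampling from the hybrid training data set can be sketched as follows; the function name and data-set variables are illustrative, and P = 0.2 follows the empirical example above:

```python
import random

def sample_training_example(image_sequences, two_d_images, p=0.2):
    """Draw one training example from the hybrid training data set:
    a multi-view image sequence with probability P, and an ordinary
    two-dimensional image with probability 1 - P."""
    if random.random() < p:
        return random.choice(image_sequences)   # multi-view image sequence
    return random.choice(two_d_images)          # ordinary 2D image
```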
It should be noted that the execution sequence above is only an example given for convenience of description; in practical applications, the execution sequence between the steps may be changed, and the execution sequence is not limited. Moreover, in other embodiments, the steps of the respective methods do not have to be performed in the order shown and described herein, and the methods may include more or fewer steps than those described herein. Moreover, a single step described in this specification may be broken down into multiple steps for description in other embodiments; multiple steps described in this specification may be combined into a single step in other embodiments.
According to the technical scheme, in the embodiment of the application, the initial network model can be trained based on K frame original images under different visual angles and K frame sample images under different visual angles to obtain a trained target network model, and images under multiple visual angles provide abundant spatial information and angle information, so that the super-resolution effect of the target network model can be improved, the super-resolution effect is remarkably improved, the processing performance of the target network model is improved, the target network model learns better super-resolution capability, and the super-resolution capability of the target network model is enhanced. The initial network model can be trained according to the content loss value and the structure loss value to obtain a trained target network model, and the content loss value and the structure loss value can better reflect the difference between the images, so that the processing performance of the target network model can be improved, and the super-resolution capability of the target network model is enhanced. In summary, because the processing performance of the target network model is better, after the image to be processed is processed based on the target network model, the super-resolution image can be obtained, and the resolution of the super-resolution image is greater than that of the image to be processed, i.e., the resolution of the super-resolution image is larger, the resolution of the image is effectively improved, the resolution improving effect is good, the requirement of high resolution can be met, and even in application scenes such as automatic driving, AR/VR and the like, the resolution of the image can also meet the requirement of a user.
The embodiment of the application provides a light field super-resolution imaging method, which is used for converting an image with low resolution into an image with high resolution, so that the resolution of the image is improved, and the requirement of an application scene is met. Referring to fig. 2, a flow chart of a light field super-resolution imaging method is shown, where the method may include:
step 201, acquiring an image to be processed, where the image to be processed is an image whose resolution needs to be improved, and the image to be processed may be a frame of image instead of an image sequence including multiple frames of images with different viewing angles.
Step 202, inputting the image to be processed to the trained target network model to obtain a super-resolution image corresponding to the image to be processed, wherein the resolution of the super-resolution image is greater than that of the image to be processed.
For example, the training mode of the target network model may refer to steps 101 to 105, which are not described herein again, and based on the trained target network model, the image to be processed may be input to the target network model, and the image to be processed is processed by the target network model, so as to obtain the super-resolution image corresponding to the image to be processed. Since the target network model is used to improve the resolution of the image, the target network model can improve the resolution of the image to be processed, that is, the resolution of the super-resolution image can be greater than the resolution of the image to be processed.
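Inference with the trained model can be sketched as follows, assuming a PyTorch module and tensor input; variable names are illustrative:

```python
import torch

def super_resolve(target_model, image_to_process):
    """Inference with the trained target network model: a single frame in,
    a super-resolution frame out (resolution greater than the input)."""
    target_model.eval()
    with torch.no_grad():
        super_resolution_image = target_model(image_to_process)
    return super_resolution_image
```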
The above technical solution of the embodiment of the present application is described below with reference to specific application scenarios.
Referring to fig. 3, which is a schematic diagram of light field super-resolution imaging, the imaging process may include:
Training process: the sample image sequence is light field data containing four-dimensional information (a spatial dimension and an angular dimension) and includes K frames of sample images at different viewing angles. Down-sampling is performed on the K frames of sample images in the sample image sequence to obtain K frames of original images at different viewing angles, and the K frames of original images form an original image sequence. Each frame of original image in the original image sequence is input to the initial network model to obtain K frames of target images at different viewing angles, and the K frames of target images form a target image sequence.
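The training-data preparation just described might be sketched as follows; the scale factor and interpolation method are assumptions for illustration:

```python
import cv2

def build_original_sequence(sample_sequence, scale=4):
    """Down-sample each of the K sample views to build the original
    (low-resolution) image sequence used as network input."""
    original_sequence = []
    for sample in sample_sequence:              # sample: H x W (x C) array
        h, w = sample.shape[:2]
        low_res = cv2.resize(sample, (w // scale, h // scale),
                             interpolation=cv2.INTER_CUBIC)
        original_sequence.append(low_res)
    return original_sequence
```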
And determining a target loss value based on the K frames of sample images in the sample image sequence and the K frames of target images in the target image sequence, and determining whether the initial network model is converged according to the target loss value. If yes, the initial network model is determined to be the trained target network model. If not, the initial network model is adjusted, the adjusted network model is determined as the initial network model, and the training process is executed again.
Testing process: the image to be processed is input to the trained target network model to obtain a super-resolution image corresponding to the image to be processed, where the resolution of the super-resolution image is greater than that of the image to be processed.
For example, the training process uses a sample image sequence acquired by a light field camera, the sample image sequence is light field data containing four-dimensional information (spatial dimension and angular dimension), as shown in fig. 4, the (x, y) plane is information of the spatial dimension, i.e. pixel values of pixel points, the (u, v) plane is information of the angular dimension, i.e. information of a horizontal viewing angle and a vertical viewing angle, and the sensor in fig. 4 may be a light field camera.
The information recorded by the light field camera can be converted into a set of multiple frame sample images (sample image sequence) in which slightly different viewpoints exist, each frame sample image representing spatial information, and slight variations of the viewpoint representing angular information. Each frame of sample image can be regarded as an image with unknown parallax, missing information obtained by sparse sampling of one frame of sample image can be captured by another frame or multiple frames of sample images, the information is called supplementary information, and a better super-resolution effect can be obtained by using the supplementary information in a training process.
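The conversion of the recorded light field into a set of sub-aperture views can be sketched as follows; the (U, V, H, W) array layout is an assumption for illustration, not the camera's actual storage format:

```python
import numpy as np

def light_field_to_views(light_field):
    """Rearrange a 4D light field L(u, v, x, y) into K = U * V sub-aperture
    views, one per angular coordinate (u, v); each view carries the spatial
    information, and the shift between views carries the angular information."""
    U, V, H, W = light_field.shape
    return [light_field[u, v] for u in range(U) for v in range(V)]
```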
Illustratively, the initial network model may be represented by the following formula:
I_HR = U · I_LR + n

In the above formula, I_LR and I_HR represent the low-resolution image and the high-resolution image respectively, U represents the up-sampling matrix, and n represents noise. In order to find a suitable U, the present embodiment provides 4 loss values, which are a content loss value (i.e., a function value of a content loss function), a structure loss value (i.e., a function value of a structure loss function), a variance loss value (i.e., a function value of a variance loss function), and a parallax loss value (i.e., a function value of a parallax loss function); that is, a target loss value can be determined according to the content loss value, the structure loss value, the variance loss value and the parallax loss value. Through this joint optimization, the sum of the content loss value, the structure loss value, the variance loss value and the parallax loss value is reduced to a minimum, thereby improving the super-resolution performance of the network without changing the network structure.
Illustratively, the content loss value and the structure loss value are used to make the distribution of the super-resolution result similar to that of the real result and to maintain the structure of the four-dimensional light field; the variance loss value is used to better preserve the edge positions of objects; and the parallax loss value is used to further improve the super-resolution effect in the parallax domain.
The content loss value is the content loss between the sample images in the sample image sequence and the target images in the target image sequence. The content loss value is determined using not only the two-dimensional spatial information but also the two-dimensional angular information, i.e., using the four-dimensional information, so a better spatial super-resolution effect can be obtained.
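A minimal numpy sketch of such a content loss over the stacked four-dimensional data is given below; using the mean of absolute pixel differences is one possible choice assumed here for illustration:

```python
import numpy as np

def content_loss(target_sequence, sample_sequence):
    """Mean absolute difference between corresponding pixels of the
    target and sample light field matrices (all K views stacked)."""
    target_matrix = np.stack(target_sequence, axis=0)   # (K, H, W)
    sample_matrix = np.stack(sample_sequence, axis=0)
    return float(np.mean(np.abs(target_matrix - sample_matrix)))
```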
The structure loss value is the structure loss between the sample images in the sample image sequence and the target images in the target image sequence, and can be determined by using SSIM, so that the structure of the light field is better preserved.
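An SSIM-based structure loss might be sketched as follows, assuming the two light field matrices are equally sized 2D numpy arrays; using 1 − SSIM as the loss is a common choice assumed here, not the patent's exact definition:

```python
import numpy as np
from skimage.metrics import structural_similarity

def structure_loss(target_light_field, sample_light_field, data_range=1.0):
    """Structure loss based on SSIM between the two light field matrices."""
    ssim = structural_similarity(np.asarray(target_light_field),
                                 np.asarray(sample_light_field),
                                 data_range=data_range)
    return 1.0 - ssim
```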
The variance loss value is a variance loss between a sample image in the sample image sequence and a target image in the target image sequence. For example, a sample variance image is generated based on all sample images in the sample image sequence, each pixel point in the sample variance image is called a super pixel point, and each pixel point in the sample image is called a sub pixel point. One super pixel point may correspond to a plurality of sub pixel points (located in a plurality of sample images).
Therefore, for each super pixel point, the variance of all sub-pixel points corresponding to the super pixel point can be determined, and the variances of all super pixel points form a sample variance image, wherein the length and the width of the sample variance image are respectively equal to the length and the width of the sample image. The variance of each super pixel point can be expressed as:
VM(x, y) = (1/N) · Σ_{i=1..N} ( I_i(x, y) − μ(x, y) )²

In the above formula, VM represents the sample variance image, N represents the number of sub-pixel points corresponding to one super pixel point (if the sample image sequence includes 81 frames of sample images, N = 81), I_i(x, y) represents the pixel value of the sub-pixel point (x, y) in the i-th sample image, and μ(x, y) is the mean of these pixel values over the N sub-pixel points. Of course, the above formula is only an example of variance and is not limited thereto.
Similarly, a target variance image may be generated based on all target images in the sequence of target images.
The parallax loss value is the parallax loss between the sample images in the sample image sequence and the target images in the target image sequence. For example, an accurate disparity map can be obtained from a light field with a good structure; conversely, the more accurate the disparity map obtained from the light field, the better the light field structure can be restored.
Based on the same application concept as the above method, an embodiment of the present application provides a light field super-resolution imaging apparatus, as shown in fig. 5, which is a schematic structural diagram of the apparatus, and the apparatus may include: an obtaining module 51, configured to input an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed; a training module 52, configured to train and obtain the target network model by:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring content loss values of a target image in the target image sequence and a sample image in the sample image sequence, and structure loss values of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frame sample images at different view angles; determining whether the initial network model has converged in dependence on the content loss value and the structural loss value;
if yes, determining the initial network model as a trained target network model; if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
Illustratively, the training module 52 determines whether the initial network model has converged according to the content loss value and the structure loss value, and is specifically configured to: determining a target loss value according to the content loss value and the structure loss value; wherein the target loss value is a loss value used to determine whether an initial network model has converged; determining whether the initial network model has converged according to the target loss value.
Illustratively, the training module 52 is specifically configured to, when determining the target loss value according to the content loss value and the structure loss value: acquiring a variance loss value of a target image in the target image sequence and a sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the variance loss value; or acquiring a parallax loss value of a target image in the target image sequence and a sample image in the sample image sequence, and determining the target loss value according to the content loss value, the structure loss value and the parallax loss value; or, determining the target loss value according to the content loss value, the structure loss value, the variance loss value and the parallax loss value.
For example, when the training module 52 obtains the content loss values of the target image in the target image sequence and the sample image in the sample image sequence, it is specifically configured to: determining a first light field matrix based on each frame of a target image in the sequence of target images; determining a second light field matrix based on each frame of sample images in the sequence of sample images; and determining the content loss value based on the difference value between the pixel value of each pixel point in the first light field matrix and the pixel value of the corresponding pixel point in the second light field matrix.
For example, when the training module 52 obtains the structure loss value of the target image in the target image sequence and the sample image in the sample image sequence, it is specifically configured to: determining a first light field matrix based on each frame of a target image in the sequence of target images; determining a second light field matrix based on each frame of sample images in the sequence of sample images; calculating the first light field matrix and the second light field matrix by adopting a configured structure similarity loss function to obtain a function value of the structure similarity loss function; determining the structural loss value based on a function value of the structural similarity loss function.
The training module 52 is specifically configured to, when obtaining a variance loss value between a target image in the target image sequence and a sample image in the sample image sequence: generating a target variance image based on each frame of target image in the target image sequence; the pixel value of each super pixel point in the target variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of target image; generating a sample variance image based on each frame of sample image in the sample image sequence; the pixel value of each super pixel point in the sample variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of sample image; and determining the variance loss value based on the difference between the pixel value of each super pixel point in the target variance image and the pixel value of the corresponding super pixel point in the sample variance image.
The training module 52 is specifically configured to, when obtaining the parallax loss value between the target image in the target image sequence and the sample image in the sample image sequence: generating a target parallax image based on each frame of target images in the target image sequence; the parallax value of each target pixel point in the target parallax image is determined based on the parallax information of the sub-pixel point corresponding to the target pixel point in each frame of target image; generating a sample parallax image based on each frame sample image in the sample image sequence; the parallax value of each sample pixel point in the sample parallax image is determined based on the parallax information of the sub-pixel point corresponding to the sample pixel point in each frame of sample image; and determining the parallax loss value based on the difference value between the parallax value of each target pixel point in the target parallax image and the parallax value of the corresponding sample pixel point in the sample parallax image.
Based on the same application concept as the above method, an embodiment of the present application provides a light field super-resolution imaging apparatus, which is shown in fig. 6 and may include: a processor 61 and a machine-readable storage medium 62, the machine-readable storage medium 62 storing machine-executable instructions executable by the processor 61; the processor 61 is configured to execute machine executable instructions to perform the following steps: inputting an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
wherein the target network model is obtained by training in the following way:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring content loss values of a target image in the target image sequence and a sample image in the sample image sequence, and structure loss values of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frame sample images at different view angles; determining whether the initial network model has converged in dependence on the content loss value and the structural loss value;
if yes, determining the initial network model as a trained target network model; if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
Based on the same application concept as the method, embodiments of the present application further provide a machine-readable storage medium, where a plurality of computer instructions are stored on the machine-readable storage medium, and when the computer instructions are executed by a processor, the light field super-resolution imaging method disclosed in the above example of the present application can be implemented.
The machine-readable storage medium may be any electronic, magnetic, optical, or other physical storage device that can contain or store information such as executable instructions, data, and the like. For example, the machine-readable storage medium may be: a RAM (Random Access Memory), a volatile memory, a non-volatile memory, a flash memory, a storage drive (e.g., a hard drive), a solid state drive, any type of storage disk (e.g., an optical disk, a DVD, etc.), a similar storage medium, or a combination thereof.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The above description is only an example of the present application and is not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application should be included in the scope of the claims of the present application.

Claims (9)

1. A light field super-resolution imaging method, the method comprising:
inputting an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
wherein the target network model is obtained by training in the following way:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring a content loss value and a structure loss value of a target image in the target image sequence and a sample image in the sample image sequence, and acquiring a variance loss value and/or a parallax loss value of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frame sample images at different view angles;
determining whether the initial network model has converged in dependence on the content loss value and the structural loss value, and the variance loss value and/or the disparity loss value;
if yes, determining the initial network model as a trained target network model;
if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
2. The method of claim 1, wherein determining whether the initial network model has converged based on the content loss value and the structure loss value, and the variance loss value and/or the disparity loss value comprises:
determining a target loss value from the content loss value, the structure loss value, and the variance loss value; or, determining a target loss value according to the content loss value, the structure loss value and the parallax loss value; or, determining a target loss value according to the content loss value, the structure loss value, the variance loss value and the parallax loss value; wherein the target loss value is a loss value used to determine whether an initial network model has converged;
determining whether the initial network model has converged according to the target loss value.
3. The method according to claim 1 or 2, wherein the obtaining of the content loss value of the target image in the target image sequence and the sample image in the sample image sequence comprises:
determining a first light field matrix based on each frame of a target image in the sequence of target images;
determining a second light field matrix based on each frame of sample images in the sequence of sample images;
and determining the content loss value based on the difference value between the pixel value of each pixel point in the first light field matrix and the pixel value of the corresponding pixel point in the second light field matrix.
4. The method according to claim 1 or 2, wherein the obtaining of the structure loss value of the target image in the target image sequence and the sample image in the sample image sequence comprises:
determining a first light field matrix based on each frame of a target image in the sequence of target images;
determining a second light field matrix based on each frame of sample images in the sequence of sample images;
calculating the first light field matrix and the second light field matrix by adopting a configured structure similarity loss function to obtain a function value of the structure similarity loss function;
determining the structural loss value based on a function value of the structural similarity loss function.
5. The method of claim 1 or 2, wherein obtaining a loss of variance value for a target image in the sequence of target images and a sample image in the sequence of sample images comprises:
generating a target variance image based on each frame of target images in the sequence of target images; the pixel value of each super pixel point in the target variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of target image;
generating a sample variance image based on each frame of sample images in the sequence of sample images; the pixel value of each super pixel point in the sample variance image is determined based on the variance of the pixel values of the sub pixel points corresponding to the super pixel point in each frame of sample image;
and determining the variance loss value based on the difference between the pixel value of each super pixel point in the target variance image and the pixel value of the corresponding super pixel point in the sample variance image.
6. The method according to claim 1 or 2, wherein the obtaining of the disparity loss value between the target image in the target image sequence and the sample image in the sample image sequence comprises:
generating a target parallax image based on each frame of target images in the target image sequence; the parallax value of each target pixel point in the target parallax image is determined based on the parallax information of the sub-pixel point corresponding to the target pixel point in each frame of target image;
generating a sample parallax image based on each frame sample image in the sample image sequence; the parallax value of each sample pixel point in the sample parallax image is determined based on the parallax information of the sub-pixel point corresponding to the sample pixel point in each frame of sample image;
and determining the parallax loss value based on the difference value between the parallax value of each target pixel point in the target parallax image and the parallax value of the corresponding sample pixel point in the sample parallax image.
7. The method according to any one of claims 1 or 2, wherein before inputting the original image sequence into the initial network model to obtain the target image sequence, the method further comprises:
acquiring the sample image sequence of a target scene through a light field camera, and performing down-sampling processing on each frame of sample image in the sample image sequence to obtain an original image corresponding to the frame of sample image;
and combining the original images corresponding to each frame of sample image into the original image sequence.
8. A light field super-resolution imaging apparatus, the apparatus comprising:
the acquisition module is used for inputting the image to be processed to the trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
the training module is used for training to obtain the target network model in the following modes:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring a content loss value and a structure loss value of a target image in the target image sequence and a sample image in the sample image sequence, and acquiring a variance loss value and/or a parallax loss value of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frame sample images at different view angles;
determining whether the initial network model has converged in dependence on the content loss value and the structural loss value, and the variance loss value and/or the disparity loss value;
if yes, determining the initial network model as a trained target network model;
if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
9. A light field super-resolution imaging device, comprising: a processor and a machine-readable storage medium storing machine-executable instructions executable by the processor; wherein the processor is configured to execute the machine executable instructions to perform the steps of:
inputting an image to be processed to a trained target network model to obtain a super-resolution image corresponding to the image to be processed; the resolution of the super-resolution image is greater than that of the image to be processed;
wherein the target network model is obtained by training in the following way:
inputting an original image sequence to an initial network model to obtain a target image sequence, wherein the original image sequence comprises K frames of original images under different visual angles, the number of the target images in the target image sequence is the same as that of the original images in the original image sequence, and K is a positive integer greater than 1;
acquiring a content loss value and a structure loss value of a target image in the target image sequence and a sample image in the sample image sequence, and acquiring a variance loss value and/or a parallax loss value of the target image in the target image sequence and the sample image in the sample image sequence; wherein the sample image sequence comprises K frame sample images at different view angles;
determining whether the initial network model has converged in dependence on the content loss value and the structural loss value, and the variance loss value and/or the disparity loss value;
if yes, determining the initial network model as a trained target network model;
if not, the initial network model is adjusted, the adjusted network model is determined to be the initial network model, and the original image sequence is input to the initial network model to obtain the target image sequence.
CN202110211014.8A 2021-02-25 2021-02-25 Optical field super-resolution imaging method, device and equipment Active CN112634139B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110211014.8A CN112634139B (en) 2021-02-25 2021-02-25 Optical field super-resolution imaging method, device and equipment

Publications (2)

Publication Number Publication Date
CN112634139A CN112634139A (en) 2021-04-09
CN112634139B true CN112634139B (en) 2021-05-28

Family

ID=75295380

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110211014.8A Active CN112634139B (en) 2021-02-25 2021-02-25 Optical field super-resolution imaging method, device and equipment

Country Status (1)

Country Link
CN (1) CN112634139B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8335371B2 (en) * 2009-12-28 2012-12-18 Tsinghua University Method for vision field computing
CN109191554A (en) * 2018-09-04 2019-01-11 清华-伯克利深圳学院筹备办公室 A kind of super resolution image reconstruction method, device, terminal and storage medium
CN111598775A (en) * 2020-04-26 2020-08-28 西安理工大学 Light field video time domain super-resolution reconstruction method based on LSTM network
CN112102165A (en) * 2020-08-18 2020-12-18 北京航空航天大学 Light field image angular domain super-resolution system and method based on zero sample learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Computational Light Field Imaging; Fang Lu et al.; Acta Optica Sinica; 2020-01-31; Vol. 40, No. 1; pp. 1-22 *

Also Published As

Publication number Publication date
CN112634139A (en) 2021-04-09

Similar Documents

Publication Publication Date Title
Wu et al. Learning sheared EPI structure for light field reconstruction
US9900510B1 (en) Motion blur for light-field images
US20220014723A1 (en) Enhancing performance capture with real-time neural rendering
JP5300258B2 (en) Method and system for acquiring, encoding, decoding and displaying a three-dimensional light field
EP1978754A2 (en) Method and system for processing light field of three dimensional scene
JP6128748B2 (en) Image processing apparatus and method
Chandramouli et al. A generative model for generic light field reconstruction
Lu et al. A survey on multiview video synthesis and editing
CN114359041A (en) Light field image space super-resolution reconstruction method
KR101289283B1 (en) A holographic display method using a hybrid image acquisition system
CN112634139B (en) Optical field super-resolution imaging method, device and equipment
Orozco et al. HDR multiview image sequence generation: Toward 3D HDR video
CN112581372B (en) Cross-space-time mapping super-resolution light field imaging method, device and equipment
Gurrieri et al. Stereoscopic cameras for the real-time acquisition of panoramic 3D images and videos
Yan et al. Stereoscopic image generation from light field with disparity scaling and super-resolution
Shi et al. Deep residual architecture using pixel and feature cues for view synthesis and temporal interpolation
CN109379577B (en) Video generation method, device and equipment of virtual viewpoint
RU2690757C1 (en) System for synthesis of intermediate types of light field and method of its operation
CN113256544A (en) Multi-view image synthesis method, system, device and storage medium
Yu et al. Racking focus and tracking focus on live video streams: a stereo solution
Yao et al. 2D-to-3D conversion using optical flow based depth generation and cross-scale hole filling algorithm
JP7416573B2 (en) Stereoscopic image generation device and its program
CN116939186B (en) Processing method and device for automatic associative covering parallax naked eye space calculation
Masia Computational Imaging and Displays: Capturing and displaying richer representations of the world
CN117274066B (en) Image synthesis model, method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant