CN117115047A - Image enhancement method, device, equipment and storage medium - Google Patents

Image enhancement method, device, equipment and storage medium

Info

Publication number
CN117115047A
Authority
CN
China
Prior art keywords
image
sample
training
view
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311384163.XA
Other languages
Chinese (zh)
Inventor
周昆
李文博
蒋念娟
吕江波
沈小勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Simou Intelligent Technology Co ltd
Shenzhen Smartmore Technology Co Ltd
Original Assignee
Suzhou Simou Intelligent Technology Co ltd
Shenzhen Smartmore Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Simou Intelligent Technology Co ltd, Shenzhen Smartmore Technology Co Ltd filed Critical Suzhou Simou Intelligent Technology Co ltd
Priority to CN202311384163.XA priority Critical patent/CN117115047A/en
Publication of CN117115047A publication Critical patent/CN117115047A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T5/00Image enhancement or restoration
    • G06T5/50Image enhancement or restoration using two or more images, e.g. averaging or subtraction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/003D [Three Dimensional] image rendering
    • G06T15/005General purpose rendering architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2200/00Indexing scheme for image data processing or generation, in general
    • G06T2200/04Indexing scheme for image data processing or generation, in general involving 3D image data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20212Image combination
    • G06T2207/20221Image fusion; Image merging
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Graphics (AREA)
  • Image Processing (AREA)

Abstract

The application relates to an image enhancement method, apparatus, device, and storage medium. The image enhancement method includes the following steps: first, an image rendered by NeRF is acquired as the target image to be enhanced; then, a first reference image and a second reference image that are most similar to the target image are selected from the other view-angle images rendered by NeRF; finally, the target image, the first reference image, and the second reference image are input into a pre-constructed multi-view image aggregation model for image enhancement processing, so as to obtain an image enhancement result corresponding to the target image. With this method, a multi-view image aggregation model capable of more efficient information fusion across the multi-frame images at each view angle is obtained in advance by performing iterative aggregation at the pixel level and the image-block level on degraded training images and reference training images, so that when the target image is enhanced using this model and the reference images, the enhancement effect and the image rendering quality can be effectively improved.

Description

Image enhancement method, device, equipment and storage medium
Technical Field
The present application relates to the field of image processing technologies, and in particular, to an image enhancement method, apparatus, device, and storage medium.
Background
Currently, in order to generate highly realistic 3D scenes, the neural radiance field (Neural Radiance Fields, abbreviated as NeRF), an advanced computer graphics technology, has been widely used in fields such as enhanced display and virtual street view. By training on a limited set of input views, it can produce high-quality renderings from relatively small datasets and can render the scene from any angle, achieving excellent, high-quality rendering effects.
However, the image rendering quality of NeRF is highly dependent on accurate camera calibration results, and inaccurate camera calibration may result in lower quality of new view-angle images synthesized based on NeRF. In addition, for new scenes, existing NeRF models require extensive parameter tuning, or even retraining, to improve rendering quality, so the training process is complex, generalization is poor, and deployment is difficult. Moreover, some existing NeRF models generally require a long time, e.g., tens of hours, to complete training on a single scene. It can be seen that images rendered based on existing NeRF models are of limited quality.
Disclosure of Invention
Based on the foregoing, it is necessary to provide an image enhancement method, apparatus, device, and storage medium, which can effectively improve the enhancement effect on images rendered by a NeRF model and thereby improve the image rendering quality.
In a first aspect, the present application provides an image enhancement method, including:
acquiring a target image to be enhanced; the target image is an image rendered by a neural radiance field (NeRF);
selecting a first reference image and a second reference image which are most similar to the target image from other visual angle images rendered by NeRF;
inputting the target image, the first reference image and the second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing to obtain an image enhancement result corresponding to the target image;
the multi-view image aggregation model is obtained by performing iterative aggregation training on a pixel level and an image block level on a degraded training image and a reference training image; the degradation training image is obtained by carrying out degradation treatment on the sample training image; the reference training image and the sample training image are consecutive frame images contained in a sample video sequence.
In a second aspect, an embodiment of the present application further provides an image enhancement apparatus, including:
the first acquisition module is used for acquiring a target image to be enhanced; the target image is an image rendered by a neural radiance field (NeRF);
the matching module is used for selecting a first reference image and a second reference image which are most similar to the target image from other visual angle images rendered by NeRF;
The enhancement module is used for inputting the target image, the first reference image and the second reference image into a pre-constructed multi-view image aggregation model to carry out image enhancement processing, so as to obtain an image enhancement result corresponding to the target image;
the multi-view image aggregation model is obtained by performing aggregation iterative training on a pixel level and an image block level on a degraded training image and a reference training image; the degradation training image is obtained by carrying out degradation treatment on the sample training image; the reference training image and the sample training image are consecutive frame images contained in a sample video sequence.
In a third aspect, the application provides a computer device comprising a memory storing a computer program and a processor implementing the steps of the method described above when the processor executes the computer program.
In a fourth aspect, the present application provides a computer readable storage medium having stored thereon a computer program which when executed by a processor performs the steps of the method described above.
In a fifth aspect, the application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method described above.
The image enhancement method, the device, the equipment and the storage medium have the following beneficial effects:
when the target image rendered by NeRF is enhanced, firstly, the multi-view image aggregation model is obtained by carrying out iterative aggregation training on the pixel level and the image block level on the degradation training image and the reference training image, so that the model can carry out more efficient information fusion on multi-frame images under each view angle, and the enhancement effect can be further improved when the target image is enhanced by utilizing the model and the reference image similar to the target image, and the image rendering quality of the target image is further improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic flow chart of an image enhancement method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of selecting a first reference image and a second reference image that are most similar to a target image from other perspective images rendered by NeRF by using a multi-perspective similarity matching algorithm according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a multi-view image aggregation model according to an embodiment of the present application;
FIG. 4 is a schematic diagram of an aggregation process of a multi-view image aggregation model according to an embodiment of the present application;
FIG. 5 is a diagram illustrating an exemplary process for synthesizing degraded training images from a sample video sequence using an image degradation simulator according to an embodiment of the present application;
FIG. 6 is a diagram showing a comparison example of a target image and its image enhancement result according to an embodiment of the present application;
fig. 7 is a schematic diagram of an image enhancement device according to an embodiment of the present application;
FIG. 8 is a diagram illustrating an internal architecture of a computer device according to an embodiment of the present application;
FIG. 9 is an internal block diagram of another computer device according to an embodiment of the present application;
fig. 10 is an internal structural diagram of a computer-readable storage medium according to an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
With the advent and development of computer technology, artificial intelligence, and other sciences, NeRF, as an advanced computer graphics technology capable of generating highly realistic 3D scenes, has been widely applied in fields such as 3D visual display, enhanced display, and virtual street view. NeRF is a neural-network-based three-dimensional scene reconstruction method: the color and depth value of each pixel can be predicted from a single or a few view angles, and an image at any view angle can then be predicted using the learned neural radiance field function, thereby realizing image rendering.
The inventors of the present application have found that, when various inverse rendering techniques such as NeRF are used to improve the speed and quality of new view-angle synthesis, the following defects exist. First, these techniques depend heavily on accurate camera calibration results; however, the views collected in an outdoor scene are often very sparse, so the camera calibration accuracy is not ideal, and inaccurate camera calibration leads to lower quality of new view-angle images synthesized based on NeRF. Second, the materials and geometry of objects in an outdoor scene are usually very complex, and the environmental illumination, lighting, and the like are difficult to obtain; it is difficult to accurately recover the object materials, environmental illumination, scene geometry, and so on from only a few collected images, so complex environmental illumination, unknown object materials, and similar factors are also unfavorable for generating high-quality new view-angle images. Third, for a new scene, an existing NeRF model requires extensive parameter tuning, or even retraining, to improve the rendering quality, so the training process is complex, generalization is poor, and deployment is difficult. Fourth, some existing NeRF models usually require a long time, such as tens of hours, to complete training on a single scene.
Therefore, how to solve the problems of low synthesis quality, long training time, and difficult training existing in the new view-angle image rendering process of NeRF models, so as to improve the image rendering effect, is a technical problem that urgently needs to be solved at present.
Based on the above, in order to overcome these drawbacks, the present application provides an image enhancement method. A multi-view image aggregation model is first obtained by performing iterative aggregation training at the pixel level and the image-block level on degraded training images and reference training images, so that the model can perform more efficient information fusion across the multi-frame images at each view angle. As a result, when this model and reference images similar to a target image are used to enhance the target image, the enhancement effect can be effectively improved, thereby improving the rendering quality of target images rendered by various NeRF models.
As shown in fig. 1, an embodiment of the present application provides an image enhancement method, which includes the following steps:
S101: acquiring a target image to be enhanced; wherein the target image is an image rendered by the neural radiance field NeRF.
In this embodiment, any image that needs enhancement processing after being rendered by NeRF is defined as a target image, denoted by I. Note that this embodiment does not limit the type of the target image; for example, the target image may be a color image composed of the three primary colors red (R), green (G), and blue (B), a grayscale image, or the like. Likewise, the present application does not limit the specific content of the target image; for example, the target image may be a landscape image, a portrait image, an animal image, or the like.
It will be appreciated that the target image may be rendered by various types of NeRF models according to actual needs; for example, the target image may be an image rendered by a NeRF model such as a tensorial radiance field (TensoRF), a regularized neural radiance field (RegNeRF), or Plenoxels (a radiance field represented without a neural network). Such target images often suffer from rendering noise, blurring, and the like, and enhancement processing of the target image can be implemented by executing the subsequent steps S102-S103, so as to improve the image rendering quality.
S102: the first reference image and the second reference image which are most similar to the target image are selected from other view images rendered by NeRF.
In this embodiment, after the target image to be enhanced is obtained through step S101, in order to effectively improve the enhancement effect on the target image and thereby improve the image rendering quality, an existing or future image selection method may further be used to select the two view-angle images most similar to the target image from the other view-angle images rendered by NeRF (for example, two view-angle images similar to the target image may be selected at random), as the first reference image and the second reference image, denoted I_r1 and I_r2 respectively, in order to perform the subsequent step S103.
An alternative implementation manner is as follows: after the target image to be enhanced is acquired, a multi-view similarity matching algorithm (the specific algorithm is not limited; for example, a multi-view similarity matcher (View Selection) may be adopted) is used to select the first reference image and the second reference image most similar to the target image from the other view-angle images rendered by NeRF. The specific implementation process may include the following steps S1021-S1027:
S1021: simplifying the scene rendered by NeRF into a scene in which the same number of rays is projected onto a 3D bounding sphere from each of N view angles, and taking the view-angle image corresponding to the i-th view angle among the N view angles as the target image; wherein N is a positive integer greater than 1, and i is a positive integer greater than 0 and not greater than N.
In this implementation, it should be noted that when images are rendered by NeRF, images of multiple view angles are usually processed at the same time. After the image of one view angle is taken as the target image, in order to improve the enhancement effect on the target image when executing the subsequent step S103, the multi-view similarity matching algorithm may further be used to select, from the images of the other view angles rendered by NeRF, the first reference image I_r1 and the second reference image I_r2 most similar to the target image as the basis for enhancement. Specifically, the scene rendered by NeRF may first be simplified into a scene in which the same number of rays is projected onto a 3D bounding sphere from each of N view angles, as shown in fig. 2, and the view-angle image corresponding to the i-th view angle among the N view angles is taken as the target image, in order to execute the subsequent step S1022.
S1022: for the ith intersection point of the sphere projected by the ith view angle, the point with the smallest Euclidean distance between the ith intersection point and the point is calculated as the intersection point which is most matched with the ith intersection point from all intersection points of the sphere projected by the jth view angle in N view angles; wherein j is a positive integer greater than 0 and not greater than N.
In this implementation, the i-th and j-th of the N view angles each cast the same number of rays toward the sphere, and these rays intersect the sphere to form intersection points. The sets of intersection points of these two view angles can be represented as follows:

X_i = {x_i^1, x_i^2, ..., x_i^{M_i}},  X_j = {x_j^1, x_j^2, ..., x_j^{M_j}}

wherein X_i denotes the set of intersection points formed between the sphere and the rays projected onto the sphere from the i-th view angle, the number of intersection points in this set being M_i; X_j denotes the set of intersection points formed between the sphere and the rays projected onto the sphere from the j-th view angle, the number of intersection points in this set being M_j.
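For illustration only, the following Python sketch shows one way such intersection-point sets could be obtained by intersecting camera rays with a bounding sphere; the function and variable names are assumptions made for this example and are not notation from the application.

```python
import numpy as np

def ray_sphere_intersections(origins, directions, center, radius):
    """Intersect rays (origin + t * direction) with a bounding sphere.

    origins, directions: (M, 3) arrays; directions are assumed unit-length.
    Returns an (M, 3) array of the far intersection points (NaN if a ray misses).
    """
    oc = origins - center                       # (M, 3)
    b = np.einsum("md,md->m", directions, oc)   # d . (o - c)
    c = np.einsum("md,md->m", oc, oc) - radius ** 2
    disc = b ** 2 - c                           # discriminant of t^2 + 2bt + c = 0
    t = -b + np.sqrt(np.maximum(disc, 0.0))     # far root: exit point on the sphere
    t = np.where(disc >= 0.0, t, np.nan)
    return origins + t[:, None] * directions

# Example: rays cast from the i-th camera toward a unit bounding sphere
rng = np.random.default_rng(0)
cam_origin = np.array([0.0, 0.0, -2.0])
dirs = rng.normal(size=(64, 3))
dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
X_i = ray_sphere_intersections(np.broadcast_to(cam_origin, (64, 3)), dirs,
                               center=np.zeros(3), radius=1.0)
```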
For an intersection point x_i^k projected onto the sphere from the i-th view angle, the point with the smallest Euclidean distance to it can be found among all the intersection points projected onto the sphere from the j-th view angle, as the intersection point that best matches x_i^k. The specific calculation formula is as follows:

x_j^* = argmin_{x ∈ X_j} || x_i^k − x ||_2
s1023: and analogically, respectively calculating the intersection points which are most matched with all the intersection points of the sphere projected from the ith view angle, and calculating the first cost of the ith view angle matched with the jth view angle by utilizing the intersection points.
After the intersection point best matching the i-th intersection point is determined through step S1022, the intersection point that best matches each of the intersection points projected onto the sphere from the i-th view angle can be found, in the same way, among all the intersection points projected onto the sphere from the j-th view angle, and the distances between these intersection points and their best-matching intersection points are accumulated as the first cost of matching the i-th view angle to the j-th view angle, defined as C_{i→j}. The specific calculation formula is as follows:

C_{i→j} = Σ_{k=1}^{M_i} min_{x ∈ X_j} || x_i^k − x ||_2
s1024: and for the jth intersection point of the sphere projected from the jth view angle in the N view angles, calculating the point with the smallest Euclidean distance between the jth intersection point and the jth intersection point as the intersection point which is matched with the jth intersection point.
Similar to the execution of step S1022 described above, for the j-th intersection point projected onto the sphere from the j-th view angle, the point with the smallest Euclidean distance to it may be found among all the intersection points projected onto the sphere from the i-th view angle, as the intersection point that best matches the j-th intersection point, in order to execute the subsequent step S1025.
S1025: and analogically, respectively calculating the intersection points which are most matched with all the intersection points of the sphere projected by the j-th view from all the intersection points of the sphere projected by the i-th view, and calculating the second cost of the j-th view matched with the i-th view by utilizing the intersection points.
After the point with the smallest Euclidean distance to the j-th intersection point is found among all the intersection points projected onto the sphere from the i-th view angle through step S1024, the intersection point best matching each of the intersection points projected onto the sphere from the j-th view angle can be found among all the intersection points projected onto the sphere from the i-th view angle, in a manner similar to the execution of step S1023, and the distances between these intersection points and their best-matching intersection points are accumulated as the second cost of matching the j-th view angle to the i-th view angle, defined as C_{j→i}, in order to perform the subsequent step S1026.
S1026: and carrying out summation calculation on the first cost and the second cost, and taking the obtained calculation result as the matching cost between the ith view angle and the jth view angle.
After the first cost C_{i→j} of matching the i-th view angle to the j-th view angle is calculated through step S1023 and the second cost C_{j→i} of matching the j-th view angle to the i-th view angle is calculated through step S1025, the two can further be summed, and the obtained result is used as the matching cost between the i-th view angle and the j-th view angle, denoted C(i, j). The specific calculation formula is as follows:

C(i, j) = C_{i→j} + C_{j→i}
s1027: and by analogy, calculating the matching cost between the ith view angle and each view angle in the N view angles; and selecting two view images corresponding to the minimum matching cost from the two view images to serve as a first reference image and a second reference image which are most similar to the target image respectively.
Similar to the process of calculating the matching cost C(i, j) between the i-th view angle and the j-th view angle in step S1026, the matching cost between the i-th view angle and every other view angle among the N view angles can be calculated; then, from all the matching costs obtained, the two view-angle images corresponding to the two smallest matching costs are selected as the first reference image I_r1 and the second reference image I_r2 most similar to the target image.
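As a concrete illustration of steps S1022-S1027, the following sketch computes the bidirectional matching cost between two sets of sphere intersection points and picks the two most similar view angles. It is a minimal example under the assumption that the cost is the sum of nearest-neighbor Euclidean distances in both directions, and all names are illustrative.

```python
import numpy as np

def one_way_cost(A, B):
    """Sum, over points in A, of the Euclidean distance to their nearest point in B."""
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=-1)  # (M_a, M_b) pairwise distances
    return d.min(axis=1).sum()

def matching_cost(X_i, X_j):
    """Bidirectional matching cost C(i, j) = C_{i->j} + C_{j->i}."""
    return one_way_cost(X_i, X_j) + one_way_cost(X_j, X_i)

def select_reference_views(intersections, i):
    """Return the indices of the two views with the smallest matching cost w.r.t. view i.

    intersections: list of (M_k, 3) arrays, one set of sphere intersection points per view.
    """
    costs = [(j, matching_cost(intersections[i], X_j))
             for j, X_j in enumerate(intersections) if j != i]
    costs.sort(key=lambda t: t[1])
    return costs[0][0], costs[1][0]
```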
S103: inputting a target image, a first reference image and a second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing to obtain an image enhancement result corresponding to the target image, wherein the multi-view image aggregation model is obtained after performing iterative aggregation training on a pixel level and an image block level on a degradation training image and a reference training image; the degradation training image is obtained by carrying out degradation treatment on the sample training image; the reference training image and the sample training image are consecutive frame images contained in the sample video sequence.
In this embodiment, after the target image I to be enhanced is acquired through step S101 and the first reference image I_r1 and the second reference image I_r2 most similar to the target image I are determined through step S102, the target image I, the first reference image I_r1, and the second reference image I_r2 may further be input into a pre-constructed multi-view image aggregation model for image enhancement processing, so as to obtain the image enhancement result corresponding to the target image.
In order to improve the image enhancement effect on the target image, this embodiment uses a large number of degraded training images and their reference training images in advance for iterative aggregation training at the pixel level and the image-block level, wherein a degraded training image is obtained by performing degradation processing on a sample training image, and the reference training images and the sample training image are consecutive frame images contained in a sample video sequence. A multi-view image aggregation model with a good image enhancement effect is thereby constructed, and the model can perform more efficient information fusion across the multi-frame images at each view angle. In addition, the present application does not limit the specific network structure of the multi-view image aggregation model, which can be selected and set according to the actual situation. The pre-constructed multi-view image aggregation model may be a multi-view image aggregator (IVM), and may include, but is not limited to, a first encoder, a second encoder, a pixel aggregation module, an image block aggregation module, and a reconstructor.
Therefore, after the multi-view image aggregation model is constructed by performing iterative aggregation training on the pixel level and the image block level on the degradation training image and the reference training image, the image enhancement precision and efficiency of the model can be effectively improved, and the enhancement effect can be effectively improved when the multi-view image aggregation model is used for performing image enhancement processing on the target image.
Specifically, in the step S103, the specific implementation process of inputting the target image, the first reference image and the second reference image into the pre-constructed multi-view image aggregation model to perform the image enhancement processing to obtain the image enhancement result corresponding to the target image may include the following steps S1031-S1033:
s1031: inputting the target image into a first encoder of a multi-view image aggregation model to obtain a target coding vector corresponding to the target image; and inputting the first reference image and the second reference image into a second encoder of the multi-view image aggregation model to obtain a target reference vector.
In this implementation, after the target image I and the most similar first reference image I_r1 and second reference image I_r2 are determined, as shown in fig. 3 and in the upper diagram of the dotted line in fig. 4, the target image I may be input into the first encoder of the multi-view image aggregation model to obtain the target coding vector f corresponding to the target image, and the first reference image I_r1 and the second reference image I_r2 may be input into the second encoder of the multi-view image aggregation model to obtain the target reference vectors f_r1 and f_r2, in order to perform the subsequent step S1032.
S1032: and inputting the target coding vector and the target reference vector into a pixel aggregation module and an image block aggregation module of the multi-view image aggregation model to perform iterative aggregation processing to obtain the target aggregation vector.
After the target coding vector f corresponding to the target image and the target reference vectors f_r1 and f_r2 are obtained through step S1031, they can further be input into the pixel aggregation module and the image block aggregation module of the multi-view image aggregation model for iterative aggregation of depth features, as shown in the upper diagram of the dotted line in fig. 4, to obtain the target aggregation vectors, in order to perform the subsequent step S1033.
As shown in the diagram below the dotted line in fig. 4, the pixel aggregation module may include, but is not limited to, a convolution layer, a residual layer, and a deformable convolution layer, and the image block aggregation module may include, but is not limited to, a 3D partition layer (3D patch partition), a linear coding layer (Linear embedding), and a 3D sliding-window multi-head self-attention architecture (video Swin Transformer block) that replaces the MSA in a standard Transformer. It should be noted that the two rounds of iterative aggregation shown in the diagram above the dotted line in fig. 4 are only an example; the specific number of iterations, as well as the specific structural composition of the pixel aggregation module and the image block aggregation module of the image aggregation model, may be set according to the actual situation and empirical values, which is not limited in this embodiment.
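Purely as an illustrative sketch of this kind of two-stage aggregation (not the application's actual network), the following PyTorch-style modules fuse a target feature map with two reference feature maps first per pixel with convolutions (a stand-in for the deformable-convolution alignment described above) and then per image block with window self-attention (a stand-in for the video Swin Transformer block); every layer size and name is an assumption.

```python
import torch
import torch.nn as nn

class PixelAggregation(nn.Module):
    """Pixel-level fusion: concatenate target/reference features and mix them with convs.
    (The application additionally uses deformable convolution; plain convs keep the sketch short.)"""
    def __init__(self, channels: int = 64):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Conv2d(3 * channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 3, padding=1),
        )

    def forward(self, f, f_r1, f_r2):
        return f + self.fuse(torch.cat([f, f_r1, f_r2], dim=1))  # residual fusion

class PatchAggregation(nn.Module):
    """Image-block-level fusion: split the map into non-overlapping patches (tokens)
    and let them exchange information through multi-head self-attention."""
    def __init__(self, channels: int = 64, patch: int = 8, heads: int = 4):
        super().__init__()
        self.patch = patch
        self.attn = nn.MultiheadAttention(channels * patch * patch, heads, batch_first=True)

    def forward(self, f):
        b, c, h, w = f.shape
        p = self.patch
        tokens = (f.reshape(b, c, h // p, p, w // p, p)
                    .permute(0, 2, 4, 1, 3, 5)
                    .reshape(b, (h // p) * (w // p), c * p * p))
        mixed, _ = self.attn(tokens, tokens, tokens)
        mixed = (mixed.reshape(b, h // p, w // p, c, p, p)
                      .permute(0, 3, 1, 4, 2, 5)
                      .reshape(b, c, h, w))
        return f + mixed

# One aggregation iteration over 64x64 feature maps
f, f1, f2 = (torch.randn(1, 64, 64, 64) for _ in range(3))
out = PatchAggregation()(PixelAggregation()(f, f1, f2))
```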
S1033: and inputting the target aggregate vector into a reconstructor of the multi-view image aggregate model for image reconstruction to obtain an image enhancement result corresponding to the target image.
After the target aggregation vectors are obtained through step S1032, they can further be input into the reconstructor of the multi-view image aggregation model for image reconstruction, as shown in fig. 3 and in the upper diagram of the dotted line in fig. 4, to obtain the image enhancement result corresponding to the target image.
The specific structural composition of the reconstructor (Reconstruction) may be set according to practical situations and empirical values, and the embodiment is not limited thereto, for example, as shown in the upper graph of the dotted line in fig. 4, the reconstructor may include, but is not limited to, two-layer convolution layers and multiple residual layers.
It can be seen that, in this embodiment, when the multi-view image aggregation model shown in fig. 3 and fig. 4 is used to perform image enhancement processing on the target image, it may perform more efficient iterative information fusion processing on the target image and the target reference image similar to the target image, so that the image enhancement effect may be improved, and further the image rendering quality of the target image may be improved.
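To make the flow of steps S1031-S1033 concrete, a hedged end-to-end sketch is given below. It reuses the illustrative PixelAggregation and PatchAggregation classes from the previous sketch, adds stand-in encoders and a reconstructor, and assumes a fixed number of aggregation iterations and a residual output purely for demonstration.

```python
import torch
import torch.nn as nn

class MultiViewAggregator(nn.Module):
    """Illustrative forward pass: encode -> iterate (pixel + patch aggregation) -> reconstruct."""
    def __init__(self, channels: int = 64, iterations: int = 2):
        super().__init__()
        self.target_encoder = nn.Conv2d(3, channels, 3, padding=1)      # stand-in first encoder
        self.reference_encoder = nn.Conv2d(3, channels, 3, padding=1)   # stand-in second encoder
        self.pixel_agg = PixelAggregation(channels)
        self.patch_agg = PatchAggregation(channels)
        self.reconstructor = nn.Conv2d(channels, 3, 3, padding=1)
        self.iterations = iterations

    def forward(self, target, ref1, ref2):
        f = self.target_encoder(target)                                 # target coding vector
        f_r1, f_r2 = self.reference_encoder(ref1), self.reference_encoder(ref2)
        for _ in range(self.iterations):                                # iterative aggregation
            f = self.patch_agg(self.pixel_agg(f, f_r1, f_r2))
        return target + self.reconstructor(f)                           # enhanced image (residual output is an assumption)

enhanced = MultiViewAggregator()(torch.randn(1, 3, 64, 64),
                                 torch.randn(1, 3, 64, 64),
                                 torch.randn(1, 3, 64, 64))
```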
Next, this embodiment describes in detail a training process of the multi-view image aggregation model. A specific training process may include the following steps A-B:
Step A: acquiring a sample video sequence, and synthesizing a degradation training image from the sample video sequence by using an image degradation simulator; the sample video sequence includes a sample training image and a reference training image.
In this embodiment, in order to construct the multi-view image aggregation model, a large amount of preparation work needs to be performed in advance; that is, a large number of images containing noise and blur need to be prepared to simulate NeRF-rendered images and used as training data, so that the multi-view image aggregation model can be obtained through the subsequent training in step B. Specifically, a large number of clean video sequences directly captured by a camera or other capture device can be obtained as sample video sequences (each including a sample training image and reference training images), and the sample training images in the sample video sequences are then degraded one by one (for example, by adding noise, blurring, and so on) by an image degradation simulator (NeRF-style degradation simulator, abbreviated as NDS) to synthesize the degraded training images, defined as I_1, in order to perform the subsequent step B and thereby train the multi-view image aggregation model.
An alternative implementation is as follows: when a sample video sequence contains three consecutive frame images, any one frame can be selected from the three consecutive frames as the sample training image, and the other two frames are both used as reference training images. In this case, the implementation of synthesizing the degraded training image from the sample video sequence by using the image degradation simulator in step A may specifically include the following steps A1-A4:
step A1: and carrying out degradation treatment on the sample training image by using the projected Gaussian noise SGN to obtain a first sample degradation training image.
In this implementation, after the sample training image is determined, as shown in fig. 5, and in view of the characteristics of NeRF rendering noise, the sample training image can be degraded using projected Gaussian noise (splatted Gaussian noise, abbreviated as SGN) to obtain the first sample degraded training image, defined as I_D1. The specific calculation formula is as follows:

I_D1 = I + n ⊛ g

where I denotes the sample training image, n denotes Gaussian noise (mean 0, variance 0.02), g denotes an isotropic Gaussian blur convolution kernel (for example of size 3×3), and ⊛ denotes convolution; that is, the Gaussian noise map is blurred (splatted) by the isotropic kernel before being added to the image.
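A minimal NumPy sketch of this splatted-noise degradation, under the assumption stated above that the noise map is blurred by the isotropic kernel before being added, is:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def splatted_gaussian_noise(img, std=np.sqrt(0.02), blur_sigma=1.0, seed=0):
    """SGN-style degradation sketch: add Gaussian noise that has been blurred
    ("splatted") by an isotropic Gaussian kernel. img is a single-channel float
    array in [0, 1]; std and blur_sigma are illustrative values."""
    rng = np.random.default_rng(seed)
    noise = rng.normal(0.0, std, size=img.shape)
    splatted = gaussian_filter(noise, sigma=blur_sigma)   # isotropic blur of the noise map
    return np.clip(img + splatted, 0.0, 1.0)
```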
Step A2: and carrying out repositioning random degradation processing on the first sample degradation training image to obtain a second sample degradation training image.
It should be noted that, because the camera calibration accuracy is insufficient, the projection positions are inaccurate during rendering and produce rendering noise. In view of this, after the first sample degraded training image I_D1 is obtained in step A1, as shown in fig. 5, the present application proposes that a re-positioning random degradation process can be applied to the first sample degraded training image I_D1 to obtain the second sample degraded training image, defined as I_D2. The specific calculation formula is as follows:

I_D2(x, y) = I_D1(x, y) with probability p = 0.9;  I_D2(x, y) = I_D1(x + Δx, y + Δy) with probability 1 − p = 0.1

where p denotes the random probability: the correct pixel position is kept with probability 0.9, and with probability 0.1 the value of a randomly selected nearby pixel, offset by (Δx, Δy), is assigned to the current pixel position (x, y).
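The following sketch illustrates this re-positioning degradation under the same assumptions (keep probability 0.9, offsets drawn from a small neighborhood); the neighborhood radius is an illustrative choice.

```python
import numpy as np

def reposition(img, keep_prob=0.9, max_offset=1, seed=0):
    """Re-positioning degradation sketch: with probability (1 - keep_prob) each pixel
    is replaced by a randomly chosen nearby pixel, simulating projection errors."""
    rng = np.random.default_rng(seed)
    h, w = img.shape[:2]
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    dy = rng.integers(-max_offset, max_offset + 1, size=(h, w))
    dx = rng.integers(-max_offset, max_offset + 1, size=(h, w))
    keep = rng.random((h, w)) < keep_prob
    src_y = np.clip(np.where(keep, ys, ys + dy), 0, h - 1)
    src_x = np.clip(np.where(keep, xs, xs + dx), 0, w - 1)
    return img[src_y, src_x]
```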
Step A3: and performing degradation treatment on the second sample degradation training image by utilizing anisotropic Gaussian image blurring to obtain a third sample degradation training image.
It should be noted that blur may exist in NeRF-rendered images. In view of this, after the second sample degraded training image I_D2 is obtained in step A2, as shown in fig. 5, the present application proposes that anisotropic Gaussian image blur (A-Blur for short) can be used to degrade the second sample degraded training image I_D2, so as to obtain the third sample degraded training image.
Step A4: and realizing self-adaptive regional degradation when carrying out degradation treatment on the sample training image, the first sample degradation training image and the second sample degradation training image by utilizing a regional self-adaptive degradation strategy, and obtaining an optimized third sample degradation training image as a degradation training image.
It should be noted that, because the spatial distribution of the NeRF input images is non-uniform, the rendering quality is better in regions with dense input view angles and worse elsewhere. Therefore, the present application proposes that a region-adaptive degradation strategy (Region adaptive strategy, RA for short) can be used to apply adaptive regional degradation when the sample training image, the first sample degraded training image, and the second sample degraded training image are degraded, as shown in fig. 5, so as to obtain the optimized third sample degraded training image as the degradation training image, defined as I_1. Specifically, the adaptive image degradation operation can be performed using a two-dimensional anisotropic Gaussian mask:

M(x, y) = G(x, y; μ_x, μ_y, σ_x, σ_y, a)

where σ_x and σ_y denote the standard deviations; G(·) denotes a two-dimensional anisotropic Gaussian; μ_x and μ_y denote the means, i.e., the coordinates of the center point of the Gaussian; and a denotes the anisotropy (rotation) angle. The mask modulates the degradation strength over the image so that different regions are degraded to different degrees.
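An illustrative construction of such a two-dimensional anisotropic Gaussian mask and its use to blend a degraded image with the clean image is sketched below; the blending rule and all parameter values are assumptions made for the example only.

```python
import numpy as np

def anisotropic_gaussian_mask(h, w, mu, sigma, angle):
    """2D anisotropic Gaussian mask with center mu=(mu_x, mu_y), standard deviations
    sigma=(sigma_x, sigma_y), and rotation angle in radians; values lie in (0, 1]."""
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    x, y = xs - mu[0], ys - mu[1]
    c, s = np.cos(angle), np.sin(angle)
    xr, yr = c * x + s * y, -s * x + c * y               # rotate into the Gaussian's axes
    return np.exp(-0.5 * ((xr / sigma[0]) ** 2 + (yr / sigma[1]) ** 2))

def region_adaptive_mix(clean, degraded, mask):
    """Apply stronger degradation where the mask is large (illustrative blending rule)."""
    m = mask[..., None] if degraded.ndim == 3 else mask
    return (1.0 - m) * clean + m * degraded
```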
Thus, by using the above three degradation modes together with the RA degradation strategy, a large number of degraded training images I_1 with a good degradation effect can be obtained, and the subsequent step B can then be executed with these degraded training images, so as to train a multi-view image aggregation model with a better enhancement effect.
Step B: inputting the degraded training image and the reference training images into an initial multi-view image aggregation model, calculating the value of the target loss function by using the pixel values of the sample enhanced image output by the model and of the sample training image, and stopping updating the model parameters when the value meets a preset condition, so as to obtain the multi-view image aggregation model through training.
In this implementation, after a large number of degraded training images are obtained in step A, the degraded training images and the reference training images can further be input into the initial multi-view image aggregation model, and the value of the target loss function is then calculated using the pixel values of the sample enhanced image output by the reconstructor in the model and of the sample training image; when the value meets a preset condition (for example, it has decreased to the point where it remains essentially unchanged), updating of the model parameters is stopped, and the multi-view image aggregation model is obtained through training. The value of the target loss function is used to constrain the updating of the model parameters so as to improve the enhancement effect of the sample enhanced image.
It should be noted that the initial multi-view image aggregation model used in the present application may be based on a multi-view image aggregator IVM network, including but not limited to a first encoder, a second encoder, a pixel aggregation module, an image block aggregation module, and a reconstructor, as shown in fig. 3 and 4.
Specifically, an alternative implementation manner may specifically include the following steps B1-B4:
step B1: inputting the degraded training image into a first encoder for encoding to obtain a sample encoding vector corresponding to the degraded training image; and inputting the reference training image into a second encoder for encoding to obtain a sample reference encoding vector corresponding to the reference training image.
In this implementation, the initial multi-view image aggregation model includes a first encoder, a second encoder, a pixel aggregation module, an image block aggregation module, and a reconstructor, as shown in fig. 3 and fig. 4. When model training is performed, the degraded training image I_1 is first input into the first encoder for depth feature extraction to obtain the sample coding vector corresponding to the degraded training image; at the same time, the reference training images are input into the second encoder for depth feature extraction to obtain the sample reference coding vectors corresponding to the reference training images, in order to perform the subsequent step B2.
Step B2: inputting the sample coding vector and the sample reference coding vector into a pixel aggregation module for aggregation treatment to obtain a first sample aggregation vector; and inputting the first sample aggregate vector into an image block aggregate module for aggregate processing to obtain a second sample aggregate vector.
In the implementation manner, after obtaining the sample coding vector corresponding to the degraded training image and the sample reference coding vector corresponding to the reference training image through the step B1, the sample coding vector and the sample reference coding vector can be further input into a pixel aggregation module to be aggregated, so as to obtain a first sample aggregation vector; and inputting the first sample aggregation vector into an image block aggregation module for aggregation treatment to obtain a second sample aggregation vector, and realizing aggregation of image depth characteristics for executing the subsequent step B3.
Step B3: inputting the second sample aggregate vector into a pixel aggregate module and an image block aggregate module for iterative aggregate processing, and then analogizing until the preset requirement is met, so as to obtain a sample aggregate vector; and inputting the sample aggregate vector into a reconstructor for image reconstruction to obtain a sample enhanced image corresponding to the degraded training image.
In this implementation, after the second sample aggregation vector is obtained through the aggregation processing of the pixel aggregation module and the image block aggregation module, the second sample aggregation vector may further be input into the pixel aggregation module and the image block aggregation module for iterative aggregation processing, so as to aggregate the depth features sufficiently; this is repeated until a preset requirement is met (the specific content is not limited; for example, a preset number of aggregation iterations is reached), so as to obtain the sample aggregation vector. The sample aggregation vector is then input into the reconstructor for image reconstruction, and the sample enhanced image corresponding to the degraded training image is obtained, in order to perform the subsequent step B4.
Step B4: calculating the value of the target loss function by using the pixel values of the sample enhanced image and the sample training image, and training the initial multi-view image aggregation model by using the value until the value meets the preset condition, and stopping updating the model parameters to obtain the trained multi-view image aggregation model.
In this implementation, after the sample training image and the sample enhanced image are determined, the value of the target loss function can further be calculated using the sample enhanced image and the sample training image, and the initial multi-view image aggregation model is trained with this value; during training, the model parameters of the initial multi-view image aggregation model are continuously updated according to the change of the target loss function, until the function value meets a preset condition, for example it reaches a minimum and its variation is very small (essentially unchanged). Updating of the model parameters is then stopped, the training of the initial multi-view image aggregation model is completed, and the trained multi-view image aggregation model is obtained.
The target loss function is used to constrain the updating of the model parameters so as to improve the enhancement effect of the sample enhanced image. In an alternative implementation, when the sample enhanced image and the sample training image each contain K pixels (K being a positive integer greater than 0), the target loss function may be calculated as follows: first, the pixel values of the K pixels of the sample enhanced image are subtracted from the pixel values of the corresponding pixels of the sample training image, respectively, to obtain K subtraction results; then the absolute values of the K subtraction results are summed and averaged, and the obtained result is taken as the value of the target loss function. The specific calculation formula is as follows:

L = (1/K) · Σ_{k=1}^{K} | Ô(k) − O(k) |

where L denotes the target loss function, Ô(k) denotes the pixel value of the k-th pixel of the sample enhanced image, and O(k) denotes the pixel value of the k-th pixel of the sample training image.
Thus, during the training process, the model parameters of the initial multi-view image aggregation model can be continuously updated according to the value of the target loss function; when the function value meets the preset condition, for example it reaches a minimum and its variation is very small (essentially unchanged), updating of the model parameters is stopped, training of the multi-view image aggregation model is completed, and the trained multi-view image aggregation model is obtained.
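For illustration, a hedged sketch of one training step with this mean-absolute-error objective is given below; it reuses the illustrative MultiViewAggregator from the earlier sketch, and the optimizer choice and learning rate are assumptions.

```python
import torch
import torch.nn.functional as F

model = MultiViewAggregator()                          # illustrative model from the sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def training_step(degraded, ref1, ref2, clean):
    """One update: L = mean_k | enhanced_k - clean_k | over all K pixels."""
    optimizer.zero_grad()
    enhanced = model(degraded, ref1, ref2)
    loss = F.l1_loss(enhanced, clean)                  # averaged absolute pixel differences
    loss.backward()
    optimizer.step()
    return loss.item()
```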
It can be seen that, by executing the above steps S101 to S103, an image rendered by any NeRF model for any scene can be enhanced in a unified manner using the pre-constructed multi-view image aggregation model. In addition, the enhancement method of the present application does not require accurate camera calibration, does not require the object materials, illumination, and other conditions to be specified, and does not require retraining a model for a new scene; high-quality new view-angle image synthesis can be completed by directly performing image enhancement processing. At the same time, the present application can shorten the training time of existing NeRF models and can significantly improve the synthesis quality. For example, assuming that the left side of fig. 6 shows a target image rendered by a NeRF model, after the target image is enhanced by executing the above steps S101 to S103, the image enhancement result shown on the right side of fig. 6 can be obtained; it can be seen that the enhanced image is clearer than the target image.
In summary, in the image enhancement method provided in the present embodiment, after the target image rendered by the NeRF is obtained, first, the first reference image and the second reference image that are most similar to the target image are selected from the other perspective images rendered by the NeRF; and then inputting the target image, the first reference image and the second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing to obtain an image enhancement result corresponding to the target image. In this way, through carrying out iterative aggregation on the pixel level and the image block level on the degradation training image and the reference training image in advance, a multi-view image aggregation model capable of carrying out more efficient information fusion on multi-frame images under each view angle is obtained through training, and therefore when the model and the reference image are utilized to enhance the target image, the enhancement effect and the image rendering quality can be effectively improved.
Based on the same inventive concept, the embodiment of the application also provides an image enhancement device. The implementation of the solution provided by the apparatus is similar to the implementation described in the above method, so the specific limitation in the embodiments of the image enhancement apparatus provided in the following may be referred to the limitation of the image enhancement method hereinabove, and will not be repeated here.
As shown in fig. 7, an embodiment of the present application provides an image enhancement apparatus 700, including:
a first obtaining module 701, configured to obtain a target image to be enhanced; the target image is an image rendered by a neural radiance field (NeRF);
a matching module 702, configured to select a first reference image and a second reference image that are most similar to the target image from other perspective images rendered by NeRF;
the enhancement module 703 is configured to input the target image, the first reference image, and the second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing, so as to obtain an image enhancement result corresponding to the target image;
the multi-view image aggregation model is obtained by performing aggregation iterative training on a pixel level and an image block level on a degraded training image and a reference training image; the degradation training image is obtained by carrying out degradation treatment on the sample training image; the reference training image and the sample training image are consecutive frame images contained in a sample video sequence.
In some embodiments, the image enhancement apparatus 700 further comprises:
the second acquisition module is used for acquiring a sample video sequence and synthesizing a degradation training image from the sample video sequence by utilizing an image degradation simulator; the sample video sequence comprises a sample training image and a reference training image;
The training module is used for inputting the degradation training image and the reference training image into an initial multi-view image aggregation model, calculating the value of the target loss function by utilizing the pixel values of the sample enhancement image and the sample training image output by the model, stopping updating the model parameters until the value meets the preset condition, and training to obtain the multi-view image aggregation model;
the initial multi-view image aggregation model comprises a first encoder, a second encoder, a pixel aggregation module, an image block aggregation module and a reconstructor; the value of the target loss function is used for updating constraint model parameters so as to improve the enhancement effect of the sample enhanced image.
In some embodiments, the sample video sequence contains three consecutive frame images; the image enhancement apparatus 700 further includes:
the selection module is used for selecting one frame of image from three continuous frame images as a sample training image, and taking the other two frames of images as reference training images;
the second acquisition module is specifically configured to:
carrying out degradation treatment on the sample training image by using the projected Gaussian noise SGN to obtain a first sample degradation training image;
performing repositioning random degradation treatment on the first sample degradation training image to obtain a second sample degradation training image;
Performing degradation treatment on the second sample degradation training image by utilizing anisotropic Gaussian image blurring to obtain a third sample degradation training image;
and realizing self-adaptive regional degradation when carrying out degradation treatment on the sample training image, the first sample degradation training image and the second sample degradation training image by utilizing a regional self-adaptive degradation strategy, and obtaining an optimized third sample degradation training image as a degradation training image.
In some embodiments, the training module is specifically configured to:
inputting the degraded training image into a first encoder for encoding to obtain a sample encoding vector corresponding to the degraded training image; inputting the reference training image into a second encoder for encoding to obtain a sample reference encoding vector corresponding to the reference training image;
inputting the sample coding vector and the sample reference coding vector into a pixel aggregation module for aggregation treatment to obtain a first sample aggregation vector; inputting the first sample aggregate vector into an image block aggregate module for aggregate treatment to obtain a second sample aggregate vector;
inputting the second sample aggregate vector into a pixel aggregate module and an image block aggregate module for iterative aggregate processing, and then analogizing until the preset requirement is met, so as to obtain a sample aggregate vector; inputting the sample aggregate vector into a reconstructor for image reconstruction to obtain a sample enhanced image corresponding to the degraded training image;
Calculating the value of the target loss function by using the pixel values of the sample enhanced image and the sample training image, and training the initial multi-view image aggregation model by using the value until the value meets the preset condition, and stopping updating the model parameters to obtain the trained multi-view image aggregation model.
In some embodiments, the sample enhanced image and the sample training image each comprise K pixels, where K is a positive integer greater than 0; the training module is also specifically configured to:
respectively subtracting the pixel values of the K pixel points of the sample enhanced image from the pixel values of the corresponding pixel points in the K pixel points of the sample training image to obtain K subtraction results;
and taking the absolute values of the K subtraction results respectively, then carrying out summation average calculation, and taking the obtained calculation result as the value of the target loss function.
In some embodiments, the matching module 702 is specifically configured to:
and selecting a first reference image and a second reference image which are most similar to the target image from other view images rendered by NeRF by using a multi-view similarity matching algorithm.
In some embodiments, the matching module 702 is specifically configured to:
simplifying the NeRF rendering process as N view angles each projecting the same number of rays onto a 3D bounding sphere, and taking the view image corresponding to the i-th view angle among the N view angles as the target image; wherein N is a positive integer greater than 1, and i is a positive integer greater than 0 and not greater than N;
for an intersection point between the sphere and a ray projected from the i-th view angle, calculating, from all intersection points between the sphere and the rays projected from the j-th view angle among the N view angles, the point with the smallest Euclidean distance to that intersection point as the intersection point best matching it; wherein j is a positive integer greater than 0 and not greater than N;
proceeding in the same way, finding, among the intersection points of the rays projected from the j-th view angle, the best-matching intersection point for every intersection point of the rays projected from the i-th view angle, and calculating from these matches a first cost of matching the i-th view angle to the j-th view angle;
for an intersection point between the sphere and a ray projected from the j-th view angle among the N view angles, calculating the point with the smallest Euclidean distance to that intersection point as the intersection point best matching it;
proceeding in the same way, finding, among the intersection points of the rays projected from the i-th view angle, the best-matching intersection point for every intersection point of the rays projected from the j-th view angle, and calculating from these matches a second cost of matching the j-th view angle to the i-th view angle;
summing the first cost and the second cost, and taking the result as the matching cost between the i-th view angle and the j-th view angle;
and in the same manner, calculating the matching cost between the i-th view angle and each of the other view angles among the N view angles; and selecting the two view images with the smallest matching costs to serve respectively as the first reference image and the second reference image most similar to the target image.
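Interpreting the above as a symmetric nearest-neighbour cost over the ray–sphere intersection points, a minimal sketch follows; treating the intersection points as given 3D coordinates, summing the nearest-neighbour distances as each cost, and the brute-force search are assumptions made for illustration, not necessarily the exact formulation of the embodiment.

```python
import numpy as np

def one_way_cost(src: np.ndarray, dst: np.ndarray) -> float:
    """For every intersection point in `src` (M x 3), find the point in `dst`
    (N x 3) with the smallest Euclidean distance and sum those distances."""
    d = np.linalg.norm(src[:, None, :] - dst[None, :, :], axis=-1)  # M x N pairwise distances
    return float(d.min(axis=1).sum())

def matching_cost(points_i: np.ndarray, points_j: np.ndarray) -> float:
    """Matching cost between view angle i and view angle j:
    first cost (i matched to j) plus second cost (j matched to i)."""
    return one_way_cost(points_i, points_j) + one_way_cost(points_j, points_i)

def select_two_references(target_points: np.ndarray, other_views: dict) -> list:
    """`other_views` maps a view identifier to its intersection points;
    returns the two view identifiers with the smallest matching cost."""
    costs = {vid: matching_cost(target_points, pts) for vid, pts in other_views.items()}
    return sorted(costs, key=costs.get)[:2]
```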
In some embodiments, the enhancement module 703 is specifically configured to:
inputting the target image into the first encoder of the multi-view image aggregation model to obtain a target coding vector corresponding to the target image; inputting the first reference image and the second reference image into the second encoder of the multi-view image aggregation model to obtain a target reference vector;
inputting the target coding vector and the target reference vector into the pixel aggregation module and the image block aggregation module of the multi-view image aggregation model for iterative aggregation processing to obtain a target aggregation vector;
and inputting the target aggregation vector into the reconstructor of the multi-view image aggregation model for image reconstruction to obtain the image enhancement result corresponding to the target image.
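Putting the three steps together, the sketch below traces the inference-time data flow through the model; the sub-module names, tensor shapes and the number of aggregation rounds are assumptions made for illustration, and the internal structure of each sub-module is not specified here.

```python
import torch

def enhance(target, ref1, ref2, model, num_rounds: int = 2):
    """Inference data flow of the multi-view image aggregation model.
    `model` is assumed to expose encoder1, encoder2, pixel_agg, patch_agg and
    reconstructor sub-modules; images are 1 x C x H x W tensors."""
    with torch.no_grad():
        target_code = model.encoder1(target)                        # target coding vector
        ref_code = model.encoder2(torch.cat([ref1, ref2], dim=1))   # target reference vector
        agg = target_code
        for _ in range(num_rounds):                                 # iterative aggregation processing
            agg = model.pixel_agg(agg, ref_code)                    # pixel-level aggregation
            agg = model.patch_agg(agg, ref_code)                    # image-block-level aggregation
        return model.reconstructor(agg)                             # image enhancement result
```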
Each of the modules in the above image enhancement apparatus may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded, in hardware form, in or independently of a processor of the computer device, or may be stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the above modules.
In some embodiments, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used to store the design drawings. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement the steps in the image enhancement method described above.
In some embodiments, a computer device is provided, which may be a terminal, and whose internal structure may be as shown in fig. 9. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input device. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The input/output interface of the computer device is used to exchange information between the processor and external devices. The communication interface of the computer device is used for wired or wireless communication with an external terminal; the wireless communication may be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by the processor to implement the steps in the image enhancement method described above. The display unit of the computer device is used for forming a visual picture and may be a display screen, a projection device or a virtual reality imaging device; the display screen may be a liquid crystal display screen or an electronic ink display screen. The input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touch pad arranged on the housing of the computer device, or an external keyboard, touch pad or mouse.
It will be appreciated by those skilled in the art that the structures shown in fig. 8 or 9 are merely block diagrams of portions of structures associated with aspects of the application and are not intended to limit the computer device to which aspects of the application may be applied, and that a particular computer device may include more or fewer components than those shown, or may combine certain components, or may have a different arrangement of components.
In some embodiments, a computer device is provided, comprising a memory storing a computer program and a processor implementing the steps of the method embodiments described above when the computer program is executed.
In some embodiments, an internal structural diagram of a computer-readable storage medium is provided as shown in fig. 10, the computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method embodiments described above.
In some embodiments, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user related information (including but not limited to user equipment information, user operation information, etc.) and data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
Those skilled in the art will appreciate that implementing all or part of the above-described methods may be accomplished by way of a computer program, which may be stored on a non-transitory computer-readable storage medium and which, when executed, may comprise the steps of the above-described method embodiments. Any reference to memory, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. The non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical memory, high-density embedded non-volatile memory, resistive random access memory (ReRAM), magnetoresistive random access memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric memory (Ferroelectric Random Access Memory, FRAM), phase change memory (Phase Change Memory, PCM), graphene memory, and the like. Volatile memory may include random access memory (Random Access Memory, RAM), external cache memory, and the like. By way of illustration and not limitation, RAM can take many forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM). The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of the technical features contains no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application; they are described in detail but are not to be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the spirit of the application, and all of these fall within the protection scope of the application. Accordingly, the protection scope of the application shall be subject to the appended claims.

Claims (11)

1. An image enhancement method, comprising:
acquiring a target image to be enhanced; the target image is an image rendered through a nerve radiation field NeRF;
selecting a first reference image and a second reference image which are most similar to the target image from other visual angle images rendered by the NeRF;
inputting the target image, the first reference image and the second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing to obtain an image enhancement result corresponding to the target image;
The multi-view image aggregation model is obtained by performing iterative aggregation training on a pixel level and an image block level on a degraded training image and a reference training image; the degraded training image is obtained by performing degradation processing on a sample training image; the reference training image and the sample training image are consecutive frame images contained in a sample video sequence.
2. The method according to claim 1, wherein the multi-view image aggregation model is constructed as follows:
acquiring a sample video sequence, and synthesizing a degraded training image from the sample video sequence by using an image degradation simulator; the sample video sequence comprises a sample training image and a reference training image;
inputting the degraded training image and the reference training image into an initial multi-view image aggregation model, calculating the value of a target loss function by using the pixel values of a sample enhanced image output by the model and of the sample training image, and stopping the updating of model parameters once the value meets a preset condition, so as to train and obtain the multi-view image aggregation model;
the initial multi-view image aggregation model comprises a first encoder, a second encoder, a pixel aggregation module, an image block aggregation module and a reconstructor; and the value of the target loss function is used to constrain the updating of the model parameters so as to improve the enhancement effect of the sample enhanced image.
3. The method of claim 2, wherein the sample video sequence comprises three consecutive frame images; the method further comprises the steps of:
selecting one frame of image from the three continuous frame images as a sample training image, and taking the other two frames of images as reference training images;
the synthesizing, with an image degradation simulator, a degraded training image from the sample video sequence, comprising:
performing degradation processing on the sample training image by using projected Gaussian noise (SGN) to obtain a first sample degraded training image;
performing re-positioning random degradation processing on the first sample degraded training image to obtain a second sample degraded training image;
performing degradation processing on the second sample degraded training image by using anisotropic Gaussian image blurring to obtain a third sample degraded training image;
and applying a region-adaptive degradation strategy so that the degradation applied to the sample training image, the first sample degraded training image and the second sample degraded training image varies from region to region, thereby obtaining an optimized third sample degraded training image as the degraded training image.
4. The method according to claim 2, wherein the inputting the degraded training image and the reference training image into an initial multi-view image aggregation model, calculating the value of a target loss function by using the pixel values of a sample enhanced image output by the model and of the sample training image, and stopping the updating of model parameters once the value meets a preset condition, so as to train and obtain the multi-view image aggregation model, comprises:
inputting the degraded training image into the first encoder for encoding to obtain a sample encoding vector corresponding to the degraded training image; inputting the reference training image into the second encoder for encoding to obtain a sample reference encoding vector corresponding to the reference training image;
inputting the sample encoding vector and the sample reference encoding vector into the pixel aggregation module for aggregation processing to obtain a first sample aggregation vector; inputting the first sample aggregation vector into the image block aggregation module for aggregation processing to obtain a second sample aggregation vector;
inputting the second sample aggregation vector back into the pixel aggregation module and the image block aggregation module for further aggregation, and repeating this iterative aggregation processing until a preset requirement is met, so as to obtain a sample aggregation vector; inputting the sample aggregation vector into the reconstructor for image reconstruction to obtain a sample enhanced image corresponding to the degraded training image;
calculating the value of the target loss function from the pixel values of the sample enhanced image and the sample training image, training the initial multi-view image aggregation model with this value, and stopping the updating of the model parameters once the value meets the preset condition, so as to obtain the trained multi-view image aggregation model.
5. The method of claim 4, wherein the sample enhanced image and the sample training image each comprise K pixels, the K being a positive integer greater than 0; the calculating the value of the target loss function by using the pixel values of the sample enhanced image and the sample training image comprises the following steps:
subtracting, for each of the K pixels, the pixel value of that pixel of the sample enhanced image from the pixel value of the corresponding pixel of the sample training image, to obtain K differences;
and taking the absolute value of each of the K differences, averaging the K absolute values, and taking the result as the value of the target loss function.
6. The method of claim 1, wherein the selecting the first and second reference images that are most similar to the target image from the other perspective images rendered by the NeRF comprises:
and selecting a first reference image and a second reference image which are most similar to the target image from other visual angle images rendered by the NeRF by using a multi-visual angle similarity matching algorithm.
7. The method of claim 6, wherein selecting the first reference image and the second reference image that are most similar to the target image from the other perspective images rendered by the NeRF using a multi-perspective similarity matching algorithm comprises:
simplifying the scene rendered by the NeRF into a scene in which N view angles each project the same number of rays onto a 3D bounding sphere, and taking the view image corresponding to the i-th view angle among the N view angles as the target image; wherein N is a positive integer greater than 1, and i is a positive integer greater than 0 and not greater than N;
for an intersection point between the sphere and a ray projected from the i-th view angle, calculating, from all intersection points between the sphere and the rays projected from the j-th view angle among the N view angles, the point with the smallest Euclidean distance to that intersection point as the intersection point best matching it; wherein j is a positive integer greater than 0 and not greater than N;
proceeding in the same way, finding, among the intersection points of the rays projected from the j-th view angle, the best-matching intersection point for every intersection point of the rays projected from the i-th view angle, and calculating from these matches a first cost of matching the i-th view angle to the j-th view angle;
for an intersection point between the sphere and a ray projected from the j-th view angle among the N view angles, calculating the point with the smallest Euclidean distance to that intersection point as the intersection point best matching it;
proceeding in the same way, finding, among the intersection points of the rays projected from the i-th view angle, the best-matching intersection point for every intersection point of the rays projected from the j-th view angle, and calculating from these matches a second cost of matching the j-th view angle to the i-th view angle;
summing the first cost and the second cost, and taking the result as the matching cost between the i-th view angle and the j-th view angle;
and in the same manner, calculating the matching cost between the i-th view angle and each of the other view angles among the N view angles; and selecting the two view images with the smallest matching costs to serve respectively as the first reference image and the second reference image most similar to the target image.
8. The method according to any one of claims 1 to 7, wherein the inputting the target image, the first reference image, and the second reference image into a pre-constructed multi-view image aggregation model for image enhancement processing, to obtain an image enhancement result corresponding to the target image, includes:
inputting the target image into a first encoder of the multi-view image aggregation model to obtain a target coding vector corresponding to the target image; inputting the first reference image and the second reference image into a second encoder of the multi-view image aggregation model to obtain a target reference vector;
inputting the target coding vector and the target reference vector into a pixel aggregation module and an image block aggregation module of the multi-view image aggregation model for iterative aggregation processing to obtain a target aggregation vector;
inputting the target aggregation vector into a reconstructor of the multi-view image aggregation model for image reconstruction to obtain the image enhancement result corresponding to the target image.
9. An image enhancement apparatus, comprising:
the first acquisition module is used for acquiring a target image to be enhanced; the target image is an image rendered through a nerve radiation field NeRF;
the matching module is used for selecting a first reference image and a second reference image which are most similar to the target image from other visual angle images rendered by the NeRF;
the enhancement module is used for inputting the target image, the first reference image and the second reference image into a pre-constructed multi-view image aggregation model to carry out image enhancement processing, so as to obtain an image enhancement result corresponding to the target image;
the multi-view image aggregation model is obtained by performing iterative aggregation training on a pixel level and an image block level on a degraded training image and a reference training image; the degraded training image is obtained by performing degradation processing on a sample training image; the reference training image and the sample training image are consecutive frame images contained in a sample video sequence.
10. A computer device comprising a processor and a memory, the memory having stored therein a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 8 when the computer program is executed.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 8.
CN202311384163.XA 2023-10-24 2023-10-24 Image enhancement method, device, equipment and storage medium Pending CN117115047A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311384163.XA CN117115047A (en) 2023-10-24 2023-10-24 Image enhancement method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311384163.XA CN117115047A (en) 2023-10-24 2023-10-24 Image enhancement method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN117115047A true CN117115047A (en) 2023-11-24

Family

ID=88811430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311384163.XA Pending CN117115047A (en) 2023-10-24 2023-10-24 Image enhancement method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN117115047A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745597A (en) * 2024-02-21 2024-03-22 荣耀终端有限公司 Image processing method and related device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984494A (en) * 2022-12-13 2023-04-18 辽宁工程技术大学 Deep learning-based three-dimensional terrain reconstruction method for lunar navigation image
CN116912148A (en) * 2023-09-12 2023-10-20 深圳思谋信息科技有限公司 Image enhancement method, device, computer equipment and computer readable storage medium

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984494A (en) * 2022-12-13 2023-04-18 辽宁工程技术大学 Deep learning-based three-dimensional terrain reconstruction method for lunar navigation image
CN116912148A (en) * 2023-09-12 2023-10-20 深圳思谋信息科技有限公司 Image enhancement method, device, computer equipment and computer readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KUN ZHOU et al.: "NeRFLiX: High-Quality Neural View Synthesis by Learning a Degradation-Driven Inter-viewpoint MiXer", arXiv, pages 1-18 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117745597A (en) * 2024-02-21 2024-03-22 荣耀终端有限公司 Image processing method and related device

Similar Documents

Publication Publication Date Title
Wang et al. Nerf-sr: High quality neural radiance fields using supersampling
Ma et al. Deblur-nerf: Neural radiance fields from blurry images
Liu et al. Video frame synthesis using deep voxel flow
Insafutdinov et al. Unsupervised learning of shape and pose with differentiable point clouds
Mihajlovic et al. KeypointNeRF: Generalizing image-based volumetric avatars using relative spatial encoding of keypoints
CN110378838B (en) Variable-view-angle image generation method and device, storage medium and electronic equipment
US11823322B2 (en) Utilizing voxel feature transformations for view synthesis
US20220239844A1 (en) Neural 3D Video Synthesis
WO2022164895A2 (en) Neural 3d video synthesis
CN117115047A (en) Image enhancement method, device, equipment and storage medium
Yu et al. Luminance attentive networks for HDR image and panorama reconstruction
US20230274400A1 (en) Automatically removing moving objects from video streams
CN115564639A (en) Background blurring method and device, computer equipment and storage medium
Song et al. Weakly-supervised stitching network for real-world panoramic image generation
Ma et al. Recovering realistic details for magnification-arbitrary image super-resolution
Zhou et al. NeRFLix: High-quality neural view synthesis by learning a degradation-driven inter-viewpoint mixer
Roessle et al. Ganerf: Leveraging discriminators to optimize neural radiance fields
Wang et al. Masked space-time hash encoding for efficient dynamic scene reconstruction
CN116912148B (en) Image enhancement method, device, computer equipment and computer readable storage medium
Peng et al. PDRF: progressively deblurring radiance field for fast scene reconstruction from blurry images
Hu et al. CNN-based deghosting in high dynamic range imaging
Hara et al. Spherical image generation from a few normal-field-of-view images by considering scene symmetry
Zhang et al. Fast and flexible stack‐based inverse tone mapping
Yang et al. An end‐to‐end perceptual enhancement method for UHD portrait images
Ahn et al. Texture enhancement via high-resolution style transfer for single-image super-resolution

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination