CN116385305A - Cross-region Transformer-based image shadow removal method and system for a neural radiance field - Google Patents
Cross-region Transformer-based image shadow removal method and system for a neural radiance field
- Publication number
- CN116385305A CN116385305A CN202310378434.4A CN202310378434A CN116385305A CN 116385305 A CN116385305 A CN 116385305A CN 202310378434 A CN202310378434 A CN 202310378434A CN 116385305 A CN116385305 A CN 116385305A
- Authority
- CN
- China
- Prior art keywords
- shadow
- image
- module
- network model
- crformer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 29
- 230000005855 radiation Effects 0.000 title claims abstract description 19
- 210000005036 nerve Anatomy 0.000 title abstract description 6
- 238000013528 artificial neural network Methods 0.000 claims abstract description 23
- 238000012549 training Methods 0.000 claims abstract description 19
- 238000009877 rendering Methods 0.000 claims abstract description 16
- 238000012360 testing method Methods 0.000 claims abstract description 14
- 230000008569 process Effects 0.000 claims abstract description 10
- 230000001537 neural effect Effects 0.000 claims description 21
- 230000006870 function Effects 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 6
- 239000002131 composite material Substances 0.000 claims description 4
- 238000005457 optimization Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 230000015572 biosynthetic process Effects 0.000 description 7
- 238000003786 synthesis reaction Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012805 post-processing Methods 0.000 description 2
- 208000037170 Delayed Emergence from Anesthesia Diseases 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000000704 physical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Radiation-Therapy Devices (AREA)
Abstract
The invention provides an image shadow removal method and system for a neural radiance field based on a cross-region Transformer, comprising the following steps: acquiring the fern data set under nerf_llff_data; constructing a shadow-removal network model that fuses an MLP neural network with a cross-region Transformer; initializing the shadow-removal network model, selecting an optimizer, and setting the network training parameters; optimizing the shadow-removal network model with a loss function and saving it; and loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image. The invention fuses a cross-region Transformer (CRFormer) into the MLP neural network of NeRF for high-quality shadow removal, rendering high-quality shadow-free images.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an image shadow removal method and system for a neural radiance field based on a cross-region Transformer.
Background
Image-based viewpoint synthesis is an important problem of common interest in computer graphics and computer vision. It takes multiple images captured from known viewpoints as input, represents the geometry, appearance, illumination, and other properties of the photographed three-dimensional object or scene, and synthesizes images from unobserved viewpoints, ultimately producing highly realistic renderings. Compared with the traditional pipeline of three-dimensional reconstruction followed by graphics rendering, this approach can achieve photo-level synthesis quality.
While conventional computer graphics can generate high-quality, controllable scene images, it requires all physical parameters of the scene, such as camera parameters, illumination, and object materials, as input. To generate controllable images of a real-world scene, these physical properties must be estimated from existing observations such as images and video. This estimation task, known as inverse rendering, is very challenging, especially when the goal is photo-realistic synthesis. In contrast, neural rendering is a rapidly emerging field that represents scenes compactly and, by using neural networks, learns rendering from existing observations. Its main idea is to combine the insights of classical computer graphics with recent advances in deep learning. Like classical computer graphics, the goal of neural rendering is to generate photo-realistic images in a controllable manner.
With the rise of neural radiance field (NeRF) technology, similar methods have been extended to viewpoint synthesis: a three-dimensional scene or model is represented by a neural radiance field and combined with volume rendering, and this representation has been successfully applied to viewpoint synthesis with high-quality results, and has since been optimized and extended. As an implicit representation, NeRF offers a new perspective for traditional graphics processing methods, namely processing images from the viewpoint of an implicit neural representation, or neural field.
However, neural rendering results may exhibit undesirable shadows that reduce visual quality. Shadows also affect the feature representation of an image and may adversely affect subsequent image and video processing tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an image shadow removal method and system for a neural radiance field based on a cross-region Transformer.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
An image shadow removal method for a neural radiance field based on a cross-region Transformer, comprising:
S1: acquiring the fern data set under nerf_llff_data;
S2: constructing a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
S3: initializing the shadow-removal network model, selecting an optimizer, and setting network training parameters;
S4: optimizing the shadow-removal network model with a loss function and saving it;
S5: loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image.
Further, in step S2, a CRFormer module is added to the MLP neural network. The CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network renders the synthesized image.
Further, in step S3, the shadow-removal network model is built with the PyTorch framework, gradient back-propagation is selected for training, and the learning rate is initialized.
Further, in step S4, the loss function is used to optimize the reconstruction loss and the spatial loss, the shadow is removed from the image, and the mean of local image regions is processed in the MLP network.
In another aspect, the present invention further provides an image shadow removal system for a neural radiance field based on a cross-region Transformer, comprising:
a data set module, which acquires the fern data set under nerf_llff_data;
a model module, which constructs a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
an initialization module, which initializes the shadow-removal network model, selects an optimizer, and sets network training parameters;
an optimization module, which optimizes the shadow-removal network model with the loss function and saves it;
an optimal-model module, which loads the optimal shadow-removal network model generated during training, acquires a test set, inputs the test set into the model, and renders a shadow-free image.
Further, the model module comprises an MLP neural network module and a CRFormer module. The CRFormer module removes shadows from the image: the dual encoder in the CRFormer module first extracts features from a given shadow image, the cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network module renders the synthesized image.
Furthermore, the initialization module builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
Further, the optimization module uses the loss function to optimize the reconstruction loss and the spatial loss, removes the shadow from the image, and processes the mean of local image regions in the MLP neural network.
Compared with the prior art, the invention has the following beneficial effects:
1. the present invention fuses a cross-region Transformer (CRFormer) into the MLP neural network of NeRF for high-quality shadow removal;
2. through the new region-aware cross-attention (RCA) proposed in CRFormer, the invention aggregates the pixel features of non-shadow regions into the restored shadow-region features, rendering a higher-quality shadow-free image than the original neural radiance field.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a method of computing a cross-region alignment block according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an MLP neural network according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
A cross-region Transformer (CRFormer) is used for high-quality image shadow removal: it considers all pixels from the non-shadow regions to help recover each shadow pixel, making full use of potential context cues from the non-shadow regions to remove shadows.
To address the shadow problem in neural rendering results, the invention provides an image shadow removal method for a neural radiance field based on a cross-region Transformer, applying CRFormer to the neural radiance field to remove shadows from images.
The following describes the embodiments of the present invention with reference to the drawings.
FIG. 1 is a flowchart of the method for image shadow removal of a neural radiance field based on a cross-region Transformer according to the present invention, comprising:
step 1: and acquiring a fern data set under the nerf_llff_data.
The fern data set under nerf_llff_data in the official NeRF data sets is adopted. The data set comprises 72 training pictures, 20 validation pictures, and 20 test pictures, taken from different angles.
Step 2: fusion MLP neural networks and trans-former formats were employed for shadow removal.
A fused model of an MLP neural network and a cross-region Transformer (CRFormer) is adopted, with a CRFormer module added to the MLP network: the dual encoder in the CRFormer module first extracts features from a given shadow image, the cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally CRFormer recovers the shadow region. The model mainly comprises a CRFormer module and an MLP neural network module: the CRFormer module removes shadows from the image, and the MLP neural network renders the synthesized image.
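The two-stage data flow above can be sketched as a minimal composition. This is an illustration only: the module interfaces (`crformer`, `nerf_mlp`, the `shadow_mask` argument) are assumptions, since the patent specifies the data flow but not a code-level API.

```python
import torch
import torch.nn as nn

class DeshadowNeRF(nn.Module):
    """Minimal composition of the fused model: the NeRF MLP renders a view,
    then the CRFormer module removes shadows from the rendered result."""
    def __init__(self, crformer: nn.Module, nerf_mlp: nn.Module):
        super().__init__()
        self.crformer = crformer  # shadow removal (CRFormer module)
        self.nerf_mlp = nerf_mlp  # view synthesis (NeRF's MLP)

    def forward(self, rays, shadow_mask):
        rendered = self.nerf_mlp(rays)               # step 1: render the view
        return self.crformer(rendered, shadow_mask)  # step 2: de-shadow it
```

Any concrete CRFormer and NeRF MLP implementations with these call signatures can be dropped into this wrapper.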
As shown in fig. 1, a new cross-region Transformer (CRFormer) is employed; in CRFormer, a dual-encoder architecture is designed to extract asymmetric features.
First, a dual encoder (NS path, S path) extracts asymmetric features between the two paths from a given shadow image and its shadow mask. Then, the proposed Transformer layer with N cross-region alignment blocks absorbs the features of both shadow and non-shadow regions, establishing a connection from the non-shadow region to the shadow region through the newly designed region-aware cross-attention. In this way, the proposed CRFormer can recover the intensity of each shadow pixel with sufficient context information from the non-shadow region. The outputs of the series of cross-region alignment blocks are then fed into a single decoder to obtain the de-shadowed result; finally, a lightweight U-shaped network post-processes and refines the shadow-removal result.
To reduce the interference that convolution causes between shadow and non-shadow pixels, features are extracted within each region to provide non-shadow-region features of interest. The top encoder (non-shadow path) is a shallow sub-network built from three convolutions: two 3×3 convolutions with average pooling to downsample the feature map, and one 1×1 convolution that adjusts the feature-map dimension to match the output of the bottom encoder. The bottom encoder (shadow path) is a deeper encoder consisting of several convolutions and residual blocks, where two convolutions with stride 2 downsample the feature map. Image semantic segmentation mainly serves to refine the shadow-removal quality of the three pictures in fig. 1.
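The dual encoder described above can be sketched as follows. Channel widths and block counts are assumptions chosen for illustration; the patent fixes only the layer types (two pooled 3×3 convolutions plus a 1×1 convolution in the top path, strided convolutions plus residual blocks in the bottom path).

```python
import torch
import torch.nn as nn

class NonShadowEncoder(nn.Module):
    """Shallow top encoder (NS path): two 3x3 convs with average pooling to
    downsample, then a 1x1 conv to match the bottom encoder's channel dim."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d(2),
            nn.Conv2d(32, out_ch, 1),  # 1x1 conv: match bottom encoder dim
        )
    def forward(self, x):
        return self.net(x)

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class ShadowEncoder(nn.Module):
    """Deeper bottom encoder (S path): two stride-2 convolutions to
    downsample, followed by residual blocks."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(),
            ResBlock(out_ch), ResBlock(out_ch),
        )
    def forward(self, x):
        return self.net(x)
```

With these choices both paths downsample by a factor of 4, so their outputs align spatially and in channel dimension, as the 1×1 matching convolution requires.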
As shown in fig. 2, to recover shadow pixels, it is important to fully explore and exploit the potential context cues of the non-shadow regions. The invention therefore proposes a new Transformer layer with region-aware cross-attention (RCA) that transfers sufficient context information from non-shadow regions to shadow regions. Within this Transformer layer, N cross-region alignment blocks absorb the features of both shadow and non-shadow regions, establishing a connection from the non-shadow region to the shadow region through the newly designed region-aware cross-attention. In this way, the proposed CRFormer can recover the intensity of each shadow pixel with sufficient context information from the non-shadow region. The outputs of the series of cross-region alignment blocks are then fed into a single decoder to obtain the de-shadowed result; finally, a lightweight U-shaped network post-processes and refines the shadow-removal result.
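One plausible single-head reading of the region-aware cross-attention is sketched below: shadow-path features form the queries, keys and values come from the non-shadow path, and the shadow mask blocks attention to shadow positions so that context flows only from non-shadow pixels. The class name, single-head form, and masking scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RegionAwareCrossAttention(nn.Module):
    """Illustrative RCA: queries from shadow-region features; keys/values
    from the non-shadow path; a boolean shadow mask forbids attending to
    shadow pixels, so each shadow pixel gathers non-shadow context only."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, shadow_feat, nonshadow_feat, shadow_mask):
        # shadow_feat, nonshadow_feat: (B, N, C); shadow_mask: (B, N),
        # True where the position lies inside the shadow region.
        q = self.q(shadow_feat)
        k = self.k(nonshadow_feat)
        v = self.v(nonshadow_feat)
        attn = (q @ k.transpose(-2, -1)) * self.scale          # (B, N, N)
        # mask out shadow positions as attention *keys* for every query
        attn = attn.masked_fill(shadow_mask[:, None, :], float("-inf"))
        attn = attn.softmax(dim=-1)
        return attn @ v                                        # (B, N, C)
```

Note that the mask must leave at least one non-shadow position per image, otherwise the softmax over an all-`-inf` row produces NaNs.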
Step 3: initializing a network model, selecting an optimizer, and setting parameters of network training.
The network model is built with the PyTorch framework, and gradient back-propagation is selected for training. From the data set, 20 pictures are selected as the test set and 72 pictures as the training set. The batch size is set to 64, and the learning rate is dynamically decremented from 0.001 to 0.00015.
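A possible PyTorch setup for this schedule is sketched below. The patent states only a dynamic decrement from 0.001 to 0.00015; the choice of the Adam optimizer and a linear decay shape are assumptions.

```python
import torch

def make_optimizer(params, total_steps, lr_start=1e-3, lr_end=1.5e-4):
    """Adam with a linear learning-rate decay from lr_start to lr_end over
    total_steps (optimizer choice and decay shape are assumptions)."""
    opt = torch.optim.Adam(params, lr=lr_start)
    def lr_lambda(step):
        # LambdaLR multiplies the initial lr by this factor each step
        t = min(step / max(total_steps - 1, 1), 1.0)
        return (lr_start + t * (lr_end - lr_start)) / lr_start
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched
```

In a training loop, `sched.step()` is called once per iteration after `opt.step()`, so the learning rate starts at 0.001 and reaches 0.00015 on the final step.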
Step 4: the network model is optimized and saved using the loss function.
CRFormer is trained in an end-to-end fashion. The total loss L is formulated as:
L = ω₁·L_rec + ω₂·L_spa (1)
where L_rec is the reconstruction loss, L_spa is the spatial loss, and ω₁ and ω₂ are the weights of the respective loss terms.
Specifically, the pixel-level L1 distance is adopted to keep the pixel intensities of the shadow-removal result consistent with those of the real image, calculated as:
L_rec = ‖Î − I_gt‖₁ + ‖I_r − I_gt‖₁ (2)
where Î denotes the shadow-removed image, I_r the refined shadow-removed pixel intensities, I_gt the pixel intensities of the real (ground-truth) image, and ‖·‖₁ the pixel-level L1 distance.
In addition, the spatial consistency of the images is enhanced by preserving the differences between adjacent regions of the shadow image and its corresponding shadow-free version, calculated as:
L_spa = Φ(Î, I_gt) + Φ(I_r, I_gt) (3)
where Î denotes the shadow-removed image, I_r the refined shadow-removed pixel intensities, I_gt the pixel intensities of the real image, and Φ the spatial-consistency measure computed over adjacent local regions.
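Equations (1)–(3) can be sketched directly in PyTorch. The concrete form of Φ below (L1 distance between local-region means via average pooling, window size `k`) is an assumption, chosen to match the description of processing local-region means; the patent does not specify Φ exactly.

```python
import torch
import torch.nn.functional as F

def spatial_term(a, b, k=4):
    """Assumed Φ of Eq. (3): L1 distance between k x k local-region means,
    preserving the differences between adjacent regions."""
    return F.l1_loss(F.avg_pool2d(a, k), F.avg_pool2d(b, k))

def total_loss(i_hat, i_r, i_gt, w1=1.0, w2=1.0):
    """Eqs. (1)-(2): pixel-level L1 reconstruction loss plus spatial loss,
    applied to both the initial (i_hat) and refined (i_r) results."""
    l_rec = F.l1_loss(i_hat, i_gt) + F.l1_loss(i_r, i_gt)   # Eq. (2)
    l_spa = spatial_term(i_hat, i_gt) + spatial_term(i_r, i_gt)  # Eq. (3)
    return w1 * l_rec + w2 * l_spa                           # Eq. (1)
```

By construction the loss is zero when both outputs equal the ground truth and grows with any pixel-level or region-level deviation.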
The loss function not only removes the shadow from the image but also processes the mean of local image regions in the MLP network, improving the feature-extraction effect of the MLP network.
Step 5: and loading an optimal network model generated in the training process, acquiring a test set, inputting the test set into the network model, and rendering to generate an image without shadow.
The trained network model is loaded, and the images in the data set are used to generate shadow-removed rendering results.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit it; various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in its scope of protection.
Claims (8)
1. A cross-region Transformer-based image shadow removal method for a neural radiance field, comprising:
S1: acquiring the fern data set under nerf_llff_data;
S2: constructing a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
S3: initializing the shadow-removal network model, selecting an optimizer, and setting network training parameters;
S4: optimizing the shadow-removal network model with a loss function and saving it;
S5: loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image.
2. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein in step S2, a CRFormer module is added to the MLP neural network; the CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network renders the synthesized image.
3. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein step S3 builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
4. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein in step S4 the loss function is used to optimize the reconstruction loss and the spatial loss, the shadow is removed from the image, and the mean of local image regions is processed in the MLP network.
5. An image shadow removal system for a cross-region Transformer-based neural radiance field, comprising:
a data set module, which acquires the fern data set under nerf_llff_data;
a model module, which constructs a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
an initialization module, which initializes the shadow-removal network model, selects an optimizer, and sets network training parameters;
an optimization module, which optimizes the shadow-removal network model with the loss function and saves it;
an optimal-model module, which loads the optimal shadow-removal network model generated during training, acquires a test set, inputs the test set into the model, and renders a shadow-free image.
6. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the model module comprises an MLP neural network module and a CRFormer module; the CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network module renders the synthesized image.
7. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the initialization module builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
8. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the optimization module uses the loss function to optimize the reconstruction loss and the spatial loss, removes the shadow from the image, and processes the mean of local image regions in the MLP neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378434.4A CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378434.4A CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116385305A true CN116385305A (en) | 2023-07-04 |
Family
ID=86967309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310378434.4A Pending CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385305A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883578A (en) * | 2023-09-06 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN117292040A (en) * | 2023-11-27 | 2023-12-26 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
-
2023
- 2023-04-11 CN CN202310378434.4A patent/CN116385305A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883578A (en) * | 2023-09-06 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN116883578B (en) * | 2023-09-06 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN117292040A (en) * | 2023-11-27 | 2023-12-26 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
CN117292040B (en) * | 2023-11-27 | 2024-03-08 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jiang et al. | Learning to see moving objects in the dark | |
Guo et al. | Dense scene information estimation network for dehazing | |
CN116385305A (en) | Cross-region Transformer-based image shadow removal method and system for a neural radiance field | |
Shih et al. | Exemplar-based video inpainting without ghost shadow artifacts by maintaining temporal continuity | |
CN109462747B (en) | DIBR system cavity filling method based on generation countermeasure network | |
KR102311796B1 (en) | Method and Apparatus for Deblurring of Human Motion using Localized Body Prior | |
US11880935B2 (en) | Multi-view neural human rendering | |
CN112991231B (en) | Single-image super-image and perception image enhancement joint task learning system | |
CN108648264A (en) | Underwater scene method for reconstructing based on exercise recovery and storage medium | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN115115516B (en) | Real world video super-resolution construction method based on Raw domain | |
Li et al. | Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data | |
CN107301662A (en) | Compression restoration methods, device, equipment and the storage medium of depth image | |
Lv et al. | Low-light image enhancement via deep Retinex decomposition and bilateral learning | |
CN114972134A (en) | Low-light image enhancement method for extracting and fusing local and global features | |
Huang et al. | Removing reflection from a single image with ghosting effect | |
CN106412560B (en) | A kind of stereoscopic image generation method based on depth map | |
Chen et al. | CERL: A unified optimization framework for light enhancement with realistic noise | |
CN111064905A (en) | Video scene conversion method for automatic driving | |
CN114339030A (en) | Network live broadcast video image stabilization method based on self-adaptive separable convolution | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
Kim et al. | Light field angular super-resolution using convolutional neural network with residual network | |
Zhang et al. | As-Deformable-As-Possible Single-image-based View Synthesis without Depth Prior | |
Xu et al. | Direction-aware video demoireing with temporal-guided bilateral learning | |
CN112200756A (en) | Intelligent bullet special effect short video generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||