CN116385305A - Cross-region Transformer-based image shadow removal method and system for a neural radiance field - Google Patents
Cross-region Transformer-based image shadow removal method and system for a neural radiance field
- Publication number
- CN116385305A CN116385305A CN202310378434.4A CN202310378434A CN116385305A CN 116385305 A CN116385305 A CN 116385305A CN 202310378434 A CN202310378434 A CN 202310378434A CN 116385305 A CN116385305 A CN 116385305A
- Authority
- CN
- China
- Prior art keywords
- shadow
- image
- module
- network model
- crformer
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 29
- 230000005855 radiation Effects 0.000 title claims abstract description 19
- 210000005036 nerve Anatomy 0.000 title abstract description 6
- 238000013528 artificial neural network Methods 0.000 claims abstract description 23
- 238000012549 training Methods 0.000 claims abstract description 19
- 238000009877 rendering Methods 0.000 claims abstract description 16
- 238000012360 testing method Methods 0.000 claims abstract description 14
- 230000008569 process Effects 0.000 claims abstract description 10
- 230000001537 neural effect Effects 0.000 claims description 21
- 230000006870 function Effects 0.000 claims description 10
- 238000004364 calculation method Methods 0.000 claims description 6
- 239000002131 composite material Substances 0.000 claims description 4
- 238000005457 optimization Methods 0.000 claims description 3
- 238000012545 processing Methods 0.000 claims description 3
- 230000015572 biosynthetic process Effects 0.000 description 7
- 238000003786 synthesis reaction Methods 0.000 description 7
- 230000000694 effects Effects 0.000 description 3
- 238000010586 diagram Methods 0.000 description 2
- 238000012986 modification Methods 0.000 description 2
- 230000004048 modification Effects 0.000 description 2
- 238000012805 post-processing Methods 0.000 description 2
- 208000037170 Delayed Emergence from Anesthesia Diseases 0.000 description 1
- 230000002411 adverse Effects 0.000 description 1
- 230000009286 beneficial effect Effects 0.000 description 1
- 230000008901 benefit Effects 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000009977 dual effect Effects 0.000 description 1
- 238000005516 engineering process Methods 0.000 description 1
- 238000000605 extraction Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 238000005286 illumination Methods 0.000 description 1
- 230000006872 improvement Effects 0.000 description 1
- 239000000463 material Substances 0.000 description 1
- 230000000704 physical effect Effects 0.000 description 1
- 238000011176 pooling Methods 0.000 description 1
- 238000003672 processing method Methods 0.000 description 1
- 238000005070 sampling Methods 0.000 description 1
- 230000011218 segmentation Effects 0.000 description 1
- 238000012795 verification Methods 0.000 description 1
- 230000000007 visual effect Effects 0.000 description 1
Images
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/80—Geometric correction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T5/00—Image enhancement or restoration
- G06T5/50—Image enhancement or restoration using two or more images, e.g. averaging or subtraction
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20212—Image combination
- G06T2207/20221—Image fusion; Image merging
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y04—INFORMATION OR COMMUNICATION TECHNOLOGIES HAVING AN IMPACT ON OTHER TECHNOLOGY AREAS
- Y04S—SYSTEMS INTEGRATING TECHNOLOGIES RELATED TO POWER NETWORK OPERATION, COMMUNICATION OR INFORMATION TECHNOLOGIES FOR IMPROVING THE ELECTRICAL POWER GENERATION, TRANSMISSION, DISTRIBUTION, MANAGEMENT OR USAGE, i.e. SMART GRIDS
- Y04S10/00—Systems supporting electrical power generation, transmission or distribution
- Y04S10/50—Systems or methods supporting the power network operation or management, involving a certain degree of interaction with the load-side end user applications
Landscapes
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Radiation-Therapy Devices (AREA)
Abstract
The invention provides an image shadow removal method and system for a neural radiance field based on a cross-region Transformer, comprising the following steps: acquiring the fern data set under nerf_llff_data; constructing a shadow-removal network model that fuses an MLP neural network with a cross-region Transformer; initializing the shadow-removal network model, selecting an optimizer, and setting the network training parameters; optimizing the shadow-removal network model with a loss function and saving it; and loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image. The invention fuses a cross-region Transformer (CRFormer) into the MLP neural network of NeRF for high-quality shadow removal, rendering high-quality shadow-free images.
Description
Technical Field
The invention belongs to the field of computer vision, and particularly relates to an image shadow removal method and system for a neural radiance field based on a cross-region Transformer.
Background
Image-based viewpoint synthesis is an important problem of common interest in computer graphics and computer vision. It takes multiple images captured from known viewpoints as input, represents the geometry, appearance, illumination, and other properties of the photographed three-dimensional object or scene, and synthesizes images from unobserved viewpoints, ultimately producing highly realistic renderings. Compared with the traditional pipeline of three-dimensional reconstruction followed by graphics rendering, this approach can achieve photo-level synthesis quality.
While conventional computer graphics can generate high-quality, controllable scene images, it requires all physical parameters of the scene, such as camera parameters, illumination, and object materials, as input. To generate controllable images of a real-world scene, these physical properties must be estimated from existing observations such as images and video. This estimation task, known as inverse rendering, is very challenging, especially when the goal is photo-realistic synthesis. In contrast, neural rendering is a rapidly emerging field that represents scenes compactly and, by using neural networks, learns rendering from existing observations. Its main idea is to combine the insights of classical computer graphics with recent advances in deep learning. Like classical computer graphics, the goal of neural rendering is to generate photo-realistic images in a controllable manner.
With the rise of neural radiance field (NeRF) technology, similar methods have been extended to viewpoint synthesis: a three-dimensional scene or model is represented by a neural radiance field and combined with volume rendering, and this representation has been successfully applied to viewpoint synthesis with high-quality results, and has since been optimized and extended. As an implicit representation, NeRF offers a new perspective for traditional graphics processing methods, namely processing images from the viewpoint of an implicit neural representation, or neural field.
However, neural rendering results may exhibit undesirable shadows that reduce visual quality. Shadows also affect the feature representation of an image and may adversely affect subsequent image and video processing tasks.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and provides an image shadow removal method and system for a neural radiance field based on a cross-region Transformer.
In order to achieve the above purpose, the technical scheme of the invention is realized as follows:
An image shadow removal method for a neural radiance field based on a cross-region Transformer, comprising:
S1: acquiring the fern data set under nerf_llff_data;
S2: constructing a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
S3: initializing the shadow-removal network model, selecting an optimizer, and setting network training parameters;
S4: optimizing the shadow-removal network model with a loss function and saving it;
S5: loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image.
Further, in step S2, a CRFormer module is added to the MLP neural network. The CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network renders the synthesized image.
Further, in step S3, the shadow-removal network model is built with the PyTorch framework, gradient back-propagation is selected for training, and the learning rate is initialized.
Further, in step S4, the loss function is used to optimize the reconstruction loss and the spatial loss, the shadow is removed from the image, and the mean of local image regions is processed in the MLP network.
In another aspect, the present invention further provides an image shadow removal system for a neural radiance field based on a cross-region Transformer, comprising:
a data set module, which acquires the fern data set under nerf_llff_data;
a model module, which constructs a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
an initialization module, which initializes the shadow-removal network model, selects an optimizer, and sets network training parameters;
an optimization module, which optimizes the shadow-removal network model with the loss function and saves it;
an optimal-model module, which loads the optimal shadow-removal network model generated during training, acquires a test set, inputs the test set into the model, and renders a shadow-free image.
Further, the model module comprises an MLP neural network module and a CRFormer module. The CRFormer module removes shadows from the image: the dual encoder in the CRFormer module first extracts features from a given shadow image, the cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network module renders the synthesized image.
Furthermore, the initialization module builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
Further, the optimization module uses the loss function to optimize the reconstruction loss and the spatial loss, removes the shadow from the image, and processes the mean of local image regions in the MLP neural network.
Compared with the prior art, the invention has the following beneficial effects:
1. the present invention fuses a cross-region Transformer (CRFormer) into the MLP neural network of NeRF for high-quality shadow removal;
2. through the new region-aware cross-attention (RCA) proposed in CRFormer, the invention aggregates the pixel features of non-shadow regions into the restored shadow-region features, rendering a higher-quality shadow-free image than the original neural radiance field.
Drawings
FIG. 1 is a schematic flow chart of an embodiment of the present invention;
FIG. 2 is a diagram of a method of computing a cross-region alignment block according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an MLP neural network according to an embodiment of the present invention.
Detailed Description
It should be noted that, without conflict, the embodiments of the present invention and features of the embodiments may be combined with each other.
A cross-region Transformer (CRFormer) is used for high-quality image shadow removal: it considers all pixels from the non-shadow regions to help recover each shadow pixel, making full use of potential context cues from the non-shadow regions to remove shadows.
To address the shadow problem in neural rendering results, the invention provides an image shadow removal method for a neural radiance field based on a cross-region Transformer, applying CRFormer to the neural radiance field to remove shadows from images.
The following describes the embodiments of the present invention with reference to the drawings.
FIG. 1 is a flowchart of the method for image shadow removal of a neural radiance field based on a cross-region Transformer according to the present invention, comprising:
step 1: and acquiring a fern data set under the nerf_llff_data.
The fern data set under nerf_llff_data in the official NeRF data sets is adopted. The data set comprises 72 training pictures, 20 validation pictures, and 20 test pictures, taken from different angles.
Step 2: fusion MLP neural networks and trans-former formats were employed for shadow removal.
A fused model of an MLP neural network and a cross-region Transformer (CRFormer) is adopted, with a CRFormer module added to the MLP network: the dual encoder in the CRFormer module first extracts features from a given shadow image, the cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally CRFormer recovers the shadow region. The model mainly comprises a CRFormer module and an MLP neural network module: the CRFormer module removes shadows from the image, and the MLP neural network renders the synthesized image.
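The two-stage data flow above can be sketched as a minimal composition. This is an illustration only: the module interfaces (`crformer`, `nerf_mlp`, the `shadow_mask` argument) are assumptions, since the patent specifies the data flow but not a code-level API.

```python
import torch
import torch.nn as nn

class DeshadowNeRF(nn.Module):
    """Minimal composition of the fused model: the NeRF MLP renders a view,
    then the CRFormer module removes shadows from the rendered result."""
    def __init__(self, crformer: nn.Module, nerf_mlp: nn.Module):
        super().__init__()
        self.crformer = crformer  # shadow removal (CRFormer module)
        self.nerf_mlp = nerf_mlp  # view synthesis (NeRF's MLP)

    def forward(self, rays, shadow_mask):
        rendered = self.nerf_mlp(rays)               # step 1: render the view
        return self.crformer(rendered, shadow_mask)  # step 2: de-shadow it
```

Any concrete CRFormer and NeRF MLP implementations with these call signatures can be dropped into this wrapper.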
As shown in fig. 1, a new cross-region Transformer (CRFormer) is employed; in CRFormer, a dual-encoder architecture is designed to extract asymmetric features.
First, a dual encoder (NS path, S path) extracts asymmetric features between the two paths from a given shadow image and its shadow mask. Then, the proposed Transformer layer with N cross-region alignment blocks absorbs the features of both shadow and non-shadow regions, establishing a connection from the non-shadow region to the shadow region through the newly designed region-aware cross-attention. In this way, the proposed CRFormer can recover the intensity of each shadow pixel with sufficient context information from the non-shadow region. The outputs of the series of cross-region alignment blocks are then fed into a single decoder to obtain the de-shadowed result; finally, a lightweight U-shaped network post-processes and refines the shadow-removal result.
To reduce the interference that convolution causes between shadow and non-shadow pixels, features are extracted within each region to provide non-shadow-region features of interest. The top encoder (non-shadow path) is a shallow sub-network built from three convolutions: two 3×3 convolutions with average pooling to downsample the feature map, and one 1×1 convolution that adjusts the feature-map dimension to match the output of the bottom encoder. The bottom encoder (shadow path) is a deeper encoder consisting of several convolutions and residual blocks, where two convolutions with stride 2 downsample the feature map. Image semantic segmentation mainly serves to refine the shadow-removal quality of the three pictures in fig. 1.
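The dual encoder described above can be sketched as follows. Channel widths and block counts are assumptions chosen for illustration; the patent fixes only the layer types (two pooled 3×3 convolutions plus a 1×1 convolution in the top path, strided convolutions plus residual blocks in the bottom path).

```python
import torch
import torch.nn as nn

class NonShadowEncoder(nn.Module):
    """Shallow top encoder (NS path): two 3x3 convs with average pooling to
    downsample, then a 1x1 conv to match the bottom encoder's channel dim."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d(2),
            nn.Conv2d(32, 32, 3, padding=1), nn.ReLU(),
            nn.AvgPool2d(2),
            nn.Conv2d(32, out_ch, 1),  # 1x1 conv: match bottom encoder dim
        )
    def forward(self, x):
        return self.net(x)

class ResBlock(nn.Module):
    def __init__(self, ch):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.ReLU(),
            nn.Conv2d(ch, ch, 3, padding=1),
        )
    def forward(self, x):
        return x + self.body(x)

class ShadowEncoder(nn.Module):
    """Deeper bottom encoder (S path): two stride-2 convolutions to
    downsample, followed by residual blocks."""
    def __init__(self, in_ch=3, out_ch=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, out_ch, 3, stride=2, padding=1), nn.ReLU(),
            ResBlock(out_ch), ResBlock(out_ch),
        )
    def forward(self, x):
        return self.net(x)
```

With these choices both paths downsample by a factor of 4, so their outputs align spatially and in channel dimension, as the 1×1 matching convolution requires.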
As shown in fig. 2, to recover shadow pixels, it is important to fully explore and exploit the potential context cues of the non-shadow regions. The invention therefore proposes a new Transformer layer with region-aware cross-attention (RCA) that transfers sufficient context information from non-shadow regions to shadow regions. Within this Transformer layer, N cross-region alignment blocks absorb the features of both shadow and non-shadow regions, establishing a connection from the non-shadow region to the shadow region through the newly designed region-aware cross-attention. In this way, the proposed CRFormer can recover the intensity of each shadow pixel with sufficient context information from the non-shadow region. The outputs of the series of cross-region alignment blocks are then fed into a single decoder to obtain the de-shadowed result; finally, a lightweight U-shaped network post-processes and refines the shadow-removal result.
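One plausible single-head reading of the region-aware cross-attention is sketched below: shadow-path features form the queries, keys and values come from the non-shadow path, and the shadow mask blocks attention to shadow positions so that context flows only from non-shadow pixels. The class name, single-head form, and masking scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn

class RegionAwareCrossAttention(nn.Module):
    """Illustrative RCA: queries from shadow-region features; keys/values
    from the non-shadow path; a boolean shadow mask forbids attending to
    shadow pixels, so each shadow pixel gathers non-shadow context only."""
    def __init__(self, dim=64):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, shadow_feat, nonshadow_feat, shadow_mask):
        # shadow_feat, nonshadow_feat: (B, N, C); shadow_mask: (B, N),
        # True where the position lies inside the shadow region.
        q = self.q(shadow_feat)
        k = self.k(nonshadow_feat)
        v = self.v(nonshadow_feat)
        attn = (q @ k.transpose(-2, -1)) * self.scale          # (B, N, N)
        # mask out shadow positions as attention *keys* for every query
        attn = attn.masked_fill(shadow_mask[:, None, :], float("-inf"))
        attn = attn.softmax(dim=-1)
        return attn @ v                                        # (B, N, C)
```

Note that the mask must leave at least one non-shadow position per image, otherwise the softmax over an all-`-inf` row produces NaNs.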
Step 3: initializing a network model, selecting an optimizer, and setting parameters of network training.
The network model is built with the PyTorch framework, and gradient back-propagation is selected for training. From the data set, 20 pictures are selected as the test set and 72 pictures as the training set. The batch size is set to 64, and the learning rate is dynamically decremented from 0.001 to 0.00015.
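A possible PyTorch setup for this schedule is sketched below. The patent states only a dynamic decrement from 0.001 to 0.00015; the choice of the Adam optimizer and a linear decay shape are assumptions.

```python
import torch

def make_optimizer(params, total_steps, lr_start=1e-3, lr_end=1.5e-4):
    """Adam with a linear learning-rate decay from lr_start to lr_end over
    total_steps (optimizer choice and decay shape are assumptions)."""
    opt = torch.optim.Adam(params, lr=lr_start)
    def lr_lambda(step):
        # LambdaLR multiplies the initial lr by this factor each step
        t = min(step / max(total_steps - 1, 1), 1.0)
        return (lr_start + t * (lr_end - lr_start)) / lr_start
    sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda)
    return opt, sched
```

In a training loop, `sched.step()` is called once per iteration after `opt.step()`, so the learning rate starts at 0.001 and reaches 0.00015 on the final step.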
Step 4: the network model is optimized and saved using the loss function.
CRFormer is trained in an end-to-end fashion. The total loss L is formulated as:
L = ω₁·L_rec + ω₂·L_spa (1)
where L_rec is the reconstruction loss, L_spa is the spatial loss, and ω₁ and ω₂ are the weights of the respective loss terms.
Specifically, the pixel-level L1 distance is adopted to keep the pixel intensities of the shadow-removal result consistent with those of the real image, calculated as:
L_rec = ‖Î − I_gt‖₁ + ‖I_r − I_gt‖₁ (2)
where Î denotes the shadow-removed image, I_r the refined shadow-removed pixel intensities, I_gt the pixel intensities of the real (ground-truth) image, and ‖·‖₁ the pixel-level L1 distance.
In addition, the spatial consistency of the images is enhanced by preserving the differences between adjacent regions of the shadow image and its corresponding shadow-free version, calculated as:
L_spa = Φ(Î, I_gt) + Φ(I_r, I_gt) (3)
where Î denotes the shadow-removed image, I_r the refined shadow-removed pixel intensities, I_gt the pixel intensities of the real image, and Φ the spatial-consistency measure computed over adjacent local regions.
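Equations (1)–(3) can be sketched directly in PyTorch. The concrete form of Φ below (L1 distance between local-region means via average pooling, window size `k`) is an assumption, chosen to match the description of processing local-region means; the patent does not specify Φ exactly.

```python
import torch
import torch.nn.functional as F

def spatial_term(a, b, k=4):
    """Assumed Φ of Eq. (3): L1 distance between k x k local-region means,
    preserving the differences between adjacent regions."""
    return F.l1_loss(F.avg_pool2d(a, k), F.avg_pool2d(b, k))

def total_loss(i_hat, i_r, i_gt, w1=1.0, w2=1.0):
    """Eqs. (1)-(2): pixel-level L1 reconstruction loss plus spatial loss,
    applied to both the initial (i_hat) and refined (i_r) results."""
    l_rec = F.l1_loss(i_hat, i_gt) + F.l1_loss(i_r, i_gt)   # Eq. (2)
    l_spa = spatial_term(i_hat, i_gt) + spatial_term(i_r, i_gt)  # Eq. (3)
    return w1 * l_rec + w2 * l_spa                           # Eq. (1)
```

By construction the loss is zero when both outputs equal the ground truth and grows with any pixel-level or region-level deviation.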
The loss function not only removes the shadow from the image but also processes the mean of local image regions in the MLP network, improving the feature-extraction effect of the MLP network.
Step 5: and loading an optimal network model generated in the training process, acquiring a test set, inputting the test set into the network model, and rendering to generate an image without shadow.
The trained network model is loaded, and the images in the data set are used to generate shadow-removed rendering results.
The foregoing description covers only the preferred embodiments of the present application and is not intended to limit it; various modifications and variations may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall be included in its scope of protection.
Claims (8)
1. A cross-region Transformer-based image shadow removal method for a neural radiance field, comprising:
S1: acquiring the fern data set under nerf_llff_data;
S2: constructing a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
S3: initializing the shadow-removal network model, selecting an optimizer, and setting network training parameters;
S4: optimizing the shadow-removal network model with a loss function and saving it;
S5: loading the optimal shadow-removal network model generated during training, acquiring a test set, inputting the test set into the model, and rendering a shadow-free image.
2. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein in step S2, a CRFormer module is added to the MLP neural network; the CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network renders the synthesized image.
3. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein step S3 builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
4. The method for image shadow removal of a cross-region Transformer-based neural radiance field according to claim 1, wherein in step S4 the loss function is used to optimize the reconstruction loss and the spatial loss, the shadow is removed from the image, and the mean of local image regions is processed in the MLP network.
5. An image shadow removal system for a cross-region Transformer-based neural radiance field, comprising:
a data set module, which acquires the fern data set under nerf_llff_data;
a model module, which constructs a shadow-removal network model fusing an MLP neural network and a cross-region Transformer;
an initialization module, which initializes the shadow-removal network model, selects an optimizer, and sets network training parameters;
an optimization module, which optimizes the shadow-removal network model with the loss function and saves it;
an optimal-model module, which loads the optimal shadow-removal network model generated during training, acquires a test set, inputs the test set into the model, and renders a shadow-free image.
6. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the model module comprises an MLP neural network module and a CRFormer module; the CRFormer module removes shadows from the image: a dual encoder in the CRFormer module first extracts features from a given shadow image, a cross-region alignment block then aggregates non-shadow-region features into the shadow region, and finally the CRFormer module recovers the shadow region; the MLP neural network module renders the synthesized image.
7. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the initialization module builds the shadow-removal network model with the PyTorch framework, selects gradient back-propagation for training, and initializes the learning rate.
8. The cross-region Transformer-based neural radiance field image shadow removal system of claim 5, wherein the optimization module uses the loss function to optimize the reconstruction loss and the spatial loss, removes the shadow from the image, and processes the mean of local image regions in the MLP neural network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378434.4A CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310378434.4A CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116385305A true CN116385305A (en) | 2023-07-04 |
Family
ID=86967309
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310378434.4A Pending CN116385305A (en) | 2023-04-11 | 2023-04-11 | Cross-region Transformer-based image shadow removal method and system for a neural radiance field
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116385305A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883578A (en) * | 2023-09-06 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN117292040A (en) * | 2023-11-27 | 2023-12-26 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
-
2023
- 2023-04-11 CN CN202310378434.4A patent/CN116385305A/en active Pending
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116883578A (en) * | 2023-09-06 | 2023-10-13 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN116883578B (en) * | 2023-09-06 | 2023-12-19 | 腾讯科技(深圳)有限公司 | Image processing method, device and related equipment |
CN117292040A (en) * | 2023-11-27 | 2023-12-26 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
CN117292040B (en) * | 2023-11-27 | 2024-03-08 | 北京渲光科技有限公司 | Method, apparatus and storage medium for new view synthesis based on neural rendering |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Jiang et al. | Learning to see moving objects in the dark | |
Guo et al. | Dense scene information estimation network for dehazing | |
CN116385305A (en) | Cross-region Transformer-based image shadow removal method and system for a neural radiance field | |
Shih et al. | Exemplar-based video inpainting without ghost shadow artifacts by maintaining temporal continuity | |
CN109462747B (en) | DIBR system cavity filling method based on generation countermeasure network | |
KR102311796B1 (en) | Method and Apparatus for Deblurring of Human Motion using Localized Body Prior | |
US11880935B2 (en) | Multi-view neural human rendering | |
CN112991231B (en) | Single-image super-image and perception image enhancement joint task learning system | |
CN108648264A (en) | Underwater scene method for reconstructing based on exercise recovery and storage medium | |
CN113284061B (en) | Underwater image enhancement method based on gradient network | |
CN115115516B (en) | Real world video super-resolution construction method based on Raw domain | |
Li et al. | Uphdr-gan: Generative adversarial network for high dynamic range imaging with unpaired data | |
CN107301662A (en) | Compression restoration methods, device, equipment and the storage medium of depth image | |
Lv et al. | Low-light image enhancement via deep Retinex decomposition and bilateral learning | |
CN114972134A (en) | Low-light image enhancement method for extracting and fusing local and global features | |
Huang et al. | Removing reflection from a single image with ghosting effect | |
CN106412560B (en) | A kind of stereoscopic image generation method based on depth map | |
Chen et al. | CERL: A unified optimization framework for light enhancement with realistic noise | |
CN111064905A (en) | Video scene conversion method for automatic driving | |
CN114339030A (en) | Network live broadcast video image stabilization method based on self-adaptive separable convolution | |
CN114862707A (en) | Multi-scale feature recovery image enhancement method and device and storage medium | |
Kim et al. | Light field angular super-resolution using convolutional neural network with residual network | |
Zhang et al. | As-Deformable-As-Possible Single-image-based View Synthesis without Depth Prior | |
Xu et al. | Direction-aware video demoireing with temporal-guided bilateral learning | |
CN112200756A (en) | Intelligent bullet special effect short video generation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||