CN116934936A - Three-dimensional scene style migration method, device, equipment and storage medium - Google Patents

Three-dimensional scene style migration method, device, equipment and storage medium Download PDF

Info

Publication number
CN116934936A
Authority
CN
China
Prior art keywords
style
image
original
dimensional scene
style migration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311205617.2A
Other languages
Chinese (zh)
Inventor
陈尧森
刘跃根
罗天
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu Sobey Digital Technology Co Ltd
Original Assignee
Chengdu Sobey Digital Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu Sobey Digital Technology Co Ltd filed Critical Chengdu Sobey Digital Technology Co Ltd
Priority to CN202311205617.2A priority Critical patent/CN116934936A/en
Publication of CN116934936A publication Critical patent/CN116934936A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/02 Non-photorealistic rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/09 Supervised learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 3D [Three Dimensional] image rendering
    • G06T15/10 Geometric effects
    • G06T15/20 Perspective computation
    • G06T15/205 Image-based rendering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Graphics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Geometry (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Processing Or Creating Images (AREA)
  • Image Processing (AREA)

Abstract

The application discloses a three-dimensional scene style migration method, device, equipment and storage medium. First, RGB images from multiple view angles are collected as original images and preprocessed to obtain camera pose information. The original images and the camera pose information are then input into a neural radiance field model for training to construct an original three-dimensional scene. A style migration network performs style migration between the original images of the original three-dimensional scene and a style image to obtain style-migrated original images. Finally, the style-migrated original images are used as supervision data, and the style-migrated three-dimensional scene is obtained through optimization. Compared with the prior art, the method produces a better visual effect, does not require retraining the entire neural radiance field for each new style picture, and supports three-dimensional scene style migration for both artistic styles and realistic scene styles, giving it high practical value.

Description

Three-dimensional scene style migration method, device, equipment and storage medium
Technical Field
The application relates to the technical field of computer vision and machine learning, in particular to a three-dimensional scene style migration method, a device, equipment and a storage medium.
Background
In recent years, implicit three-dimensional representations based on Neural Radiance Fields (NeRF) have made great progress, and the three-dimensional scenes they reconstruct are highly photorealistic. Performing style migration on NeRF-based three-dimensional scenes therefore has strong application value, as it reduces the time spent on artistic creation and the level of expertise required.
Currently, some methods can transfer the artistic features of a single 2D image into a complete real 3D scene, thereby changing the style of the real scene. However, the style migration results obtained by these methods often suffer from blurring, inconsistent appearance and artifacts, and the migration result for each new style image must be trained from scratch, making it difficult to meet the requirements of practical applications.
Disclosure of Invention
The application aims to overcome the defects of the prior art by providing a three-dimensional scene style migration method, device, equipment and storage medium that address the poor migration quality and repeated retraining of existing approaches.
The aim of the application is achieved by the following technical scheme:
In a first aspect, the present application provides a three-dimensional scene style migration method, where the method includes:
collecting RGB images from multiple view angles as original images, and preprocessing the original images to obtain camera pose information;
inputting the original images and the camera pose information into a neural radiance field model for training, and constructing an original three-dimensional scene;
performing style migration between the original images of the original three-dimensional scene and a style image using a style migration network, to obtain style-migrated original images;
and using the style-migrated original images as supervision data, optimizing to obtain the style-migrated three-dimensional scene.
In one possible implementation, the step of preprocessing the original images to obtain camera pose information includes:
performing image screening and resolution adjustment on the original images to obtain adjusted original images;
and extracting image feature points from each adjusted original image, performing stereo matching on the extracted feature points across the multiple view angles to generate a sparse point cloud, and taking the sparse point cloud as the camera pose information.
In one possible implementation, the neural radiance field model includes a dense voxel grid and a feature voxel grid, and the step of inputting the original images and the camera pose information into the neural radiance field model for training and constructing an original three-dimensional scene includes:
inputting the original images and the camera pose information into the dense voxel grid and the feature voxel grid;
querying density information at spatial points by interpolating the dense voxel grid;
querying color information at spatial points by interpolating the feature voxel grid;
obtaining a rendered image from the density information and the color information using a rendering formula;
and computing the loss between the rendered image and the original image for back propagation.
In one possible embodiment, the density information is: σ(x) = softplus(interp(x, V_density)), where σ(·) is the volume density function, softplus is the activation function, x is the spatial point coordinate, V_density is the dense voxel grid, and interp(·) is the interpolation function;
the color information is: c(x) = interp(x, V_feature), where x is the spatial point coordinate and V_feature is the feature voxel grid;
the rendering formula is: C = Σ_{i=1}^{K} T_i α_i c_i + T_{K+1} c_bg, with T_i = Π_{j<i} (1 − α_j), where α_i is the attenuation parameter at the i-th sample point, K is the number of sample points along the ray, c_bg is the background color, and α_K is the attenuation parameter at the K-th point.
In one possible implementation, the step of performing style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images, includes:
extracting style features and content features from the original image and the style image, respectively, using a pretrained VGG19 convolutional neural network;
fusing the style features and the content features using a feature pyramid network;
applying an image style transfer network that aligns the mean and variance of the fused style features and content features to obtain a stylized image;
filtering outliers introduced by feature transfer in the stylized image using a Gaussian filter to obtain a result image;
converting the result image to the YUV domain, and processing the result image and the style image in the YUV domain using the image style transfer network;
and splicing the Y channel obtained from the result image and the style image with the UV channels of the result image to obtain a style-migrated original image.
In one possible implementation, the step of using the style-migrated original images as supervision data and optimizing to obtain the style-migrated three-dimensional scene includes:
rendering the original three-dimensional scene in a stylized manner by volume rendering to obtain a stylized rendered image;
and computing the loss between the stylized rendered image and the style-migrated original image and back-propagating it.
In one possible implementation, the step of rendering the original three-dimensional scene in a stylized manner by volume rendering includes:
sampling the feature voxel grid to obtain original scene color information;
extracting style features of the style image using a pretrained style feature encoder;
processing the style features with a hyper-network to generate control parameters;
adjusting the weights of the color generation module using the control parameters;
and performing feature migration on the original color information to obtain the final rendering result.
In a second aspect, the present application proposes a three-dimensional scene style migration apparatus, the apparatus comprising:
a preprocessing module, configured to collect RGB images from multiple view angles as original images and preprocess the original images to obtain camera pose information;
a training module, configured to input the original images and the camera pose information into the neural radiance field model for training and construct the original three-dimensional scene;
a style migration module, configured to perform style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images;
and a scene generation module, configured to use the style-migrated original images as supervision data and optimize to obtain the style-migrated three-dimensional scene.
In a third aspect, the present application also proposes a computer device comprising a processor and a memory, the memory having stored therein a computer program, the computer program being loaded and executed by the processor to implement the three-dimensional scene style migration method according to any of the first aspects.
In a fourth aspect, the present application also proposes a computer readable storage medium having stored therein a computer program, the computer program being loaded and executed by a processor to implement the three-dimensional scene style migration method according to any of the first aspects.
The main scheme of the application and its various further alternatives can be freely combined to form multiple schemes, all of which the application may adopt and claim; non-conflicting alternatives can likewise be freely combined with one another and with other options. Various such combinations will be apparent to those skilled in the art from this disclosure, and no attempt is made to enumerate them exhaustively here.
The application discloses a three-dimensional scene style migration method, device, equipment and storage medium. First, RGB images from multiple view angles are collected as original images and preprocessed to obtain camera pose information. The original images and the camera pose information are then input into a neural radiance field model for training to construct an original three-dimensional scene. A style migration network performs style migration between the original images of the original three-dimensional scene and a style image to obtain style-migrated original images. Finally, the style-migrated original images are used as supervision data, and the style-migrated three-dimensional scene is obtained through optimization. Compared with the prior art, the method produces a better visual effect, does not require retraining the entire neural radiance field for each new style picture, and supports three-dimensional scene style migration for both artistic styles and realistic scene styles, giving it high practical value.
Drawings
Fig. 1 shows a flow chart of a three-dimensional scene style migration method according to an embodiment of the present application.
Fig. 2 shows a schematic flow chart of style migration according to an embodiment of the present application.
Fig. 3 shows a schematic diagram of an embodiment of three-dimensional scene style migration proposed by an embodiment of the present application.
Detailed Description
Other advantages and effects of the present application will become apparent to those skilled in the art from the following disclosure, which describes the embodiments of the present application with reference to specific examples. The application may also be practiced or carried out in other, different embodiments, and the details of this description may be modified or varied in various ways without departing from the spirit and scope of the application. It should be noted that the following embodiments, and the features within them, may be combined with each other provided they do not conflict.
All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.
In the prior art, some methods can transfer the artistic features of a single 2D image into a complete real 3D scene, thereby changing the style of the real scene. However, the style migration results obtained by these methods often suffer from blurring, inconsistent appearance and artifacts, and the migration result for each new style image must be trained from scratch, making it difficult to meet the requirements of practical applications.
Therefore, in order to solve the above-mentioned problems, embodiments of the present application provide a three-dimensional scene style migration method, apparatus, device, and storage medium, which achieve a better visual effect than the prior art, do not require retraining the entire neural radiance field for each new style picture, and support three-dimensional scene style migration for both artistic and realistic scene styles, giving them high practical value. They are described in detail below.
3D scene style transfer: the appearance of a 3D scene can be edited through texture generation and semantic view synthesis. Changing a scene's style with a reference image is also a popular topic in 3D-aware style transfer research, where spatial consistency is one of the main problems to be solved.
Referring to fig. 1, fig. 1 shows a flow chart of a three-dimensional scene style migration method according to an embodiment of the present application, where the method includes the following steps:
s100, acquiring RGB images under a plurality of view angles as original images, and carrying out data preprocessing on the original images to obtain camera position and posture information.
In reverse engineering, the set of points measured on a product's surface by a measuring instrument is also called a point cloud. When a three-dimensional coordinate measuring machine is used, the number of points obtained is relatively small and the spacing between points is relatively large, so the result is called a sparse point cloud.
The step of preprocessing the original images to obtain the camera pose information includes the following sub-steps:
performing image screening and resolution adjustment on the original images to obtain adjusted original images;
and extracting image feature points from each adjusted original image, performing stereo matching on the extracted feature points across the multiple view angles to generate a sparse point cloud, and taking the sparse point cloud as the camera pose information.
First, RGB images from multiple view angles of the 3D scene are collected as original images. The original images are screened to remove blurred frames and keep a set of sharper, higher-quality images, and the resolution of this set is then adjusted (reduced or increased) to obtain the adjusted original images. Feature points are extracted from the adjusted original images, and stereo matching of the feature points across different view angles yields a set of image pairs and a sparse point cloud, which represents the camera pose information.
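This preprocessing step is essentially a structure-from-motion pipeline. The sketch below illustrates the feature-extraction, matching and pose-recovery idea for a single image pair using OpenCV; the library choice, SIFT features, the known intrinsic matrix K and all thresholds are illustrative assumptions rather than details taken from the patent, which in practice would typically rely on an incremental SfM tool that also produces the sparse point cloud for all views.

```python
import cv2
import numpy as np

def estimate_relative_pose(img_a, img_b, K):
    """Estimate the relative camera pose between two views.

    img_a, img_b: grayscale images of the same scene from different view angles.
    K:           3x3 camera intrinsic matrix (assumed known here).
    Returns (R, t): rotation and unit-norm translation of view b w.r.t. view a.
    """
    sift = cv2.SIFT_create()
    kp_a, des_a = sift.detectAndCompute(img_a, None)
    kp_b, des_b = sift.detectAndCompute(img_b, None)

    # Match descriptors and keep the better correspondences (Lowe's ratio test).
    matcher = cv2.BFMatcher(cv2.NORM_L2)
    matches = matcher.knnMatch(des_a, des_b, k=2)
    good = [m for m, n in matches if m.distance < 0.75 * n.distance]

    pts_a = np.float32([kp_a[m.queryIdx].pt for m in good])
    pts_b = np.float32([kp_b[m.trainIdx].pt for m in good])

    # Essential matrix from the matched feature points, then recover R, t.
    E, mask = cv2.findEssentialMat(pts_a, pts_b, K, method=cv2.RANSAC, threshold=1.0)
    _, R, t, _ = cv2.recoverPose(E, pts_a, pts_b, K, mask=mask)
    return R, t
```

Triangulating the matched points with the recovered poses would then give the sparse point cloud; the snippet only shows the pairwise matching and pose-recovery step.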
S200, inputting the original images and the camera pose information into a neural radiance field model for training, and constructing an original three-dimensional scene.
The neural radiance field model is trained to represent the original three-dimensional scene using the RGB original images and the camera pose information; construction of the original three-dimensional scene is divided into two parts: appearance feature construction and geometry construction.
The neural radiance field model comprises a dense voxel grid and a feature voxel grid. The original images and camera pose information are input into the neural radiance field model for training, and the step of constructing an original three-dimensional scene includes:
inputting the original images and the camera pose information into the dense voxel grid and the feature voxel grid;
querying density information at spatial points by interpolating the dense voxel grid;
querying color information at spatial points by interpolating the feature voxel grid;
obtaining a rendered image from the density information and the color information using a rendering formula;
and computing the loss between the rendered image and the original image for back propagation.
Geometry construction is represented using a dense voxel grid, which aims to efficiently query the density information of any spatial point by interpolation: σ(x) = softplus(interp(x, V_density)), where σ(·) is the volume density function, softplus is the activation function, x is the spatial point coordinate, V_density is the dense voxel grid, and interp(·) is the interpolation function;
appearance feature construction is represented using a feature voxel grid, which aims to efficiently query the color information of any spatial point by interpolation: c(x) = interp(x, V_feature), where x is the spatial point coordinate and V_feature is the feature voxel grid;
the rendering formula is: C = Σ_{i=1}^{K} T_i α_i c_i + T_{K+1} c_bg, with T_i = Π_{j<i} (1 − α_j), where α_i is the attenuation parameter at the i-th sample point, K is the number of sample points along the ray, c_bg is the background color, and α_K is the attenuation parameter at the K-th point.
Once the density and color information of any point in the scene can be obtained, construction of the three-dimensional scene is complete. Whenever the scene needs to be rendered from a given view angle, it is computed according to the volume rendering formula above.
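A minimal PyTorch sketch of the two voxel grids and the volume-rendering accumulation described above; the grid resolution, feature dimension, the shallow RGB network and the use of grid_sample for trilinear interpolation are illustrative assumptions, not values taken from the patent.

```python
import torch
import torch.nn.functional as F

class VoxelRadianceField(torch.nn.Module):
    def __init__(self, res=128, feat_dim=12):
        super().__init__()
        # Dense voxel grid storing raw density, feature voxel grid storing colour features.
        self.density_grid = torch.nn.Parameter(torch.zeros(1, 1, res, res, res))
        self.feature_grid = torch.nn.Parameter(torch.zeros(1, feat_dim, res, res, res))
        # Shallow MLP turning interpolated features (+ view direction) into RGB.
        self.rgb_net = torch.nn.Sequential(
            torch.nn.Linear(feat_dim + 3, 64), torch.nn.ReLU(), torch.nn.Linear(64, 3))

    def _interp(self, grid, x):
        # x: (N, 3) points normalised to [-1, 1]; trilinear interpolation into the grid.
        g = x.view(1, -1, 1, 1, 3)
        out = F.grid_sample(grid, g, align_corners=True)   # (1, C, N, 1, 1)
        return out.view(grid.shape[1], -1).t()             # (N, C)

    def forward(self, x, d):
        sigma = F.softplus(self._interp(self.density_grid, x)).squeeze(-1)  # density
        feat = self._interp(self.feature_grid, x)                           # colour features
        rgb = torch.sigmoid(self.rgb_net(torch.cat([feat, d], dim=-1)))     # per-point colour
        return sigma, rgb

def render_ray(sigma, rgb, delta, bg=1.0):
    """Accumulate K samples along one ray: C = sum_i T_i * alpha_i * c_i + T_{K+1} * c_bg.

    sigma: (K,) densities, rgb: (K, 3) colours, delta: (K,) distances between samples,
    bg: scalar (grey-level) background colour.
    """
    alpha = 1.0 - torch.exp(-sigma * delta)                                 # per-sample opacity
    trans = torch.cumprod(torch.cat([alpha.new_ones(1), 1.0 - alpha + 1e-10])[:-1], dim=0)
    color = (trans.unsqueeze(-1) * alpha.unsqueeze(-1) * rgb).sum(dim=0)
    color = color + trans[-1] * (1.0 - alpha[-1]) * bg                      # background term
    return color
```

Training the first stage then consists of rendering rays with this model and back-propagating the reconstruction loss against the original images.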
S300, performing style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images.
Generating the supervision data for scene style migration: style migration is performed between the original images and the style image using a style migration network, and the results serve as supervision data. Supervision data generation comprises feature extraction from the original and style images, multi-scale feature fusion, preliminary style migration, and style migration optimization.
Specifically, the step of performing style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images, includes:
extracting style features and content features from the original image and the style image, respectively, using a pretrained VGG19 convolutional neural network;
fusing the style features and the content features using a feature pyramid network;
applying an image style transfer network that aligns the mean and variance of the fused style features and content features to obtain a stylized image;
filtering outliers introduced by feature transfer in the stylized image using a Gaussian filter to obtain a result image;
converting the result image to the YUV domain, and processing the result image and the style image in the YUV domain using the image style transfer network;
and splicing the Y channel obtained from the result image and the style image with the UV channels of the result image to obtain a style-migrated original image.
Feature extraction from the original image and the style image uses a pretrained VGG19 convolutional neural network to extract style features and content features. Multi-scale feature fusion uses a feature pyramid network (FPN) to fuse image features at different resolutions and strengthen feature extraction. Preliminary style migration uses the common image style transfer network AdaIN to obtain an initial style migration result from the extracted style and content features. Style migration optimization comprises the following sub-steps: filtering the outliers introduced by feature transfer with a Gaussian filter so that the feature transfer is smoother; converting the result image and the style image to the YUV domain and further processing them with AdaIN; and splicing the Y channel of the obtained result with the UV channels of the original result and converting back to RGB to obtain the final result.
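The mean/variance alignment performed in the preliminary style migration step can be sketched as a generic adaptive instance normalization (AdaIN) operation on encoder feature maps; this is the standard AdaIN formulation rather than the patent's exact network.

```python
import torch

def adain(content_feat, style_feat, eps=1e-5):
    """Adaptive instance normalisation: re-scale the content features so their
    per-channel mean/variance match those of the style features.

    content_feat, style_feat: (N, C, H, W) feature maps from a VGG-style encoder.
    """
    c_mean = content_feat.mean(dim=(2, 3), keepdim=True)
    c_std = content_feat.std(dim=(2, 3), keepdim=True) + eps
    s_mean = style_feat.mean(dim=(2, 3), keepdim=True)
    s_std = style_feat.std(dim=(2, 3), keepdim=True) + eps
    # Normalise the content statistics, then re-inject the style statistics.
    return s_std * (content_feat - c_mean) / c_std + s_mean
```

Decoding the aligned features back to image space then gives the preliminary stylized image that the subsequent Gaussian filtering and YUV processing refine.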
Referring to fig. 2, which shows the style migration flow of an embodiment of the present application: the processed style image and the full-resolution content are combined to obtain a low-resolution stylized image, which is filtered with a Gaussian filter and converted to the YUV domain, from which the UV channels are taken. The full-resolution content is passed through an RGB-to-YUV conversion, processed in the YUV domain by the style transfer network, and its Y channel is taken. The UV channels and the Y channel are then spliced and finally converted back to an RGB image, yielding the full-resolution result.
The two-dimensional photorealistic style transfer framework shown in fig. 2 accepts a full-resolution style image and a full-resolution content image and transfers the realistic style of the style image onto the content image. Within this framework the images are converted to YUV channels, and finally the generated stylized UV channels are fused with the Y channel derived from the original content image to obtain the final photorealistic stylized image.
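A sketch of the Gaussian filtering and Y/UV splicing described above, using OpenCV colour-space conversions; the kernel size, sigma and function names are illustrative assumptions.

```python
import cv2

def fuse_luminance(stylized_rgb, content_branch_rgb, ksize=5, sigma=1.0):
    """Keep the stylized chrominance (UV) but take the luminance (Y) from the
    full-resolution branch, as in the YUV splicing step.

    stylized_rgb:       stylized result (upsampled to the content resolution).
    content_branch_rgb: full-resolution image from the Y-channel branch.
    Both are uint8 RGB arrays of the same size.
    """
    # Smooth outliers introduced by feature transfer.
    stylized_rgb = cv2.GaussianBlur(stylized_rgb, (ksize, ksize), sigma)

    # Convert both images to the YUV domain.
    stylized_yuv = cv2.cvtColor(stylized_rgb, cv2.COLOR_RGB2YUV)
    content_yuv = cv2.cvtColor(content_branch_rgb, cv2.COLOR_RGB2YUV)

    # Splice: Y channel from the full-resolution branch, UV channels from the stylized branch.
    fused = stylized_yuv.copy()
    fused[..., 0] = content_yuv[..., 0]

    return cv2.cvtColor(fused, cv2.COLOR_YUV2RGB)
```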
S400, using the style-migrated original images as supervision data and optimizing to obtain the style-migrated three-dimensional scene.
Three-dimensional scene style migration: the style-migrated images are used as supervision, and the style-migrated three-dimensional scene is obtained through optimization and training.
Using the style-migrated original images as supervision data, the step of optimizing to obtain the style-migrated three-dimensional scene includes:
rendering the original three-dimensional scene in a stylized manner by volume rendering to obtain a stylized rendered image;
and computing the loss between the stylized rendered image and the style-migrated original image and back-propagating it.
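The optimization stage then amounts to a reconstruction loss between the stylized volume-rendered image and the style-transferred supervision image, back-propagated into the trainable style components. A minimal sketch, assuming a plain MSE loss (the patent does not specify the exact loss form):

```python
import torch
import torch.nn.functional as F

def style_stage_step(rendered, supervision, optimizer):
    """One optimisation step of the style-training stage.

    rendered:    stylised image produced by volume rendering the scene (requires grad).
    supervision: style-migrated original image from the 2D style transfer network.
    optimizer:   optimiser over the style components (hyper-network / colour module);
                 the geometry grids are assumed frozen at this stage.
    """
    loss = F.mse_loss(rendered, supervision)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```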
The stylized three-dimensional scene rendering specifically comprises the following sub-steps: sampling the feature voxel grid to obtain original scene color information;
extracting style features of the style image using a pretrained style feature encoder;
processing the style features with a hyper-network to generate control parameters;
adjusting the weights of the color generation module using the control parameters;
and performing feature migration on the original color information to obtain the final rendering result.
The style feature extraction uses a pretrained style feature encoder with a VGGNet structure, and the color generation module uses an MLP (multi-layer perceptron) to obtain the final color information C from the style features and the original color: C = MLP(c, x, d),
where c is the original color, x is the coordinates of the point, and d is the view direction; the weights of this MLP are modulated by the hyper-network according to the style features. Different style images can therefore drive style migration of the three-dimensional scene without retraining.
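A minimal sketch of the hyper-network idea: a style-feature vector is mapped to the weights of a small colour MLP that maps the original colour c, position x and view direction d to the final colour C. The layer sizes, the style-feature dimension and the single hidden layer are illustrative assumptions.

```python
import torch
import torch.nn as nn

class HyperColorNet(nn.Module):
    """Predict the final colour C = MLP(c, x, d), with MLP weights generated by a hyper-network."""

    def __init__(self, style_dim=256, hidden=64):
        super().__init__()
        self.in_dim, self.hidden = 3 + 3 + 3, hidden            # c (3) + x (3) + d (3)
        n_params = (self.in_dim + 1) * hidden + (hidden + 1) * 3
        # Hyper-network: style feature vector -> flat parameter vector of the colour MLP.
        self.hyper = nn.Sequential(nn.Linear(style_dim, 256), nn.ReLU(),
                                   nn.Linear(256, n_params))

    def forward(self, c, x, d, style_feat):
        params = self.hyper(style_feat)                          # (n_params,)
        i = (self.in_dim + 1) * self.hidden
        # Unpack the generated parameters into two linear layers.
        w1 = params[: self.in_dim * self.hidden].view(self.hidden, self.in_dim)
        b1 = params[self.in_dim * self.hidden : i]
        w2 = params[i : i + self.hidden * 3].view(3, self.hidden)
        b2 = params[i + self.hidden * 3 :]

        h = torch.relu(torch.cat([c, x, d], dim=-1) @ w1.t() + b1)
        return torch.sigmoid(h @ w2.t() + b2)                    # final colour C in [0, 1]
```

Because only the hyper-network (and not the scene geometry) depends on the style image, swapping in a new style image changes the generated MLP weights without retraining the radiance field.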
In one possible embodiment, please refer to fig. 3, which shows a schematic diagram of an embodiment of three-dimensional scene style migration proposed by an embodiment of the present application. The original images are used for training: density information is obtained with the density voxel grid, and color information is obtained from the feature voxel grid and the view direction through a shallow MLP. The color information is style-encoded together with the style image, and a hyper-network updates the weights of this MLP. The style image and the content image are input into the YUV style network to obtain the stylized image.
Referring again to fig. 3, in this framework, training of photorealistic style transfer in a 3D scene is divided into two stages. The first stage is geometric training of a single scene: the density voxel grid and the feature voxel grid directly represent the scene, the density voxel grid outputs the density, and the feature voxel grid together with the shallow MLP of RGBNet predicts the color. The second stage is style training: the parameters of the density voxel grid and the feature voxel grid are frozen, and the features of the reference style image are used as the input to the hyper-network, which in turn controls the input of RGBNet. The hyper-network is thus optimized jointly, achieving photorealistic style transfer of the scene for an image of any style.
Compared with the prior art, the embodiment of the application has the following beneficial effects:
First, a NeRF model represents the initial three-dimensional scene, an effective two-dimensional style migration image is obtained through the style migration network, and a three-dimensional scene style migration module is then trained on the original image and the stylized image together, realizing style migration of the three-dimensional scene.
Second, three-dimensional scene style migration can be achieved for both artistic styles and realistic scene styles.
Third, style migration of the three-dimensional scene can be completed for any style picture without training from scratch again.
The following provides a possible implementation of the three-dimensional scene style migration apparatus, which executes the steps, and achieves the corresponding technical effects, of the three-dimensional scene style migration method shown in the above embodiments and their possible implementations. The apparatus comprises:
a preprocessing module, configured to collect RGB images from multiple view angles as original images and preprocess the original images to obtain camera pose information;
a training module, configured to input the original images and the camera pose information into the neural radiance field model for training and construct the original three-dimensional scene;
a style migration module, configured to perform style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images;
and a scene generation module, configured to use the style-migrated original images as supervision data and optimize to obtain the style-migrated three-dimensional scene.
An embodiment further provides a computer device that can perform the steps of any embodiment of the three-dimensional scene style migration method provided by the embodiments of the present application, and can therefore achieve the beneficial effects of that method; details already given in the foregoing embodiments are not repeated here.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor. To this end, an embodiment of the present application provides a storage medium having stored therein a plurality of instructions capable of being loaded by a processor to perform the steps of any one of the embodiments of the three-dimensional scene style migration method provided by the embodiment of the present application.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
Because of the instructions stored in the storage medium, the steps of any three-dimensional scene style migration method embodiment provided by the embodiments of the present application can be executed, so the beneficial effects of any such method can be achieved; details already given in the foregoing embodiments are not repeated here.
The foregoing description of the preferred embodiments of the application is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the application.

Claims (10)

1. A method for style migration of a three-dimensional scene, the method comprising:
collecting RGB images from multiple view angles as original images, and preprocessing the original images to obtain camera pose information;
inputting the original images and the camera pose information into a neural radiance field model for training, and constructing an original three-dimensional scene;
performing style migration between the original images of the original three-dimensional scene and a style image using a style migration network, to obtain style-migrated original images;
and using the style-migrated original images as supervision data, optimizing to obtain the style-migrated three-dimensional scene.
2. The three-dimensional scene style migration method according to claim 1, wherein the step of preprocessing the original images to obtain camera pose information comprises:
performing image screening and resolution adjustment on the original images to obtain adjusted original images;
and extracting image feature points from each adjusted original image, performing stereo matching on the extracted feature points across the multiple view angles to generate a sparse point cloud, and taking the sparse point cloud as the camera pose information.
3. The three-dimensional scene style migration method according to claim 1, wherein the step of inputting the original images and the camera pose information into a neural radiance field model for training, and constructing an original three-dimensional scene, comprises:
inputting the original images and the camera pose information into a dense voxel grid and a feature voxel grid;
querying density information at spatial points by interpolating the dense voxel grid;
querying color information at spatial points by interpolating the feature voxel grid;
obtaining a rendered image from the density information and the color information using a rendering formula;
and computing the loss between the rendered image and the original image for back propagation.
4. The three-dimensional scene style migration method according to claim 3, wherein the density information is: σ(x) = softplus(interp(x, V_density)), wherein σ(·) is the volume density function, softplus is the activation function, interp(·) is the interpolation function, x is the spatial point coordinate, and V_density is the dense voxel grid;
the color information is: c(x) = interp(x, V_feature), wherein V_feature is the feature voxel grid;
the rendering formula is: C = Σ_{i=1}^{K} T_i α_i c_i + T_{K+1} c_bg, with T_i = Π_{j<i} (1 − α_j), wherein α_i is the attenuation parameter at the i-th sample point, K is the number of sample points along the ray, c_bg is the background color, and α_K is the attenuation parameter at the K-th point.
5. The three-dimensional scene style migration method according to claim 1, wherein the step of performing style migration between the original images of the original three-dimensional scene and the style image using a style migration network, to obtain style-migrated original images, comprises:
extracting style features and content features from the original image and the style image, respectively, using a pretrained VGG19 convolutional neural network;
fusing the style features and the content features using a feature pyramid network;
applying an image style transfer network that aligns the mean and variance of the fused style features and content features to obtain a stylized image;
filtering outliers introduced by feature transfer in the stylized image using a Gaussian filter to obtain a result image;
converting the result image to the YUV domain, and processing the result image and the style image in the YUV domain using the image style transfer network;
and splicing the Y channel obtained from the result image and the style image with the UV channels of the result image to obtain a style-migrated original image.
6. The three-dimensional scene style migration method according to claim 1, wherein the step of using the style-migrated original images as supervision data and optimizing to obtain the style-migrated three-dimensional scene comprises:
rendering the original three-dimensional scene in a stylized manner by volume rendering to obtain a stylized rendered image;
and computing the loss between the stylized rendered image and the style-migrated original image and back-propagating it.
7. The three-dimensional scene style migration method according to claim 6, wherein the step of rendering the original three-dimensional scene in a stylized manner by volume rendering comprises:
sampling the feature voxel grid to obtain original scene color information;
extracting style features of the style image using a pretrained style feature encoder;
processing the style features with a hyper-network to generate control parameters;
adjusting the weights of the color generation module using the control parameters;
and performing feature migration on the original color information to obtain the final rendering result.
8. A three-dimensional scene style migration apparatus, the apparatus comprising:
a preprocessing module, configured to collect RGB images from multiple view angles as original images and preprocess the original images to obtain camera pose information;
a training module, configured to input the original images and the camera pose information into a neural radiance field model for training and construct an original three-dimensional scene;
a style migration module, configured to perform style migration between the original images of the original three-dimensional scene and a style image using a style migration network, to obtain style-migrated original images;
and a scene generation module, configured to use the style-migrated original images as supervision data and optimize to obtain the style-migrated three-dimensional scene.
9. A computer device comprising a processor and a memory, the memory having stored therein a computer program that is loaded and executed by the processor to implement the three-dimensional scene style migration method of any of claims 1-7.
10. A computer readable storage medium having stored therein a computer program that is loaded and executed by a processor to implement the three-dimensional scene style migration method of any of claims 1-7.
CN202311205617.2A 2023-09-19 2023-09-19 Three-dimensional scene style migration method, device, equipment and storage medium Pending CN116934936A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311205617.2A CN116934936A (en) 2023-09-19 2023-09-19 Three-dimensional scene style migration method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311205617.2A CN116934936A (en) 2023-09-19 2023-09-19 Three-dimensional scene style migration method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116934936A 2023-10-24

Family

ID=88377554

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311205617.2A Pending CN116934936A (en) 2023-09-19 2023-09-19 Three-dimensional scene style migration method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116934936A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541732A (en) * 2024-01-09 2024-02-09 成都信息工程大学 Text-guided neural radiation field building scene stylization method
CN118096978A (en) * 2024-04-25 2024-05-28 深圳臻像科技有限公司 3D artistic content rapid generation method based on arbitrary stylization

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129708A (en) * 2010-12-10 2011-07-20 北京邮电大学 Fast multilevel imagination and reality occlusion method at actuality enhancement environment
CN113706714A (en) * 2021-09-03 2021-11-26 中科计算技术创新研究院 New visual angle synthesis method based on depth image and nerve radiation field
CN114298895A (en) * 2021-12-24 2022-04-08 成都索贝数码科技股份有限公司 Image realistic style migration method, device, equipment and storage medium
WO2022095757A1 (en) * 2020-11-09 2022-05-12 华为技术有限公司 Image rendering method and apparatus
CN115409937A (en) * 2022-08-19 2022-11-29 中国人民解放军战略支援部队信息工程大学 Facial video expression migration model construction method based on integrated nerve radiation field and expression migration method and system
CN115587930A (en) * 2022-12-12 2023-01-10 成都索贝数码科技股份有限公司 Image color style migration method, device and medium
CN115661403A (en) * 2022-10-13 2023-01-31 阿里巴巴(中国)有限公司 Explicit radiation field processing method, device and storage medium
CN116310028A (en) * 2023-03-07 2023-06-23 上海学深智能科技有限公司 Style migration method and system of three-dimensional face model
CN116418961A (en) * 2023-06-09 2023-07-11 深圳臻像科技有限公司 Light field display method and system based on three-dimensional scene stylization
CN116543086A (en) * 2023-05-04 2023-08-04 阿里巴巴达摩院(杭州)科技有限公司 Nerve radiation field processing method and device and electronic equipment

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102129708A (en) * 2010-12-10 2011-07-20 北京邮电大学 Fast multilevel imagination and reality occlusion method at actuality enhancement environment
WO2022095757A1 (en) * 2020-11-09 2022-05-12 华为技术有限公司 Image rendering method and apparatus
CN113706714A (en) * 2021-09-03 2021-11-26 中科计算技术创新研究院 New visual angle synthesis method based on depth image and nerve radiation field
CN114298895A (en) * 2021-12-24 2022-04-08 成都索贝数码科技股份有限公司 Image realistic style migration method, device, equipment and storage medium
CN115409937A (en) * 2022-08-19 2022-11-29 中国人民解放军战略支援部队信息工程大学 Facial video expression migration model construction method based on integrated nerve radiation field and expression migration method and system
CN115661403A (en) * 2022-10-13 2023-01-31 阿里巴巴(中国)有限公司 Explicit radiation field processing method, device and storage medium
CN115587930A (en) * 2022-12-12 2023-01-10 成都索贝数码科技股份有限公司 Image color style migration method, device and medium
CN116310028A (en) * 2023-03-07 2023-06-23 上海学深智能科技有限公司 Style migration method and system of three-dimensional face model
CN116543086A (en) * 2023-05-04 2023-08-04 阿里巴巴达摩院(杭州)科技有限公司 Nerve radiation field processing method and device and electronic equipment
CN116418961A (en) * 2023-06-09 2023-07-11 深圳臻像科技有限公司 Light field display method and system based on three-dimensional scene stylization

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117541732A (en) * 2024-01-09 2024-02-09 成都信息工程大学 Text-guided neural radiation field building scene stylization method
CN118096978A (en) * 2024-04-25 2024-05-28 深圳臻像科技有限公司 3D artistic content rapid generation method based on arbitrary stylization
CN118096978B (en) * 2024-04-25 2024-07-12 深圳臻像科技有限公司 3D artistic content rapid generation method based on arbitrary stylization

Similar Documents

Publication Publication Date Title
CN109255831B (en) Single-view face three-dimensional reconstruction and texture generation method based on multi-task learning
CN113706714B (en) New view angle synthesizing method based on depth image and nerve radiation field
CN107578436B (en) Monocular image depth estimation method based on full convolution neural network FCN
US11367239B2 (en) Textured neural avatars
CN110349247B (en) Indoor scene CAD three-dimensional reconstruction method based on semantic understanding
CN116934936A (en) Three-dimensional scene style migration method, device, equipment and storage medium
CN111968217A (en) SMPL parameter prediction and human body model generation method based on picture
CN114049420B (en) Model training method, image rendering method, device and electronic equipment
CN111951368B (en) Deep learning method for point cloud, voxel and multi-view fusion
CN114429538B (en) Method for interactively editing nerve radiation field geometry
US11055892B1 (en) Systems and methods for generating a skull surface for computer animation
WO2024055211A1 (en) Method and system for three-dimensional video reconstruction based on nerf combination of multi-view layers
CN113077545A (en) Method for reconstructing dress human body model from image based on graph convolution
CN117274501B (en) Drivable digital person modeling method, device, equipment and medium
JP7446566B2 (en) Volumetric capture and mesh tracking based machine learning
CN117501313A (en) Hair rendering system based on deep neural network
CN110322548B (en) Three-dimensional grid model generation method based on geometric image parameterization
CN116681839A (en) Live three-dimensional target reconstruction and singulation method based on improved NeRF
CN116934972A (en) Three-dimensional human body reconstruction method based on double-flow network
CN116091705A (en) Variable topology dynamic scene reconstruction and editing method and device based on nerve radiation field
CN116452715A (en) Dynamic human hand rendering method, device and storage medium
CN116402943A (en) Indoor three-dimensional reconstruction method and device based on symbol distance field
CN116342377A (en) Self-adaptive generation method and system for camouflage target image in degraded scene
CN115063562A (en) Virtual-real fusion augmented reality presentation method based on multi-view three-dimensional reconstruction
CN114820323A (en) Multi-scale residual binocular image super-resolution method based on stereo attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination