CN117557722A - Reconstruction method and device of 3D model, enhancement realization device and storage medium - Google Patents

Reconstruction method and device of 3D model, enhancement realization device and storage medium Download PDF

Info

Publication number
CN117557722A
Authority
CN
China
Prior art keywords
model
key frame
frame data
mapping
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311495527.1A
Other languages
Chinese (zh)
Inventor
张吉松
夏勇峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Beehive Century Technology Co ltd
Original Assignee
Beijing Beehive Century Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Beehive Century Technology Co ltd
Priority to CN202311495527.1A
Publication of CN117557722A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00 - 3D [Three Dimensional] image rendering
    • G06T15/04 - Texture mapping
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T19/00 - Manipulating 3D models or images for computer graphics
    • G06T19/20 - Editing of 3D images, e.g. changing shapes or colours, aligning objects or positioning parts

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Geometry (AREA)
  • Architecture (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • Processing Or Creating Images (AREA)

Abstract

Some embodiments of the present application provide a method, an apparatus, an enhancement implementation device, and a storage medium for reconstructing a 3D model, where the method includes: acquiring 2D video data and extracting key frame data from the 2D video data; generating a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model with sample key frame data; removing noise and artifacts from the first 3D model using a multi-view fusion algorithm to obtain a second 3D model; and mapping preset texture information onto the second 3D model to generate a target 3D model corresponding to the key frame data. In the embodiments of the present application, the neural network model is used for surface rendering, which greatly shortens the data processing time; through multi-view fusion and texture mapping, the 3D structure can be further refined and the accuracy of the reconstruction improved.

Description

Reconstruction method and device of 3D model, enhancement realization device and storage medium
Technical Field
The application relates to the technical field of data processing, in particular to a reconstruction method and device of a 3D model, an enhancement realization device and a storage medium.
Background
In recent years, with the continuous development of network technology, many scenarios require reconstructing the 3D structure of a scene from 2D video. However, the reconstruction process suffers from long data processing times and low reconstruction accuracy, and how to improve the efficiency and accuracy of 3D reconstruction is a problem that currently needs to be solved.
Disclosure of Invention
Some embodiments of the present application provide a method, an apparatus, an enhancement implementation device, and a storage medium for reconstructing a 3D model. The method acquires 2D video data and extracts key frame data from the 2D video data; generates a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model with sample key frame data; removes noise and artifacts from the first 3D model using a multi-view fusion algorithm to obtain a second 3D model; and maps preset texture information onto the second 3D model to generate a target 3D model corresponding to the key frame data. In the embodiments of the present application, the neural network model is used for surface rendering, which greatly shortens the data processing time; through multi-view fusion and texture mapping, the 3D structure can be further refined and the accuracy of the reconstruction improved.
In a first aspect, some embodiments of the present application provide a method for reconstructing a 3D model, including:
acquiring 2D video data and extracting key frame data in the 2D video data;
generating a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model by adopting sample key frame data;
performing noise and artifact processing on the first 3D model by adopting a multi-view fusion algorithm to obtain a second 3D model;
mapping preset texture information to the second 3D model, and generating a target 3D model corresponding to the key frame data.
Some embodiments of the application utilize a neural network model to perform surface rendering, so that the data processing time is greatly shortened; through multi-view fusion and texture mapping technology, the 3D structure can be further improved, and the accuracy of reconstruction is improved.
In some embodiments, the generative model is obtained by:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model as the generation model.
Some embodiments of the present application obtain corresponding 3D structure information by inputting key frames into a neural network model, and through calculation and learning of the neural network, the neural network understands and simulates a surface rendering process by learning a large amount of key frame data, thereby generating an accurate 3D structure.
In some embodiments, the performing noise and artifact processing on the first 3D model by using a multi-view fusion algorithm to obtain a second 3D model includes:
acquiring image data of a plurality of angles corresponding to the key frame data;
registering and aligning the image data of the plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain the second 3D model.
According to some embodiments of the method, noise and artifacts generated in the reconstruction process are eliminated by utilizing image information of multiple views, the reality and quality of a model are improved by mapping real texture information onto a 3D model, and model correction and optimization are performed by the multi-view fusion technology through integrating the information of the multiple views, so that accuracy of a reconstruction result is improved.
In some embodiments, the mapping the preset texture information to the second 3D model, generating a target 3D model corresponding to the key frame data, includes:
and mapping preset texture information to the second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
Some embodiments of the present application improve the quality and fidelity of a model by mapping texture information of a real image onto the surface of the model, maintaining the authenticity and detail of the model.
In a second aspect, some embodiments of the present application provide a reconstruction apparatus of a 3D model, including:
the acquisition module is used for acquiring 2D video data and extracting key frame data in the 2D video data;
the generation module is used for generating a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model by adopting sample key frame data;
the elimination module is used for carrying out noise and artifact processing on the first 3D model by adopting a multi-view fusion algorithm to obtain a second 3D model;
and the mapping module is used for mapping the preset texture information to the second 3D model and generating a target 3D model corresponding to the key frame data.
Some embodiments of the application utilize a neural network model to perform surface rendering, so that the data processing time is greatly shortened; through multi-view fusion and texture mapping technology, the 3D structure can be further improved, and the accuracy of reconstruction is improved.
In some embodiments, the apparatus further comprises a model training module to:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model as the generation model.
Some embodiments of the present application obtain corresponding 3D structure information by inputting key frames into a neural network model, and through calculation and learning of the neural network, the neural network understands and simulates a surface rendering process by learning a large amount of key frame data, thereby generating an accurate 3D structure.
In some embodiments, the cancellation module is configured to:
acquiring image data of a plurality of angles corresponding to the key frame data;
registering and aligning the image data of the plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain the second 3D model.
According to some embodiments of the method, noise and artifacts generated in the reconstruction process are eliminated by utilizing image information of multiple views, the reality and quality of a model are improved by mapping real texture information onto a 3D model, and model correction and optimization are performed by the multi-view fusion technology through integrating the information of the multiple views, so that accuracy of a reconstruction result is improved.
In some embodiments, the mapping module is to:
and mapping preset texture information to the second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
Some embodiments of the present application improve the quality and fidelity of a model by mapping texture information of a real image onto the surface of the model, maintaining the authenticity and detail of the model.
In a third aspect, some embodiments of the present application provide an enhancement implementation device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, where the processor, when executing the program, can implement the method for reconstructing a 3D model according to any embodiment of the first aspect.
In a fourth aspect, some embodiments of the present application provide a computer readable storage medium having stored thereon a computer program, which when executed by a processor, may implement a method for reconstructing a 3D model according to any embodiment of the first aspect.
In a fifth aspect, some embodiments of the present application provide a computer program product, where the computer program product includes a computer program, where the computer program when executed by a processor may implement a method for reconstructing a 3D model according to any embodiment of the first aspect.
Drawings
In order to more clearly illustrate the technical solutions of some embodiments of the present application, the drawings that are required to be used in some embodiments of the present application will be briefly described below, it should be understood that the following drawings only illustrate some embodiments of the present application and should not be considered as limiting the scope, and other related drawings may be obtained according to these drawings without inventive effort to a person having ordinary skill in the art.
Fig. 1 is a schematic flow chart of a method for reconstructing a 3D model according to an embodiment of the present application;
fig. 2 is a schematic structural diagram of a 3D model reconstruction device according to an embodiment of the present application;
fig. 3 is a schematic diagram of an enhancement implementation device provided in an embodiment of the present application.
Detailed Description
The technical solutions in some embodiments of the present application will be described below with reference to the drawings in some embodiments of the present application.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further definition or explanation thereof is necessary in the following figures. Meanwhile, in the description of the present application, the terms "first", "second", and the like are used only to distinguish the description, and are not to be construed as indicating or implying relative importance.
In recent years, with the continuous development of network technology, many scenarios require reconstructing the 3D structure of a scene from 2D video, but the reconstruction process suffers from long data processing times and low reconstruction accuracy. In view of this, some embodiments of the present application provide a method for reconstructing a 3D model that acquires 2D video data and extracts key frame data from the 2D video data; generates a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model with sample key frame data; removes noise and artifacts from the first 3D model using a multi-view fusion algorithm to obtain a second 3D model; and maps preset texture information onto the second 3D model to generate a target 3D model corresponding to the key frame data. In the embodiments of the present application, the neural network model is used for surface rendering, which greatly shortens the data processing time; through multi-view fusion and texture mapping, the 3D structure can be further refined and the accuracy of the reconstruction improved.
As shown in fig. 1, an embodiment of the present application provides a method for reconstructing a 3D model, where the method includes:
s101, acquiring 2D video data and extracting key frame data in the 2D video data;
specifically, the terminal device acquires 2D video data, which may be an augmented reality device, for example, AR (Augmented Reality ) glasses; the AR glasses comprise a display screen, a high-definition camera and a communication module, and can realize interaction experience of virtual reality and augmented reality.
The terminal device acquires 2D video data, parses the 2D video data, and extracts key frame data, such as I-frame images, from the 2D video data.
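As a purely illustrative sketch (not the claimed extraction procedure), key frame selection from 2D video data could be approximated in Python with OpenCV by keeping frames that differ strongly from the previously kept frame; the difference measure, the threshold value and the video path below are hypothetical.

```python
import cv2
import numpy as np

def extract_key_frames(video_path, diff_threshold=30.0):
    """Return frames that differ strongly from the last kept key frame.

    A simple stand-in for I-frame extraction: the mean absolute grayscale
    difference is used as the change measure. `diff_threshold` is a
    hypothetical tuning parameter.
    """
    cap = cv2.VideoCapture(video_path)
    key_frames, last_gray = [], None
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        if last_gray is None or np.mean(cv2.absdiff(gray, last_gray)) > diff_threshold:
            key_frames.append(frame)
            last_gray = gray
    cap.release()
    return key_frames
```

In practice the I-frames flagged by the video codec could be read directly via a demuxer instead of this content-difference heuristic.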
S102, generating a first 3D model corresponding to key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model by adopting sample key frame data;
specifically, the terminal equipment acquires 2D sample video data in advance, acquires sample key frame data in the 2D sample video data, trains a neural network model according to the sample key frame data to obtain a generation model, and the generation model is used for rendering the key frame data in the 2D video data to generate a 3D model.
In a specific implementation process, after obtaining key frame data, a terminal device adopts a pre-trained generation model to render the key frame data, so as to obtain a first 3D model corresponding to the key frame data.
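The following is a minimal sketch of this rendering step, assuming a PyTorch model that maps a key frame image to a coarse occupancy volume; the network class KeyFrameTo3DNet, its 32x32x32 output, and the checkpoint name are assumptions for illustration only and are not the architecture described in this application.

```python
import torch
import torch.nn as nn

class KeyFrameTo3DNet(nn.Module):
    """Hypothetical generation model: encodes a key frame image and decodes a
    coarse 32x32x32 occupancy volume standing in for the first 3D model."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.decoder = nn.Linear(64, 32 * 32 * 32)

    def forward(self, key_frame):                      # key_frame: (B, 3, H, W) in [0, 1]
        occupancy = torch.sigmoid(self.decoder(self.encoder(key_frame)))
        return occupancy.view(-1, 32, 32, 32)          # coarse first 3D model

model = KeyFrameTo3DNet()
# model.load_state_dict(torch.load("generation_model.pt"))  # hypothetical checkpoint name
model.eval()
with torch.no_grad():
    first_3d_model = model(torch.rand(1, 3, 256, 256))      # stand-in key frame tensor
```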
S103, removing noise and artifacts from the first 3D model using a multi-view fusion algorithm to obtain a second 3D model;
Specifically, the multi-view fusion algorithm fuses target information from multi-source data: it extracts effective information from sources such as text, images and video and fuses this information to obtain more accurate and reliable results. The terminal device applies the multi-view fusion algorithm to remove noise and artifacts from the first 3D model, obtaining the second 3D model.
S104, mapping the preset texture information to the second 3D model, and generating a target 3D model corresponding to the key frame data.
Some embodiments of the application utilize a neural network model to perform surface rendering, so that the data processing time is greatly shortened; through multi-view fusion and texture mapping technology, the 3D structure can be further improved, and the accuracy of reconstruction is improved.
The embodiment of the present application provides a complex-scene 3D structure reconstruction method and system based on neural surface rendering. First, 2D video data is acquired and key frames are extracted from the video. A pre-trained neural network then performs surface rendering on each key frame to generate a preliminary 3D structure, i.e., the first 3D model. This preliminary structure is further refined through multi-view fusion and texture mapping, which remove noise and artifacts from the model to obtain the second 3D model and improve the reconstruction accuracy. Finally, a high-quality complex-scene 3D model corresponding to the key frame data is generated.
The method for reconstructing the 3D model provided by the embodiment is further described in another embodiment of the present application.
In some embodiments, the generative model is obtained by:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model to be a generating model.
Specifically, the terminal device obtains sample key frame data from the 2D sample video data and inputs it into the neural network model; through the computation and learning of the neural network, a trained network model that outputs the corresponding 3D structure information is obtained. By learning from a large amount of sample data, the neural network understands and simulates the surface rendering process, thereby generating an accurate 3D structure.
Different neural network structures and parameters can be selected to obtain a better surface rendering effect. In addition, other technologies, such as deep learning and computer vision, can be combined to further improve the accuracy and efficiency of the reconstruction.
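A hedged sketch of the training procedure described above (train on sample key frames, stop once the loss falls below a preset value) might look as follows; the loss function, optimizer, data shapes and the reuse of the hypothetical KeyFrameTo3DNet class from the earlier sketch are all assumptions, not details given in this application.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical training data: sample key frames and target 3D occupancy volumes.
sample_frames = torch.rand(16, 3, 256, 256)
sample_volumes = (torch.rand(16, 32, 32, 32) > 0.5).float()
loader = DataLoader(TensorDataset(sample_frames, sample_volumes), batch_size=4, shuffle=True)

model = KeyFrameTo3DNet()                       # hypothetical network from the S102 sketch above
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
criterion = nn.BCELoss()                        # assumed loss; the application does not name one
loss_preset_value = 0.05                        # the "preset value" the loss must fall below

for epoch in range(200):                        # epoch cap added so the sketch always terminates
    epoch_loss = 0.0
    for frames, volumes in loader:
        optimizer.zero_grad()
        loss = criterion(model(frames), volumes)
        loss.backward()
        optimizer.step()
        epoch_loss += loss.item() / len(loader)
    if epoch_loss < loss_preset_value:          # training criterion from the text
        break
# The trained network model is then taken as the generation model.
```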
In the embodiment of the application, the neural network is utilized to conduct surface rendering on the key frames, and a preliminary 3D structure is generated; 3D structure is perfected through multi-view fusion and texture mapping technology, and a high-quality 3D model of the complex scene is generated.
Some embodiments of the present application obtain corresponding 3D structure information by inputting key frames into a neural network model, and through calculation and learning of the neural network, the neural network understands and simulates a surface rendering process by learning a large amount of key frame data, thereby generating an accurate 3D structure.
In some embodiments, a multi-view fusion algorithm is used to perform noise and artifact processing on the first 3D model to obtain a second 3D model, including:
acquiring image data of a plurality of angles corresponding to key frame data;
registering and aligning the image data of a plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain a second 3D model.
Specifically, the terminal device refines the 3D structure through multi-view fusion and texture mapping to generate a high-quality 3D model of the complex scene: image information from multiple views is used to eliminate noise and artifacts produced during reconstruction, and real texture information is mapped onto the 3D model to improve its realism and quality.
The multi-view fusion technique improves the accuracy of the reconstruction result by integrating information from multiple views: the images of the multiple views are registered and aligned, their information is integrated, and pixel-level fusion and optimization are performed to remove the noise and artifacts produced during reconstruction, yielding a clearer and more accurate 3D structure.
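As an illustrative sketch of such fusion at the point level, the registered views could vote on which parts of the first 3D model are consistent across angles; the rigid calibration transforms, agreement radius and minimum view support below are hypothetical parameters, not values from this application.

```python
import numpy as np

def fuse_views(view_points, calibrations, radius=0.01, min_support=2):
    """Fuse per-view point sets of the first 3D model into a cleaner second model.

    view_points  : list of (N_i, 3) arrays, one per camera angle
    calibrations : list of (R, t) pairs registering each view into a common frame
    A point is kept only if at least `min_support` views place a point within
    `radius` of it, which suppresses isolated noise and artifacts.
    """
    aligned = [pts @ R.T + t for pts, (R, t) in zip(view_points, calibrations)]
    merged = np.concatenate(aligned, axis=0)
    labels = np.concatenate([np.full(len(a), i) for i, a in enumerate(aligned)])

    kept = []
    for p in merged:
        d = np.linalg.norm(merged - p, axis=1)
        support_views = np.unique(labels[d < radius])
        if len(support_views) >= min_support:
            kept.append(p)
    return np.asarray(kept)   # the second 3D model, as a fused point set

# Example with two hypothetical views and near-identity calibrations.
R0, t0 = np.eye(3), np.zeros(3)
R1, t1 = np.eye(3), np.array([0.001, 0.0, 0.0])
second_model = fuse_views(
    [np.random.rand(100, 3), np.random.rand(100, 3)],
    [(R0, t0), (R1, t1)],
)
```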
According to some embodiments of the method, noise and artifacts generated in the reconstruction process are eliminated by utilizing image information of multiple views, the reality and quality of a model are improved by mapping real texture information onto a 3D model, and model correction and optimization are performed by the multi-view fusion technology through integrating the information of the multiple views, so that accuracy of a reconstruction result is improved.
In some embodiments, mapping the preset texture information to the second 3D model, generating a target 3D model corresponding to the key frame data, includes:
and mapping the preset texture information to a second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
In the embodiment of the application, after the terminal equipment acquires the second 3D model, the texture coordinates and the texture mapping algorithm are adopted to map the real texture information, namely the preset texture information, onto the second 3D model, so that the authenticity and detail of the model are maintained, and the quality and fidelity of the model are improved.
Texture mapping is a method of converting object-space coordinate points into texture coordinates and then fetching the values of the corresponding points from a texture to enhance shading detail.
The specific algorithm comprises the following steps:
1) Projection mapping: there are two main ways to convert three-dimensional space coordinate points into two-dimensional texture coordinate points, namely projectors and UV mapping;
2) Coordinate transformation: the projection mapping above maps the 3-dimensional spatial coordinates to 2-dimensional parametric space coordinates (uv); at this stage three things can be done: coordinate range processing, free coordinate transformation, and conversion to texture space;
3) Texture sampling, including nearest-neighbor, bilinear, and cubic-convolution interpolation;
4) Texture value conversion: the value obtained by texture sampling is not necessarily used directly as a color, for example it serves as a normal vector in normal mapping and as a height offset in bump mapping, so the sampled value needs to be converted accordingly.
In practice, the three-dimensional object coordinates are first converted into two-dimensional parameter-space uv coordinates; in real-time rendering the uv coordinates are usually stored in the vertex information. After processing and transformation, the uv coordinates are converted into texture-space coordinates according to the actual texture size, at which point they may be fractional. The texture is then sampled according to the texture-space coordinates, handling both magnification and minification; the minification case is more complex and may involve anisotropic filtering. Finally, the sampled texture value cannot always be used directly and is applied only after the corresponding conversion, as sketched below.
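The uv-to-texture-space conversion and bilinear sampling steps described above can be sketched in a few lines of NumPy; the clamping wrap mode and the example texture and uv arrays are illustrative assumptions.

```python
import numpy as np

def sample_texture_bilinear(texture, uv):
    """Bilinearly sample a texture at fractional uv coordinates.

    texture : (H, W, 3) array of texel colors
    uv      : (N, 2) array of coordinates in [0, 1] (stored per vertex in practice)
    Steps mirror the text: uv -> texture-space coordinates (possibly fractional),
    then interpolation between the four surrounding texels.
    """
    h, w = texture.shape[:2]
    u = np.clip(uv[:, 0], 0.0, 1.0) * (w - 1)      # texture-space x, fractional
    v = np.clip(uv[:, 1], 0.0, 1.0) * (h - 1)      # texture-space y, fractional
    x0, y0 = np.floor(u).astype(int), np.floor(v).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    fx, fy = (u - x0)[:, None], (v - y0)[:, None]

    top = texture[y0, x0] * (1 - fx) + texture[y0, x1] * fx
    bottom = texture[y1, x0] * (1 - fx) + texture[y1, x1] * fx
    return top * (1 - fy) + bottom * fy             # sampled texture values

# Map a hypothetical 4x4 texture onto three vertices of the second 3D model.
texture = np.random.rand(4, 4, 3)
vertex_uv = np.array([[0.1, 0.2], [0.5, 0.5], [0.9, 0.8]])
vertex_colors = sample_texture_bilinear(texture, vertex_uv)
```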
Some embodiments of the present application improve the quality and fidelity of a model by mapping texture information of a real image onto the surface of the model, maintaining the authenticity and detail of the model.
According to the embodiment of the application, the neural network is utilized for surface rendering, so that the data processing time is greatly shortened, and the reconstruction speed is improved; the 3D structure can be further improved through multi-view fusion and texture mapping technology, and the accuracy of reconstruction is improved; the application range is wide, and the method can be applied to 3D structure reconstruction of various complex scenes, such as buildings, landscapes, objects and the like.
It should be noted that the various optional implementations in this embodiment may be implemented individually or in any combination without conflict, and the present application is not limited in this regard.
Another embodiment of the present application provides a 3D model reconstruction device, configured to execute the 3D model reconstruction method provided in the foregoing embodiment.
Fig. 2 is a schematic structural diagram of a 3D model reconstruction device according to an embodiment of the present application. The 3D model reconstruction device comprises an acquisition module 201, a generation module 202, an elimination module 203 and a mapping module 204, wherein:
the acquisition module 201 is configured to acquire 2D video data and extract key frame data in the 2D video data;
the generating module 202 is configured to generate a first 3D model corresponding to key frame data according to a pre-trained generating model and key frame data, where the pre-trained generating model is obtained by training a neural network model with sample key frame data;
the elimination module 203 is configured to perform noise and artifact processing on the first 3D model by using a multi-view fusion algorithm to obtain a second 3D model;
the mapping module 204 is configured to map the preset texture information to the second 3D model, and generate a target 3D model corresponding to the key frame data.
Some embodiments of the application utilize a neural network model to perform surface rendering, so that the data processing time is greatly shortened; through multi-view fusion and texture mapping technology, the 3D structure can be further improved, and the accuracy of reconstruction is improved.
The specific manner in which the individual modules perform the operations of the apparatus of this embodiment has been described in detail in connection with embodiments of the method and will not be described in detail herein.
In another embodiment of the present application, the 3D model reconstruction device provided in the foregoing embodiment is further described in additional detail.
In some embodiments, the apparatus further comprises a model training module for:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model to be a generating model.
Some embodiments of the present application obtain corresponding 3D structure information by inputting key frames into a neural network model, and through calculation and learning of the neural network, the neural network understands and simulates a surface rendering process by learning a large amount of key frame data, thereby generating an accurate 3D structure.
In some embodiments, the cancellation module is to:
acquiring image data of a plurality of angles corresponding to key frame data;
registering and aligning the image data of a plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain a second 3D model.
According to some embodiments of the method, noise and artifacts generated in the reconstruction process are eliminated by utilizing image information of multiple views, the reality and quality of a model are improved by mapping real texture information onto a 3D model, and model correction and optimization are performed by the multi-view fusion technology through integrating the information of the multiple views, so that accuracy of a reconstruction result is improved.
In some embodiments, the mapping module is to:
and mapping the preset texture information to a second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
Some embodiments of the present application improve the quality and fidelity of a model by mapping texture information of a real image onto the surface of the model, maintaining the authenticity and detail of the model.
The specific manner in which the individual modules perform the operations of the apparatus of this embodiment has been described in detail in connection with embodiments of the method and will not be described in detail herein.
It should be noted that the various optional implementations in this embodiment may be implemented individually or in any combination without conflict, and the present application is not limited in this regard.
The embodiment of the application also provides a computer readable storage medium, on which a computer program is stored, which when executed by a processor, can implement the operations of the method corresponding to any embodiment in the 3D model reconstruction methods provided in the above embodiments.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program can realize the operation of the method corresponding to any embodiment in the 3D model reconstruction method provided by the embodiment when being executed by a processor.
As shown in fig. 3, some embodiments of the present application provide an enhancement implementation device 300, the enhancement implementation device 300 comprising: memory 310, processor 320, and a computer program stored on memory 310 and executable on processor 320, wherein processor 320, when reading the program from memory 310 and executing the program via bus 330, may implement the method of any of the embodiments as included in the 3D model reconstruction method described above.
Processor 320 may process digital signals and may include various computing architectures, such as a complex instruction set computer (CISC) architecture, a reduced instruction set computer (RISC) architecture, or an architecture implementing a combination of instruction sets. In some examples, processor 320 may be a microprocessor.
Memory 310 may be used for storing instructions to be executed by processor 320 or data related to execution of the instructions. Such instructions and/or data may include code to implement some or all of the functions of one or more modules described in embodiments of the present application. The processor 320 of the disclosed embodiments may be configured to execute instructions in the memory 310 to implement the methods shown above. Memory 310 includes dynamic random access memory, static random access memory, flash memory, optical memory, or other memory known to those skilled in the art.
The above is only an example of the present application and is not intended to limit the scope of the present application; various modifications and variations will be apparent to those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present application should be included in the protection scope of the present application.
The foregoing is merely specific embodiments of the present application, but the scope of the present application is not limited thereto, and any person skilled in the art can easily think about changes or substitutions within the technical scope of the present application, and the changes or substitutions are intended to be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method for reconstructing a 3D model, the method comprising:
acquiring 2D video data and extracting key frame data in the 2D video data;
generating a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model by adopting sample key frame data;
performing noise and artifact processing on the first 3D model by adopting a multi-view fusion algorithm to obtain a second 3D model;
mapping preset texture information to the second 3D model, and generating a target 3D model corresponding to the key frame data.
2. The method for reconstructing a 3D model according to claim 1, wherein the generated model is obtained by:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model as the generation model.
3. The method for reconstructing a 3D model according to claim 1, wherein the performing noise and artifact processing on the first 3D model by using a multi-view fusion algorithm to obtain a second 3D model comprises:
acquiring image data of a plurality of angles corresponding to the key frame data;
registering and aligning the image data of the plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain the second 3D model.
4. The method for reconstructing a 3D model according to claim 1, wherein mapping the preset texture information to the second 3D model, generating a target 3D model corresponding to the key frame data, comprises:
and mapping preset texture information to the second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
5. A reconstruction apparatus for a 3D model, the apparatus comprising:
the acquisition module is used for acquiring 2D video data and extracting key frame data in the 2D video data;
the generation module is used for generating a first 3D model corresponding to the key frame data according to a pre-trained generation model and the key frame data, wherein the pre-trained generation model is obtained by training a neural network model by adopting sample key frame data;
the elimination module is used for carrying out noise and artifact processing on the first 3D model by adopting a multi-view fusion algorithm to obtain a second 3D model;
and the mapping module is used for mapping the preset texture information to the second 3D model and generating a target 3D model corresponding to the key frame data.
6. The apparatus for reconstructing a 3D model according to claim 5, further comprising a model training module for:
acquiring 2D sample video data;
analyzing the 2D sample video data to obtain sample key frame data corresponding to the 2D sample video data;
training the neural network model according to the sample key frame data to obtain a trained network model;
and if the loss function of the trained network model is smaller than a preset value, determining the trained network model as the generation model.
7. The apparatus for reconstructing a 3D model according to claim 5, wherein the cancellation module is configured to:
acquiring image data of a plurality of angles corresponding to the key frame data;
registering and aligning the image data of the plurality of angles to obtain calibration information;
and according to the calibration information, carrying out fusion processing on the first 3D model, and removing noise and artifacts in the first 3D model to obtain the second 3D model.
8. The apparatus for reconstructing a 3D model according to claim 5, wherein the mapping module is configured to:
and mapping preset texture information to the second 3D model by adopting a texture coordinate and texture mapping algorithm, and generating a target 3D model corresponding to the key frame data.
9. An enhancement enabling device, comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor is capable of implementing a method of reconstructing a 3D model according to any one of claims 1-4 when executing the program.
10. A computer readable storage medium, wherein a computer program is stored on the computer readable storage medium, and wherein the program, when executed by a processor, implements a method for reconstructing a 3D model according to any one of claims 1-4.
CN202311495527.1A 2023-11-10 2023-11-10 Reconstruction method and device of 3D model, enhancement realization device and storage medium Pending CN117557722A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311495527.1A CN117557722A (en) 2023-11-10 2023-11-10 Reconstruction method and device of 3D model, enhancement realization device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311495527.1A CN117557722A (en) 2023-11-10 2023-11-10 Reconstruction method and device of 3D model, enhancement realization device and storage medium

Publications (1)

Publication Number Publication Date
CN117557722A true CN117557722A (en) 2024-02-13

Family

ID=89813917

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311495527.1A Pending CN117557722A (en) 2023-11-10 2023-11-10 Reconstruction method and device of 3D model, enhancement realization device and storage medium

Country Status (1)

Country Link
CN (1) CN117557722A (en)

Similar Documents

Publication Publication Date Title
KR102295403B1 (en) Depth estimation method and apparatus, electronic device, program and medium
CN106651938B (en) A kind of depth map Enhancement Method merging high-resolution colour picture
CN107484428B (en) Method for displaying objects
CN110490896B (en) Video frame image processing method and device
US9202258B2 (en) Video retargeting using content-dependent scaling vectors
CN109462747B (en) DIBR system cavity filling method based on generation countermeasure network
CN111080776B (en) Human body action three-dimensional data acquisition and reproduction processing method and system
CN107689050B (en) Depth image up-sampling method based on color image edge guide
CN109993824B (en) Image processing method, intelligent terminal and device with storage function
CN112652046B (en) Game picture generation method, device, equipment and storage medium
CN114049420B (en) Model training method, image rendering method, device and electronic equipment
CN113763231B (en) Model generation method, image perspective determination method, device, equipment and medium
CN114863037A (en) Single-mobile-phone-based human body three-dimensional modeling data acquisition and reconstruction method and system
Li et al. Effective data-driven technology for efficient vision-based outdoor industrial systems
CN113989434A (en) Human body three-dimensional reconstruction method and device
CN115908753B (en) Method and related device for reconstructing whole-body human body grid surface
Zhou et al. Single-view view synthesis with self-rectified pseudo-stereo
KR100805802B1 (en) Apparatus and method for camera auto-calibration in motion blurred sequence, Augmented reality system using it
CN117557722A (en) Reconstruction method and device of 3D model, enhancement realization device and storage medium
CN111010559B (en) Method and device for generating naked eye three-dimensional light field content
Yan et al. Stereoscopic image generation from light field with disparity scaling and super-resolution
CN116883770A (en) Training method and device of depth estimation model, electronic equipment and storage medium
US8743180B2 (en) Systems and methods for generating a depth map and converting two-dimensional data to stereoscopic data
KR102648938B1 (en) Method and apparatus for 3D image reconstruction based on few-shot neural radiance fields using geometric consistency
CN114219900B (en) Three-dimensional scene reconstruction method, reconstruction system and application based on mixed reality glasses

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination