CN116051746A - Improved method for three-dimensional reconstruction and neural rendering network

Improved method for three-dimensional reconstruction and neural rendering network

Info

Publication number
CN116051746A
CN116051746A (application CN202310038592.5A)
Authority
CN
China
Prior art keywords: network, view, dimensional, NeRF, input
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202310038592.5A
Other languages
Chinese (zh)
Inventor
党凤月
任振宁
李会朋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong Huichuan IoT Technology Co., Ltd.
Original Assignee
Shandong Huichuan IoT Technology Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong Huichuan Iot Technology Co ltd filed Critical Shandong Huichuan Iot Technology Co ltd
Priority to CN202310038592.5A priority Critical patent/CN116051746A/en
Publication of CN116051746A publication Critical patent/CN116051746A/en
Withdrawn legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00: Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T15/00: 3D [Three Dimensional] image rendering
    • G06T15/10: Geometric effects
    • G06T15/20: Perspective computation
    • G06T15/205: Image-based rendering
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757: Matching configurations of points or features
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00: Indexing scheme for image analysis or image enhancement
    • G06T2207/20: Special algorithmic details
    • G06T2207/20084: Artificial neural networks [ANN]
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an improved method for a three-dimensional reconstruction and neural rendering network, comprising two-dimensional view feature extraction, feature matching, model decomposition, a single-view neural rendering network, a multi-view neural rendering network, and a commodity three-dimensional modeling workflow. The beneficial effects of the invention are as follows: by combining the advantages of FastNeRF and PixelNeRF, deep optimization is performed on top of NeRF and a complete network structure is provided, so that the model has high inference speed, requires few input views, and generalizes well. Addressing the low computational efficiency of NeRF, FastNeRF can render high-fidelity photorealistic images at 200 Hz on a consumer-grade GPU, roughly 3000 times faster than the original NeRF algorithm; in addition, PixelNeRF is designed to synthesize new views well when only a few views are known, which greatly improves the generalization of the model.

Description

Improved method for three-dimensional reconstruction and neural rendering network
Technical Field
The invention relates to a neural rendering method, in particular to an improved method for a three-dimensional reconstruction and neural rendering network, and belongs to the technical field of neural rendering for software-based three-dimensional modeling.
Background
In today's intelligent era, where digital technology and media converge and advanced technologies change by the day, people's demands keep growing. In recent years, three-dimensional models have been widely applied across industries, such as surveying and mapping, geographic information systems, teaching demonstrations, urban planning, architectural construction, game production, smart cities, smart scenic areas, and the digital archiving and protection of ancient cultural relics. Three-dimensional modeling technology makes life more convenient and provides a better experience, and three-dimensional reconstruction has very broad applications. In image entertainment, an object can be reconstructed to obtain a three-dimensional model that can be 3D printed or driven by a human body to build entertaining applications. For virtual fitting, once the human body is reconstructed, clothes of different sizes can be automatically adapted to different body shapes and heights. In smart-home scenarios, many shopping apps let users place virtual furniture to check whether it matches their home and whether it fits the actual space. For cultural-relic reconstruction and AR tourism, many museums and tourist attractions already offer similar products, such as AR Xihu (West Lake). In autonomous driving, high-precision maps can be constructed. For large scenes, three-dimensional reconstruction enables virtual roaming. Yet in most shopping apps, a consumer can only see a two-dimensional commodity image and cannot see the real effect of the commodity.
The neural radiance field (NeRF), proposed in 2020, has become the most popular algorithm in the field of three-dimensional modeling. NeRF-based neural rendering is an emerging three-dimensional modeling and rendering technique that uses a neural network to implicitly represent the shape, texture, and material of an object and is trained directly end to end, yielding highly faithful rendering results from arbitrary viewing angles. However, NeRF is still a relatively early technology: its training speed, inference speed, modeling robustness, and the lack of an explicit three-dimensional representation have all seriously limited its application, and many researchers around the world have conducted in-depth research and exploration on this basis.
Disclosure of Invention
The invention aims to solve at least one of the above technical problems, and provides an improved method for a three-dimensional reconstruction and neural rendering network, which improves the model's inference speed and generalization; the model supports a small number of inputs, i.e., a three-dimensional commodity can be reconstructed from only 3-4 two-dimensional commodity images.
The invention realizes the above purpose through the following technical scheme: an improved method of three-dimensional reconstruction and neural rendering network, comprising the steps of:
step one, two-dimensional view feature extraction: the NeRF method requires a large number of pictures from known viewing angles as input and takes a long time to train; PixelNeRF allows the network to be trained across multiple scenes to learn scene priors, i.e., to acquire prior knowledge of scenes, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (one or a few); PixelNeRF adds a fully convolutional image encoder in front of the NeRF network to encode the input image, pixel by pixel, into a pixel-aligned feature grid, where NeRF (neural radiance field) is a method for generating novel views of complex scenes and essentially constructs an implicit rendering pipeline;
step two, feature matching: after obtaining the features corresponding to points in the input view, in order to know where a point given in world coordinates projects onto the input view, the point's coordinates are converted into camera space, and the corresponding features are then sampled from the feature map according to the normalized coordinates;
step three, model decomposition: NeRF is essentially a function that maps a three-dimensional position p and a two-dimensional representation of the viewing direction d to a three-dimensional color c and a scalar density σ, where the density depends only on the position while the color depends on both position and direction; the basic idea of FastNeRF is to trade some storage for caching so as to improve computational efficiency and reduce the time required for rendering; NeRF is decomposed into two neural networks: a position-dependent network that generates a deep radiance map and a direction-dependent network that generates weights, and the inner product of the weights and the deep radiance map estimates the color seen at the specified position from the specified direction in the scene; the FastNeRF architecture can be cached efficiently, significantly improving test-time efficiency while maintaining NeRF's visual quality;
step four, single-view neural rendering network: the model combines the two-dimensional input-view features with the decomposition network and consists of two parts, the first part being the pixel-aligned feature network obtained by encoding the input image pixel by pixel in steps one and two, and the second part being the neural network decomposed in step three into two cache-friendly networks, which caches spatial coordinate information together with the corresponding encoded features and outputs color and density values; neural rendering here means achieving explicit or implicit control of scene properties (illumination, camera parameters, pose, geometry, appearance, and semantic structure) through deep image or video generation methods;
step five, multi-view neural rendering network: for multiple input views, the coordinates and corresponding features are processed independently in each view's coordinate frame according to the single-view rendering method; the per-view rendering results are then aggregated by averaging, the aggregated result is passed to an MLP to obtain the predicted density and color, and finally the mean squared error between the computed result and the ground truth is minimized;
step six, commodity three-dimensional modeling workflow: the improved network structure is applied to commodity three-dimensional modeling; data is submitted first, with 3-4 two-dimensional commodity images as input, one picture for each angle of the commodity, and the picture background should be as simple and clean as possible, preferably predominantly white; the pictures are then preprocessed; the processed pictures are fed into the model for three-dimensional modeling and neural rendering; finally, the generated three-dimensional commodity is uploaded to the app for consumers to view.
As a still further aspect of the invention: in step one, PixelNeRF uses a pretrained ResNet34 to extract picture features; the initial 7x7 large-kernel convolutional layer is replaced with three 3x3 small convolution kernels, and the corresponding final average-pooling kernel is changed to 4x4, which not only reduces parameters but also deepens the network to increase its capacity and complexity; the improved network mainly consists of 16 basic units, 3x3 convolutional layers, and 1 fully connected layer, 36 layers in total, and the size of the final feature map is given by the formula shown as an image in the original publication (Figure BDA0004050398500000041).
Here ResNet34 refers to a residual network with 34 convolutional layers, used as the backbone for extracting feature maps.
As a still further aspect of the invention: step two specifically includes:
knowing the camera position p of the input view and the rotation matrix R, the point's coordinates in camera space are s_c = R^(-1)(s_i - p);
since the rotation matrix is orthogonal, this can be written as s_c = R^T(s_i - p) = (x_c, y_c, z_c);
the point is projected onto the camera plane, giving image-plane coordinates (projection formula shown as an image in the original publication, Figure BDA0004050398500000042);
after normalization, the coordinates s_uv are obtained (normalization formula shown as an image in the original publication, Figure BDA0004050398500000043);
finally, the corresponding features are sampled from the feature map according to s_uv.
As a still further aspect of the invention: step three specifically includes:
the same task as NeRF is divided into two cache-friendly neural networks;
the position-dependent network F_pos outputs a deep radiance map (u, v, w) containing D components;
the direction-dependent network F_dir takes the ray direction as input and outputs the component weights (β_1, ..., β_D);
this split architecture allows the position-dependent and ray-direction-dependent outputs to be cached independently, which greatly improves performance when cached.
As a still further aspect of the invention: step four specifically includes:
for a single input image I, the coordinate system is first fixed to the view space of the input image, and positions and camera rays are specified in this coordinate system;
the feature volume W^(i) of the input image is extracted with the improved ResNet34 network;
for a point x^(i) on a camera ray, the known camera intrinsics are used to project x^(i) onto the image coordinates π(x^(i)), and the corresponding image feature vector W^(i)(π(x^(i))) is then extracted by interpolating between pixel features;
finally, the image features, together with the position (x, y, z) and the viewing direction (θ, φ), are passed to the decomposition network, which outputs the color and density (RGBσ).
As a still further aspect of the invention: in step six, the submitted data includes 2D pictures and model information, and preprocessing of the pictures includes image segmentation and format conversion.
The beneficial effects of the invention are as follows: input-view features are acquired; the same task is divided into two cache-friendly neural networks, a position-dependent network that generates a deep radiance map and a direction-dependent network that generates weights; the input-view features are combined with the decomposition network to obtain single-view color and density outputs; the single-view results are aggregated by averaging and passed to an MLP to obtain the predicted density and color; by combining the advantages of FastNeRF and PixelNeRF, deep optimization is performed on top of NeRF and a complete network structure is provided, so that the model has high inference speed, requires few input views, and generalizes well; addressing the low computational efficiency of NeRF, FastNeRF can render high-fidelity photorealistic images at 200 Hz on a consumer-grade GPU, roughly 3000 times faster than the original NeRF algorithm; in addition, PixelNeRF is designed to synthesize new views well when only a few views are known, which greatly improves the generalization of the model.
Drawings
FIG. 1 is a schematic flow chart of the present invention;
FIG. 2 is a schematic diagram of an improved network architecture according to the present invention;
FIG. 3 is a diagram of a network architecture for model decomposition of the present invention;
FIG. 4 is a block diagram of a single view neural rendering network of the present invention;
FIG. 5 is a block diagram of a multi-view neural rendering network of the present invention;
FIG. 6 is a flow chart of a three-dimensional modeling scheme for a commodity according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Example 1
As shown in FIGS. 1 to 6, an improved method for a three-dimensional reconstruction and neural rendering network comprises the following steps:
step one, two-dimensional view feature extraction: the NeRF method requires a large number of pictures from known viewing angles as input and takes a long time to train; PixelNeRF allows the network to be trained across multiple scenes to learn scene priors, i.e., to acquire prior knowledge of scenes, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views (one or a few); PixelNeRF adds a fully convolutional image encoder in front of the NeRF network to encode the input image, pixel by pixel, into a pixel-aligned feature grid, where NeRF (neural radiance field) is a method for generating novel views of complex scenes and essentially constructs an implicit rendering pipeline;
step two, feature matching: after obtaining the features corresponding to points in the input view, in order to know where a point given in world coordinates projects onto the input view, the point's coordinates are converted into camera space, and the corresponding features are then sampled from the feature map according to the normalized coordinates;
step three, model decomposition: NeRF is essentially a function that maps a three-dimensional position p and a two-dimensional representation of the viewing direction d to a three-dimensional color c and a scalar density σ, where the density depends only on the position while the color depends on both position and direction; the basic idea of FastNeRF is to trade some storage for caching so as to improve computational efficiency and reduce the time required for rendering; NeRF is decomposed into two neural networks: a position-dependent network that generates a deep radiance map and a direction-dependent network that generates weights, and the inner product of the weights and the deep radiance map estimates the color seen at the specified position from the specified direction in the scene; the FastNeRF architecture can be cached efficiently, significantly improving test-time efficiency while maintaining NeRF's visual quality;
step four, single-view neural rendering network: the model combines the two-dimensional input-view features with the decomposition network and consists of two parts, the first part being the pixel-aligned feature network obtained by encoding the input image pixel by pixel in steps one and two, and the second part being the neural network decomposed in step three into two cache-friendly networks, which caches spatial coordinate information together with the corresponding encoded features and outputs color and density values; neural rendering here means achieving explicit or implicit control of scene properties (illumination, camera parameters, pose, geometry, appearance, and semantic structure) through deep image or video generation methods;
step five, multi-view neural rendering network: for multiple input views, the coordinates and corresponding features are processed independently in each view's coordinate frame according to the single-view rendering method; the per-view rendering results are then aggregated by averaging, the aggregated result is passed to an MLP to obtain the predicted density and color, and finally the mean squared error between the computed result and the ground truth is minimized;
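For illustration only, the following Python (PyTorch-style) sketch shows one way the step-five aggregation could be organized: per-view intermediate outputs are averaged and passed to an MLP that predicts density and color, trained with a mean-squared-error loss. The module names, tensor shapes, and layer widths are assumptions of this sketch and are not specified by the present disclosure.

import torch
import torch.nn as nn

class MultiViewAggregator(nn.Module):
    # Step-five sketch: process each view independently, average, then predict (sigma, rgb).
    def __init__(self, feat_dim: int = 512, hidden: int = 256):
        super().__init__()
        # per-view branch: camera-space point (3) + pixel-aligned image feature (feat_dim)
        self.per_view = nn.Sequential(
            nn.Linear(3 + feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        # final MLP applied to the view-averaged representation
        self.head = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),   # (sigma, r, g, b)
        )

    def forward(self, coords_per_view, feats_per_view):
        # inputs: lists with one (N, 3) / (N, feat_dim) tensor per input view,
        # each expressed in that view's own camera coordinate frame
        hs = [self.per_view(torch.cat([c, f], dim=-1))
              for c, f in zip(coords_per_view, feats_per_view)]
        pooled = torch.stack(hs, dim=0).mean(dim=0)   # aggregate by averaging over views
        out = self.head(pooled)
        sigma = torch.relu(out[..., :1])
        rgb = torch.sigmoid(out[..., 1:])
        return sigma, rgb

def photometric_loss(pred_rgb, gt_rgb):
    # training objective: mean squared error against the ground-truth pixel colors
    return torch.mean((pred_rgb - gt_rgb) ** 2)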
step six, commodity three-dimensional modeling workflow: the improved network structure is applied to commodity three-dimensional modeling; data is submitted first, with 3-4 two-dimensional commodity images as input, one picture for each angle of the commodity, and the picture background should be as simple and clean as possible, preferably predominantly white; the pictures are then preprocessed; the processed pictures are fed into the model for three-dimensional modeling and neural rendering; finally, the generated three-dimensional commodity is uploaded to the app for consumers to view.
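Purely as an illustration of the step-six workflow, the sketch below wires the stages together; the helper names, the target image size, and the upload hook are assumptions made for this sketch rather than interfaces defined by the disclosure.

from pathlib import Path
from PIL import Image

def preprocess(path: Path, size=(512, 512)) -> Image.Image:
    # format conversion; a production pipeline would also segment the foreground
    # so that the commodity sits on a clean, preferably white, background
    img = Image.open(path).convert("RGB")
    return img.resize(size)

def build_commodity_asset(image_paths, reconstruct_fn, upload_fn):
    # image_paths: 3-4 views of the commodity, one per angle
    assert 3 <= len(image_paths) <= 4, "the scheme expects 3-4 commodity views"
    views = [preprocess(Path(p)) for p in image_paths]
    asset = reconstruct_fn(views)   # three-dimensional modeling + neural rendering model
    upload_fn(asset)                # publish the generated 3D commodity to the app
    return asset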
Example 2
In addition to all the technical features in the first embodiment, the present embodiment further includes:
in the first step, PixelNeRF uses a pretrained ResNet34 to extract picture features; the initial 7x7 large-kernel convolutional layer is replaced with three 3x3 small convolution kernels, and the corresponding final average-pooling kernel is changed to 4x4, which not only reduces parameters but also deepens the network to increase its capacity and complexity; the improved network mainly consists of 16 basic units, 3x3 convolutional layers, and 1 fully connected layer, 36 layers in total, and the size of the final feature map is given by the formula shown as an image in the original publication (Figure BDA0004050398500000081).
Here ResNet34 refers to a residual network with 34 convolutional layers, used as the backbone for extracting feature maps.
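A minimal sketch of the encoder modification described above, assuming a torchvision ResNet-34 backbone; the channel widths, strides, and the exact 36-layer layout of the original figure are illustrative assumptions here, not values taken from the patent.

import torch.nn as nn
from torchvision.models import resnet34

def build_modified_resnet34() -> nn.Module:
    net = resnet34()  # in practice, ImageNet-pretrained weights would be loaded for the unchanged layers
    # replace the initial 7x7 convolution with a stack of three 3x3 convolutions
    net.conv1 = nn.Sequential(
        nn.Conv2d(3, 32, kernel_size=3, stride=2, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 32, kernel_size=3, stride=1, padding=1, bias=False),
        nn.BatchNorm2d(32), nn.ReLU(inplace=True),
        nn.Conv2d(32, 64, kernel_size=3, stride=1, padding=1, bias=False),
    )
    # replace the final global average pooling with a 4x4 average-pooling kernel
    net.avgpool = nn.AvgPool2d(kernel_size=4)
    return net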
In the second step, specifically, the method includes:
knowing the camera coordinate p of the input view angle and the rotation matrix R, the point coordinate is s in the camera space c =R -1 (s i -p);
Since the rotation matrix is an orthogonal matrix, the coordinate is s c =R T (s i -p)=(x c ,y c ,z c );
Projected onto a camera plane with coordinates of
Figure BDA0004050398500000091
Its coordinates after normalization are
Figure BDA0004050398500000092
Finally according to s uv Corresponding features are sampled in the feature map.
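The transform and sampling above can be sketched as follows under a standard pinhole-camera assumption with intrinsics K; since the projection and normalization formulas appear only as images in the original publication, this is an illustrative reconstruction rather than the patented formula.

import torch
import torch.nn.functional as F

def sample_pixel_aligned_features(s_i, R, p, K, feat_map, image_hw):
    # s_i: (N, 3) world-space points, R: (3, 3) rotation, p: (3,) camera position
    # K: (3, 3) pinhole intrinsics, feat_map: (1, C, Hf, Wf), image_hw: (H, W) of the view
    s_c = (s_i - p) @ R                       # rows equal R^T (s_i - p), R being orthogonal
    uv = s_c @ K.T                            # pinhole projection onto the camera plane
    uv = uv[:, :2] / uv[:, 2:3]               # perspective divide -> pixel coordinates
    H, W = image_hw
    # normalize pixel coordinates to [-1, 1] as expected by grid_sample
    grid = torch.stack([2.0 * uv[:, 0] / (W - 1) - 1.0,
                        2.0 * uv[:, 1] / (H - 1) - 1.0], dim=-1).view(1, 1, -1, 2)
    feats = F.grid_sample(feat_map, grid, mode="bilinear", align_corners=True)
    return feats.reshape(feat_map.shape[1], -1).T   # (N, C) pixel-aligned features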
Example 3
In addition to all the technical features in the first embodiment, the present embodiment further includes:
in the third step, the method specifically includes:
the same task as NeRF is divided into two cache-friendly neural networks;
the position-dependent network F_pos outputs a deep radiance map (u, v, w) containing D components;
the direction-dependent network F_dir takes the ray direction as input and outputs the component weights (β_1, ..., β_D);
this split architecture allows the position-dependent and ray-direction-dependent outputs to be cached independently, which greatly improves performance when cached.
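As a minimal sketch of this factorization (the number of components D, the layer widths, and the activation choices are illustrative assumptions rather than values fixed by the disclosure), the position branch and the direction branch can each be cached independently and combined by an inner product:

import torch
import torch.nn as nn

class FactorizedRadianceField(nn.Module):
    def __init__(self, D: int = 8, hidden: int = 256):
        super().__init__()
        self.D = D
        # position-dependent network F_pos: density sigma plus D (u, v, w) map components
        self.f_pos = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1 + 3 * D),
        )
        # direction-dependent network F_dir: the D scalar weights beta_1 ... beta_D
        self.f_dir = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, D),
        )

    def forward(self, pos, view_dir):
        out = self.f_pos(pos)                                    # cacheable over positions
        sigma = torch.relu(out[..., :1])
        uvw = out[..., 1:].reshape(*pos.shape[:-1], self.D, 3)   # deep radiance map
        beta = self.f_dir(view_dir)                              # cacheable over directions
        rgb = torch.sigmoid((beta.unsqueeze(-1) * uvw).sum(dim=-2))  # inner product -> color
        return sigma, rgb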
In the fourth step, specifically include:
for a single input image i, firstly fixing a coordinate system as a view space of the input image, and specifying a position and camera rays in the coordinate system;
extracting the characteristic quantity W of the input image through the improved ResNet34 network (i)
For a point x on the camera ray (i) By using known internal references, x is calculated (i) Projected onto the image coordinates pi (x (i) ) On, then extracting corresponding image feature vectors W between pixel features (i) (π(x (i) ));
Finally, the image features are transferred to the decomposition network along with the position (x, y, z) and the view direction (θ, φ), and the view angle (rgb σ) is output.
In step six, the submitted data includes 2D pictures and model information, and preprocessing of the pictures includes image segmentation and format conversion.
Working principle: building on the prior art, the NeRF algorithm is deeply optimized, improving the model's inference speed and generalization; the model supports a small number of inputs, i.e., a three-dimensional commodity can be reconstructed from only 3-4 two-dimensional commodity images.
It will be evident to those skilled in the art that the invention is not limited to the details of the foregoing illustrative embodiments, and that the present invention may be embodied in other specific forms without departing from the spirit or essential characteristics thereof. The present embodiments are, therefore, to be considered in all respects as illustrative and not restrictive, the scope of the invention being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein. Any reference sign in a claim should not be construed as limiting the claim concerned.
Furthermore, it should be understood that although the present specification is described in terms of embodiments, not every embodiment contains only a single independent technical solution; this manner of description is adopted merely for clarity, and the specification should be taken as a whole, as the technical solutions in the various embodiments may be combined as appropriate to form other embodiments that will be apparent to those skilled in the art.

Claims (6)

1. An improved method for a three-dimensional reconstruction and neural rendering network, characterized in that the method comprises the following steps:
step one, two-dimensional view feature extraction,
the NeRF method requires a large number of pictures from known viewing angles as input and takes a long time to train;
PixelNeRF allows the network to be trained across multiple scenes to learn scene priors, i.e., to acquire prior knowledge of scenes, enabling it to perform novel view synthesis in a feed-forward manner from a sparse set of views;
PixelNeRF adds a fully convolutional image encoder in front of the NeRF network to encode the input image, pixel by pixel, into a pixel-aligned feature grid, wherein NeRF (neural radiance field) is a method for generating novel views of complex scenes and constructs an implicit rendering pipeline;
step two, feature matching: after obtaining the features corresponding to points in the input view, in order to know where a point given in world coordinates projects onto the input view, the point's coordinates are converted into camera space, and the corresponding features are then sampled from the feature map according to the normalized coordinates;
step three, model decomposition: NeRF is essentially a function that maps a three-dimensional position p and a two-dimensional representation of the viewing direction d to a three-dimensional color c and a scalar density σ, where the density depends only on the position while the color depends on both position and direction;
the basic idea of FastNeRF is to trade some storage for caching so as to improve computational efficiency and reduce the time required for rendering;
NeRF is decomposed into two neural networks: a position-dependent network that generates a deep radiance map and a direction-dependent network that generates weights, and the inner product of the weights and the deep radiance map estimates the color seen at the specified position from the specified direction in the scene;
the FastNeRF architecture can be cached efficiently, significantly improving test-time efficiency while maintaining NeRF's visual quality;
step four, single-view neural rendering network: the model combines the two-dimensional input-view features with the decomposition network and consists of two parts, the first part being the pixel-aligned feature network obtained by encoding the input image pixel by pixel in steps one and two, and the second part being the neural network decomposed in step three into two cache-friendly networks, which caches spatial coordinate information together with the corresponding encoded features and outputs color and density values;
step five, multi-view neural rendering network: for multiple input views, the coordinates and corresponding features are processed independently in each view's coordinate frame according to the single-view rendering method; the per-view rendering results are then aggregated by averaging, the aggregated result is passed to an MLP to obtain the predicted density and color, and finally the mean squared error between the computed result and the ground truth is minimized;
step six, commodity three-dimensional modeling workflow: the improved network structure is applied to commodity three-dimensional modeling; data is submitted first, with 3-4 two-dimensional commodity images as input, one picture for each angle of the commodity, and the picture background should be as simple and clean as possible, preferably predominantly white; the pictures are then preprocessed; the processed pictures are fed into the model for three-dimensional modeling and neural rendering; finally, the generated three-dimensional commodity is uploaded to the app for consumers to view.
2. The improved method for a three-dimensional reconstruction and neural rendering network according to claim 1, characterized in that: in step one, PixelNeRF uses a pretrained ResNet34 to extract picture features; the initial 7x7 large-kernel convolutional layer is replaced with 3x3 small convolution kernels, and the corresponding final average-pooling kernel is changed to 4x4, which not only reduces parameters but also deepens the network to increase its capacity and complexity; the improved network mainly consists of 16 basic units, 3x3 convolutional layers, and 1 fully connected layer, 36 layers in total, and the size of the final feature map is given by the formula shown as an image in the original publication (Figure FDA0004050398480000031);
wherein ResNet34 refers to a residual network with 34 convolutional layers, used as the backbone for extracting feature maps.
3. The improved method for a three-dimensional reconstruction and neural rendering network according to claim 1, characterized in that step two specifically comprises:
knowing the camera position p of the input view and the rotation matrix R, the point's coordinates in camera space are s_c = R^(-1)(s_i - p);
since the rotation matrix is orthogonal, this can be written as s_c = R^T(s_i - p) = (x_c, y_c, z_c);
the point is projected onto the camera plane, giving image-plane coordinates (projection formula shown as an image in the original publication, Figure FDA0004050398480000032);
after normalization, the coordinates s_uv are obtained (normalization formula shown as an image in the original publication, Figure FDA0004050398480000033);
finally, the corresponding features are sampled from the feature map according to s_uv.
4. The improved method for a three-dimensional reconstruction and neural rendering network according to claim 1, characterized in that step three specifically comprises:
the same task as NeRF is divided into two cache-friendly neural networks;
the position-dependent network F_pos outputs a deep radiance map (u, v, w) containing D components;
the direction-dependent network F_dir takes the ray direction as input and outputs the component weights (β_1, ..., β_D);
this split architecture allows the position-dependent and ray-direction-dependent outputs to be cached independently, which greatly improves performance when cached.
5. The improved method for a three-dimensional reconstruction and neural rendering network according to claim 1, characterized in that step four specifically comprises:
for a single input image I, the coordinate system is first fixed to the view space of the input image, and positions and camera rays are specified in this coordinate system;
the feature volume W^(i) of the input image is extracted with the improved ResNet34 network;
for a point x^(i) on a camera ray, the known camera intrinsics are used to project x^(i) onto the image coordinates π(x^(i)), and the corresponding image feature vector W^(i)(π(x^(i))) is then extracted by interpolating between pixel features;
finally, the image features, together with the position (x, y, z) and the viewing direction (θ, φ), are passed to the decomposition network, which outputs the color and density (RGBσ).
6. The improved method for a three-dimensional reconstruction and neural rendering network according to claim 1, characterized in that: in step six, the submitted data includes 2D pictures and model information, and preprocessing of the pictures includes image segmentation and format conversion.
CN202310038592.5A 2023-01-12 2023-01-12 Improved method for three-dimensional reconstruction and neural rendering network Withdrawn CN116051746A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310038592.5A CN116051746A (en) 2023-01-12 2023-01-12 Improved method for three-dimensional reconstruction and neural rendering network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310038592.5A CN116051746A (en) 2023-01-12 2023-01-12 Improved method for three-dimensional reconstruction and neural rendering network

Publications (1)

Publication Number Publication Date
CN116051746A true CN116051746A (en) 2023-05-02

Family

ID=86132846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310038592.5A Withdrawn CN116051746A (en) 2023-01-12 2023-01-12 Improved method for three-dimensional reconstruction and neural rendering network

Country Status (1)

Country Link
CN (1) CN116051746A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (application publication date: 20230502)