CN116778061B - Three-dimensional object generation method based on non-realistic picture - Google Patents

Three-dimensional object generation method based on non-realistic picture

Info

Publication number
CN116778061B
CN116778061B (application CN202311070901.3A)
Authority
CN
China
Prior art keywords
loss function
training
dimensional object
diffusion model
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311070901.3A
Other languages
Chinese (zh)
Other versions
CN116778061A (en)
Inventor
徐浩然 (Xu Haoran)
李泽健 (Li Zejian)
陈培 (Chen Pei)
孙凌云 (Sun Lingyun)
王小松 (Wang Xiaosong)
陈晓皎 (Chen Xiaojiao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202311070901.3A priority Critical patent/CN116778061B/en
Publication of CN116778061A publication Critical patent/CN116778061A/en
Application granted granted Critical
Publication of CN116778061B publication Critical patent/CN116778061B/en
Legal status: Active (granted)


Classifications

    • G06T 15/00 3D [Three Dimensional] image rendering
    • G06T 15/02 Non-photorealistic rendering
    • G06N 3/045 Combinations of networks (computing arrangements based on biological models; neural networks)
    • G06N 3/08 Learning methods
    • G06T 15/205 Image-based rendering (geometric effects; perspective computation)
    • G06T 17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G06T 17/20 Finite element generation, e.g. wire-frame surface description, tessellation
    • G06V 20/64 Three-dimensional objects (scenes; scene-specific elements)
    • G06V 20/70 Labelling scene content, e.g. deriving syntactic or semantic representations

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Generation (AREA)

Abstract

The invention discloses a three-dimensional object generation method based on non-realistic pictures. The probability distribution of a generated image is obtained through a pre-trained diffusion model conditioned on text prompts and a depth map, and a loss function is constructed from the KL divergence between the probability distributions of the generated image and the target image to update the parameters of the neural radiance field, so that the three-dimensional geometric model produced by the updated neural radiance field is more accurate without depending on a depth estimator. The invention suppresses the density of non-subject matter outside the semantic mask through a floating artifact loss function, i.e., a loss function over the density map and the subject semantic mask, thereby eliminating floating artifacts, and encourages density growth within the semantic mask to form a more accurate three-dimensional geometric model.

Description

Three-dimensional object generation method based on non-realistic picture
Technical Field
The invention belongs to the technical fields of deep-learning image processing and three-dimensional object generation, and particularly relates to a three-dimensional object generation method based on non-realistic pictures.
Background
In recent years, Neural Radiance Fields (NeRF) have made tremendous progress in modeling realistic three-dimensional objects. NeRF-based three-dimensional generation methods learn a three-dimensional model from a series of two-dimensional images by training a neural network. Meanwhile, Diffusion Models have significantly advanced text-to-image generation. By using a diffusion model to provide the prior distribution of generated images, NeRF-based three-dimensional generation can work without known camera parameters and positions.
Existing diffusion-model-distillation-guided NeRF generation methods require the rendered image (distribution) that NeRF produces through a differentiable rendering process to be close to the target image (distribution) generated by the diffusion model, and comprise two approaches: Score Distillation Sampling (SDS) and Variational Score Distillation (VSD). The former optimizes the mean squared error between a single target image and a single rendered image, while the latter optimizes the KL divergence between the target image distribution and the rendered image distribution. In neural radiance field methods based on diffusion model distillation, the diffusion model can accept various modalities (such as text and line drafts) as input conditions, covering a wide range of production scenarios, and therefore has good research and application value.
Existing distillation-guided NeRF generation methods are based on realistic pictures. When non-realistic modeling is performed from non-realistic pictures (for example, cartoon-style three-dimensional object modeling based on Guofeng, i.e. Chinese-style, flat cartoon pictures), the modeling quality of existing methods drops significantly. On the one hand, existing NeRF generation methods mostly assume a specific illumination model (such as the Lambertian diffuse illumination model), but non-realistic pictures mostly do not follow such a model, so errors (such as a large number of floating artifacts) are introduced when optimizing the view-dependent neural radiance field. On the other hand, existing distillation-guided NeRF generation methods depend on the accuracy of a depth estimator for geometric optimization, and the depth of a non-realistic picture is difficult to estimate with existing depth estimators, even after fine-tuning, which impairs the optimization of the radiance field's geometric structure.
Therefore, in neural radiance field generation methods based on diffusion model distillation and guided by non-realistic pictures, the floating artifacts caused by non-Lambertian illumination during model training must be removed so that a correct geometric structure forms and downstream production applications such as mesh generation are ensured; furthermore, a geometric optimization method independent of depth estimation is needed to meet the requirements of non-realistic modeling.
Disclosure of Invention
The invention provides a three-dimensional object generation method based on a non-realistic picture, which can accurately obtain a three-dimensional geometric model based on the non-realistic picture.
The embodiment of the invention provides a three-dimensional object generation method based on a non-realistic picture, which comprises the following steps:
fine-tuning the basic diffusion model based on the non-realistic picture set to obtain a pre-trained diffusion model;
constructing a training system comprising the pre-trained diffusion model, a neural radiance field, a ControlNet network and a semantic segmentation network, wherein a text prompt is input into the pre-trained diffusion model to obtain the probability distribution of a target image; the probability distribution of a rendered image and the depth map and density map corresponding to the rendered image are obtained from the neural radiance field; based on the text prompt, the ControlNet network conditions the pre-trained diffusion model on the depth map to obtain the probability distribution of a generated image; and the semantic segmentation network performs semantic segmentation on the rendered image to obtain a subject semantic mask;
constructing a total loss function comprising a variational score distillation loss function, a geometric optimization loss function and a floating artifact loss function, wherein:
the variational score distillation loss function makes the expectation of the KL divergence between the probability distributions of the rendered image and the target image not higher than a first loss threshold; the geometric optimization loss function makes the expectation of the KL divergence between the probability distributions of the generated image and the target image not higher than a second loss threshold; and the floating artifact loss function makes the expectation of the loss value over the density map and the subject semantic mask not higher than a third loss threshold;
training a group of neural radiance fields through the training system based on the text prompt and the camera poses using the total loss function, obtaining a plurality of final neural radiance fields;
inputting a camera pose into one final neural radiance field randomly selected from the plurality of final neural radiance fields to obtain a rendered image of the three-dimensional object.
Further, the geometric optimization loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the generated image at different diffusion steps; the generated image is made to approach the target image through the geometric optimization loss function so as to update the depth map, and the parameters of the neural radiance field are updated by updating the depth map.
Further, the floating artifact loss function is constructed based on the density maps and subject semantic masks under different camera poses; the density map is updated through the floating artifact loss function, and the parameters of the neural radiance field are updated by updating the density map, thereby removing floating artifacts from the rendered image.
Further, the variational score distillation loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the rendered image under different camera poses and at different diffusion steps; the rendered image is made to approach the target image through the variational score distillation loss function so as to update the parameters of the neural radiance field.
Further, when the training iteration round reaches a set round hyperparameter, the density map is updated through the floating artifact loss function, and the parameters of the neural radiance field are updated by updating the density map, thereby eliminating floating artifacts of the three-dimensional object.
Further, fine tuning the diffusion model based on the non-realistic picture set to obtain a pre-trained diffusion model, including:
using the non-realistic pictures as a pre-training data set, adding an up/down-sampling multi-layer perceptron on a bypass of the basic diffusion model, and training the multi-layer perceptron on the pre-training data set through an image synthesis technique to obtain the pre-trained diffusion model.
Further, adding the up/down-sampling multi-layer perceptron on the bypass of the basic diffusion model comprises:
adding the up/down-sampling multi-layer perceptron on the bypass of the basic diffusion model using a low-rank matrix fine-tuning method, and, based on the text prompt, superposing the outputs of the trained basic diffusion model and of the up/down-sampling multi-layer perceptron to obtain the target image.
Further, one final neural radiance field is randomly sampled from the plurality of final neural radiance fields, the signed distance function values corresponding to the three-dimensional object are obtained from the sampled field, and the geometric mesh of the three-dimensional object is generated using the Deep Marching Tetrahedra technique based on the signed distance function values.
Compared with the prior art, the invention has the beneficial effects that:
According to the method, the probability distribution of the generated image is obtained through the pre-trained diffusion model based on the text prompt and the depth map, and a loss function is constructed from the KL divergence between the probability distributions of the generated image and the target image to update the parameters of the neural radiance field, so that the three-dimensional geometric model produced by the updated neural radiance field is more accurate without depending on a depth estimator.
The floating artifact suppression method provided by the invention suppresses the density of non-subject matter outside the semantic mask through a loss function over the density map and the subject semantic mask, thereby eliminating floating artifacts, and encourages density growth within the semantic mask, which accelerates the convergence of the training system and forms an accurate three-dimensional geometric model.
Drawings
FIG. 1 is a flow chart of a method for generating a three-dimensional object based on a non-realistic picture according to an embodiment of the present invention;
fig. 2 is a data path diagram of a three-dimensional object generating method based on a non-realistic picture according to an embodiment of the present invention.
Detailed Description
The present invention will be further described in detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the detailed description and specific examples, while indicating the invention, are intended for purposes of illustration only and are not intended to limit the scope of the invention. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the present invention.
The embodiment of the invention provides a variational score distillation loss function, a geometric optimization loss function and a floating artifact loss function to train the neural radiance field, so that the neural radiance field can produce a more accurate three-dimensional geometric model for non-realistic images. The details are as follows:
the embodiment of the invention provides a three-dimensional object generation method based on a non-realistic picture, which comprises the following steps as shown in fig. 1 and 2:
s1, fine tuning is carried out on a basic diffusion model based on a non-realistic picture set to obtain a pre-training diffusion model, and the specific steps are as follows:
the embodiment of the invention obtains a non-realistic picture set, wherein the non-realistic picture set comprises a plurality of non-realistic pictures, specifically, as shown in fig. 2, generally 10-50 non-realistic pictures, the plurality of non-realistic pictures are new country lotus pictures and white backgrounds, and the non-realistic picture set is used as a pre-training data set.
This embodiment adopts the low-rank matrix fine-tuning method (Low-Rank Adaptation, LoRA) to add an up/down-sampling multi-layer perceptron on a bypass of the basic diffusion model; based on the text prompt, the outputs of the trained basic diffusion model and of the up/down-sampling multi-layer perceptron are superposed to obtain the target image. In one embodiment, the basic diffusion model is a latent space diffusion model (Latent Diffusion Model, LDM).
The target image \(\hat{x}\) provided by the embodiment of the invention is:

\[\hat{x}=G_{\phi+\Delta\phi}\!\left(\epsilon\mid y\right),\]

where \(\phi\) are the diffusion model parameters, \(\Delta\phi\) are the multi-layer perceptron parameters, and \(\epsilon\sim\mathcal{N}(0,I)\) is Gaussian noise.
This embodiment trains the up/down-sampling multi-layer perceptron added on the bypass of the basic diffusion model through an image synthesis technique (DreamBooth) on the pre-training data set, obtaining a pre-trained diffusion model capable of generating target-style content from given text.
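For illustration, a minimal Python sketch of this fine-tuning setup follows. It is not the patent's actual implementation: latents are treated as flat vectors so that a plain linear bypass applies, and `base_eps` (the frozen base model's noise prediction) and `scheduler` (exposing `num_steps` and `add_noise`) are assumed wrapper objects.

```python
import torch
import torch.nn as nn

class BypassMLP(nn.Module):
    """Up/down-sampling multi-layer perceptron on a bypass of the frozen
    base diffusion model (low-rank adaptation; illustrative sketch)."""
    def __init__(self, dim: int, rank: int = 8):
        super().__init__()
        self.down = nn.Linear(dim, rank, bias=False)   # down-sampling projection
        self.up = nn.Linear(rank, dim, bias=False)     # up-sampling projection
        nn.init.zeros_(self.up.weight)                 # bypass starts at zero

    def forward(self, h):
        return self.up(self.down(h))

def finetune_step(base_eps, bypass, scheduler, opt, x0, y_emb):
    """One DreamBooth-style step on the non-realistic picture set: the noise
    prediction superposes the frozen base output and the bypass output, and
    only the bypass parameters receive gradients."""
    noise = torch.randn_like(x0)                       # Gaussian noise epsilon
    t = torch.randint(0, scheduler.num_steps, (x0.shape[0],))
    xt = scheduler.add_noise(x0, noise, t)             # forward diffusion
    with torch.no_grad():
        eps_base = base_eps(xt, t, y_emb)              # frozen phi prediction
    eps_hat = eps_base + bypass(eps_base.flatten(1)).view_as(eps_base)
    loss = nn.functional.mse_loss(eps_hat, noise)
    loss.backward()
    opt.step(); opt.zero_grad()
    return loss.item()
```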
S2, constructing a training system comprising the pre-trained diffusion model, a neural radiance field, a control network (ControlNet) and a semantic segmentation network (Segment Anything Model, SAM). A text prompt is input into the pre-trained diffusion model to obtain the probability distribution of the target image; the probability distribution of the rendered image and the depth map and density map corresponding to the rendered image are obtained from the neural radiance field; based on the text prompt, the ControlNet network conditions the pre-trained diffusion model on the depth map to obtain the probability distribution of the generated image; and the semantic segmentation network performs semantic segmentation on the rendered image to obtain the subject semantic mask.
In one embodiment, a text prompt \(y\) is obtained and used as input to the training system. The text prompt \(y\) is input to the pre-trained diffusion model to obtain the probability distribution of the target image. A neural radiance field with parameters \(\theta\) is randomly initialized, and a differentiable renderer \(g\) performs the rendering operation on it: given a camera pose \(c\), the renderer \(g\) emits rays pixel by pixel and weights the colors of the sample points along each ray \(r(t)=o+t\,v\) to obtain a rendered image \(x=g(\theta,c)\), and thereby the probability distribution of the rendered image, where \(o\) is the ray origin, \(t\) is the time parameter, and \(v\) is the ray direction. The depth map \(d\) and density map \(\rho\) are computed from the neural radiance field. The text prompt \(y\) and the depth map \(d\) are input to the pre-trained diffusion model and the ControlNet network to obtain the probability distribution of the generated image. The rendered image is input to the semantic segmentation network SAM to obtain the subject semantic mask \(M\).
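The pixel-wise ray rendering described above can be sketched as follows, under standard NeRF quadrature assumptions; `nerf`, returning per-point density and color, is an assumed callable, and the returned density samples are reused by the depth and density quadrature sketched later.

```python
import torch

def render_rays(nerf, rays_o, rays_d, t_near, t_far, n_samples=64):
    """Render colors along rays r(t) = o + t*v by weighting the colors of
    the sample points (standard NeRF quadrature; illustrative sketch)."""
    t = torch.linspace(t_near, t_far, n_samples)                  # time samples
    pts = rays_o[:, None, :] + t[None, :, None] * rays_d[:, None, :]
    sigma, rgb = nerf(pts, rays_d)                                # density, color
    delta = t[1] - t[0]
    alpha = 1.0 - torch.exp(-sigma * delta)                       # per-bin opacity
    trans = torch.cumprod(
        torch.cat([torch.ones_like(alpha[:, :1]), 1.0 - alpha + 1e-10], -1),
        dim=-1)[:, :-1]                                           # transmittance
    weights = trans * alpha                                       # color weights
    color = (weights[..., None] * rgb).sum(dim=1)                 # rendered pixels
    return color, sigma, t
```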
S3, constructing a total loss function comprising a variational score distillation loss function, a geometric optimization loss function and a floating artifact loss function, wherein:
Through the variational score distillation loss function, this embodiment makes the expectation of the KL divergence between the probability distributions of the rendered image and the target image not higher than the set first loss threshold.
In a specific embodiment, the variational score distillation loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the rendered image under different camera poses and at different diffusion steps. In view of optimization complexity, this embodiment updates the parameters of the neural radiance field through the variational score distillation loss function so that the noisy rendered image distribution approximates the noisy target image distribution.
The variational score distillation loss function \(\mathcal{L}_{\mathrm{VSD}}\) provided by the embodiment of the invention is:

\[\mathcal{L}_{\mathrm{VSD}}=\mathbb{E}_{s\sim\mathcal{U}(s_{\min},s_{\max}),\,c}\!\left[\omega(s)\,D_{\mathrm{KL}}\!\left(q_{s}^{\theta}\!\left(x_{s}\mid c\right)\;\big\|\;p_{s}\!\left(x_{s}\mid y\right)\right)\right]\]

where the expectation is taken over the diffusion step \(s\) and the camera pose \(c\); \(s_{\min}\) and \(s_{\max}\) are hyperparameters of the variational score distillation loss function; \(\omega(s)\) is the weight at the \(s\)-th diffusion step in the loss function; \(D_{\mathrm{KL}}\) is the KL divergence function; \(q_{s}^{\theta}(x_{s}\mid c)\) is the probability distribution of the noisy rendered image \(x_{s}\) at the \(s\)-th diffusion step; \(p_{s}(x_{s}\mid y)\) is the probability distribution of the noisy target image at the \(s\)-th diffusion step; \(c\) is the camera pose; \(y\) is the text prompt; \(\mu(\theta\mid y)\) is the probability distribution over the parameterized three-dimensional representation \(\theta\) given all possible generation paths for \(y\); and the noisy rendered image at the \(s\)-th diffusion step is \(x_{s}=\alpha_{s}x+\sigma_{s}\epsilon\) with \(\epsilon\sim\mathcal{N}(0,I)\), where \(I\) is the identity tensor and \(\mathcal{N}\) denotes a Gaussian distribution. Using the variational score distillation (VSD, Variational Score Distillation) method, the three-dimensional representation \(\theta\sim\mu(\theta\mid y)\) is updated so that the expectation of the KL divergence (Kullback-Leibler Divergence) between the probability distributions of the noisy rendered image and the noisy target image is not higher than the set first loss threshold, thereby updating the neural radiance field parameters \(\theta\).
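In code, this update is commonly realized, as in ProlificDreamer-style VSD implementations, by backpropagating the difference between the pre-trained model's noise prediction and that of a camera-conditioned LoRA score estimator through the renderer. A minimal sketch under those assumptions, with all names illustrative:

```python
import torch

def vsd_step(render_fn, pretrain_eps, lora_eps, y_emb, c_emb,
             alphas, sigmas, w, s_min=20, s_max=980):
    """One VSD update: grad = w(s) * (eps_pretrained - eps_lora) * dx/dtheta."""
    x = render_fn()                                  # x = g(theta, c), keeps grad
    s = int(torch.randint(s_min, s_max, (1,)))       # random diffusion step
    eps = torch.randn_like(x)                        # Gaussian noise
    xs = alphas[s] * x + sigmas[s] * eps             # noisy rendered image
    with torch.no_grad():
        grad = w[s] * (pretrain_eps(xs, s, y_emb) - lora_eps(xs, s, y_emb, c_emb))
    x.backward(gradient=grad)                        # flows into theta
```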
Through the geometric optimization loss function, this embodiment makes the expectation of the KL divergence between the probability distributions of the generated image and the target image not higher than the second loss threshold.
In a specific embodiment, the geometric optimization loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the generated image at different diffusion steps; the generated image is made to approach the target image through the geometric optimization loss function to update the depth map, and the parameters of the neural radiance field are updated by updating the depth map.
The geometric optimization loss function \(\mathcal{L}_{\mathrm{geo}}\) provided by this embodiment is:

\[\mathcal{L}_{\mathrm{geo}}=\mathbb{E}_{s\sim\mathcal{U}(s_{\min},s_{\max})}\!\left[\omega(s)\,D_{\mathrm{KL}}\!\left(p_{s}\!\left(x_{s}\mid y,d\right)\;\big\|\;p_{s}\!\left(x_{s}\mid y\right)\right)\right]\]

where the expectation is taken over the diffusion step \(s\); \(s_{\min}\) and \(s_{\max}\) are hyperparameters of the geometric optimization loss function; \(\omega(s)\) is the weight of the \(s\)-th diffusion step; \(p_{s}(x_{s}\mid y,d)\) is the probability distribution of the noisy generated image of the pre-trained diffusion model at the \(s\)-th diffusion step conditioned on the depth map \(d\) and the text prompt \(y\); and \(p_{s}(x_{s}\mid y)\) is the probability distribution of the noisy target image at the \(s\)-th diffusion step. The distribution of images generated by the pre-trained diffusion model conditioned on the depth map \(d\) and the text prompt \(y\) characterizes pictures consistent with the current geometry of the neural radiance field, while the distribution of target images generated without the depth condition characterizes all pictures. When the neural radiance field forms the correct geometry, \(p_{s}(x_{s}\mid y,d)\) and \(p_{s}(x_{s}\mid y)\) differ little. On this basis, geometric optimization of the neural radiance field without depth estimation is performed: the depth map \(d\) is optimized to make the expectation of the KL divergence between the (noisy) generated-image distribution of the depth-conditioned diffusion model and the (noisy) generated-image distribution of the diffusion model without the depth condition less than the set second loss threshold.
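One way to realize this term in code is a surrogate that penalizes the discrepancy between the depth-conditioned (ControlNet) and depth-free noise predictions, letting gradients reach the depth map through the ControlNet condition. This is a sketch under that assumption, not necessarily the patent's exact estimator; `controlnet_eps` and `pretrain_eps` are assumed wrappers.

```python
import torch

def geometry_loss(depth_map, x_target, pretrain_eps, controlnet_eps,
                  y_emb, alphas, sigmas, w, s_min=20, s_max=980):
    """Surrogate for the KL term: small when the depth-conditioned generated
    distribution matches the depth-free target distribution."""
    s = int(torch.randint(s_min, s_max, (1,)))
    eps = torch.randn_like(x_target)
    xs = alphas[s] * x_target + sigmas[s] * eps          # noisy image at step s
    eps_depth = controlnet_eps(xs, s, y_emb, depth_map)  # depends on depth map d
    with torch.no_grad():
        eps_free = pretrain_eps(xs, s, y_emb)            # depth-free reference
    return w[s] * ((eps_depth - eps_free) ** 2).mean()
```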
The depth map \(d\) of the rendered view provided by the embodiment of the invention is:

\[d(r)=\int_{t_{n}}^{t_{f}}\sigma\!\left(r(t)\right)\,t\,\mathrm{d}t\]

where \(\sigma(r(t))\) is the density value at the ray point \(r(t)\) under camera pose \(c\), and \(t_{n}\) and \(t_{f}\) are, respectively, the start and end points of the ray's passage through the object.
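Discretized with uniform bins, this integral, together with the accumulated density used below as the density map \(\rho\), becomes a simple sum over the per-ray density samples returned by the rendering sketch above (an assumed quadrature, for illustration):

```python
def depth_and_density(sigma, t):
    """Quadrature for d = integral of sigma(r(t)) * t dt and
    rho = integral of sigma(r(t)) dt over [t_n, t_f], uniform bins."""
    delta = t[1] - t[0]                                 # uniform bin width
    depth = (sigma * t[None, :] * delta).sum(dim=-1)    # depth map d
    density = (sigma * delta).sum(dim=-1)               # density map rho
    return depth, density
```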
Generating three-dimensional objects from non-realistic pictures produces floating artifacts, which must be suppressed. Depth estimation of non-realistic pictures with conventional depth estimators (e.g., the Dense Prediction Transformer, DPT) suffers a high error rate, causing incorrect gradient propagation in existing radiance-field methods whose losses depend on depth information; when training on non-realistic pictures, the neural radiance field therefore generates a large number of floating artifacts. In the initial training stage of the neural radiance field (iteration round \(n\le N_{\mathrm{start}}\), where \(N_{\mathrm{start}}\) is the round hyperparameter and \(n\) the iteration round), this embodiment trains with the variational score distillation loss function and the geometric optimization loss function described above; in this stage the neural radiance field mainly forms a three-dimensional object prototype. Afterwards, a floating artifact loss function based on subject recognition is additionally introduced to eliminate the generated floating artifacts and guide the subject density to grow. Specifically, when \(n>N_{\mathrm{start}}\), the density map \(\rho\) corresponding to the rendered image of the neural radiance field is computed as:

\[\rho(r)=\int_{t_{n}}^{t_{f}}\sigma\!\left(r(t)\right)\,\mathrm{d}t\]
Subject semantic segmentation is performed on the generated image of the pre-trained diffusion model using the semantic segmentation network SAM to form a subject semantic mask \(M\). A floating artifact loss function \(\mathcal{L}_{\mathrm{float}}\) is defined to suppress the density of non-subject matter outside the semantic mask, thereby eliminating floating artifacts, and to encourage density growth within the semantic mask to increase the model convergence rate.
The floating artifact loss function \(\mathcal{L}_{\mathrm{float}}\) provided by the embodiment of the invention is:

\[\mathcal{L}_{\mathrm{float}}=\mathbb{E}_{c}\!\left[f(\rho)\odot(1-M)+g(\rho)\odot M\right]\]

where \(c\) is the camera pose, \(\rho\) is the density map, \(M\) is the subject semantic mask, \(\odot\) is the element-wise product, \(f\) is an increasing function on \([0,1]\) (e.g., \(f(\rho)=1-e^{-\lambda\rho}\)), and \(g\) is a decreasing function on \([0,1]\) (e.g., \(g(\rho)=e^{-\lambda\rho}\)), where \(\lambda\) is a hyperparameter. Through the floating artifact loss function, the expectation of the loss value over the density map and the subject semantic mask is made not higher than the third loss threshold.
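A direct sketch of this loss with the example choices of \(f\) and \(g\) given above; the mean reduction over pixels is an assumption for illustration:

```python
import torch

def floating_artifact_loss(density, mask, lam=1.0):
    """Suppress density outside the subject mask, encourage it inside.
    f(rho) = 1 - exp(-lam * rho) is increasing on [0, 1];
    g(rho) = exp(-lam * rho) is decreasing on [0, 1]."""
    f = 1.0 - torch.exp(-lam * density)        # penalized outside the mask
    g = torch.exp(-lam * density)              # penalized inside the mask
    return (f * (1.0 - mask) + g * mask).mean()
```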
The total loss function \(\mathcal{L}\) provided by the embodiment of the invention is:

\[\mathcal{L}=\lambda_{\mathrm{VSD}}\,\mathcal{L}_{\mathrm{VSD}}+\lambda_{\mathrm{float}}\,\mathcal{L}_{\mathrm{float}}+\lambda_{\mathrm{geo}}\,\mathcal{L}_{\mathrm{geo}}\]

where \(\lambda_{\mathrm{VSD}}\), \(\lambda_{\mathrm{float}}\) and \(\lambda_{\mathrm{geo}}\) are the weights of the variational score distillation loss function, the floating artifact loss function and the geometric optimization loss function, respectively.
S4, training a group of neural radiance fields through the training system using the total loss function based on the text prompt, obtaining a plurality of final neural radiance fields.
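Putting the pieces together, a schematic training loop over the group of radiance fields might look as follows; `vsd_term`, `geo_term` and `float_term` are assumed to wrap the loss sketches above, and the weights and `n_start` round hyperparameter are illustrative values, not the patent's settings.

```python
def train_radiance_fields(fields, opts, sample_pose, vsd_term, geo_term,
                          float_term, n_iters=10000, n_start=2000,
                          w_vsd=1.0, w_float=0.5, w_geo=0.5):
    """Train a group of radiance fields with the total loss; the floating
    artifact term is enabled only after the prototype stage (it > n_start)."""
    for it in range(n_iters):
        for field, opt in zip(fields, opts):
            c = sample_pose()                            # random camera pose
            loss = w_vsd * vsd_term(field, c) + w_geo * geo_term(field, c)
            if it > n_start:                             # round hyperparameter
                loss = loss + w_float * float_term(field, c)
            loss.backward()
            opt.step(); opt.zero_grad()
    return fields
```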
S5, randomly sampling one final neural radiance field from the plurality of final neural radiance fields; inputting any camera pose into this final field yields a rendered image of the three-dimensional object from that viewpoint. Based on the final neural radiance field, the signed distance function (Signed Distance Function, SDF) values corresponding to the three-dimensional representation can be computed, and the Deep Marching Tetrahedra technique (DMTet) can be used to generate the geometric mesh of the three-dimensional object from the signed distance function values.
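As a rough illustration of this last step, the sketch below queries a trained field on a grid, turns density into a signed-distance-like level set, and extracts a mesh. Marching cubes (scikit-image) stands in here for the Deep Marching Tetrahedra step, and `field.density` and the threshold `tau` are assumptions.

```python
import numpy as np
from skimage import measure

def extract_mesh(field, grid_n=128, tau=10.0, bound=1.0):
    """Convert the radiance field's density to an SDF-like level set and
    extract a triangle mesh (marching cubes as a stand-in for DMTet)."""
    xs = np.linspace(-bound, bound, grid_n)
    grid = np.stack(np.meshgrid(xs, xs, xs, indexing="ij"), axis=-1)
    sigma = field.density(grid.reshape(-1, 3)).reshape(grid_n, grid_n, grid_n)
    level_set = tau - sigma                    # negative inside the object
    verts, faces, _, _ = measure.marching_cubes(level_set, level=0.0)
    return verts, faces
```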
The above embodiment is intended only to illustrate the present invention and is not to be construed as limiting it. Modifications to this embodiment that persons skilled in the art can make without creative contribution after reading this specification are all protected by patent law within the scope of the claims of the present invention.

Claims (8)

1. The three-dimensional object generation method based on the non-realistic picture is characterized by comprising the following steps of:
fine-tuning the basic diffusion model based on the non-realistic picture set to obtain a pre-trained diffusion model;
constructing a training system comprising the pre-trained diffusion model, a neural radiance field, a ControlNet network and a semantic segmentation network, wherein a text prompt is input into the pre-trained diffusion model to obtain the probability distribution of a target image; the probability distribution of a rendered image and the depth map and density map corresponding to the rendered image are obtained from the neural radiance field; based on the text prompt, the ControlNet network conditions the pre-trained diffusion model on the depth map to obtain the probability distribution of a generated image; and the semantic segmentation network performs semantic segmentation on the rendered image to obtain a subject semantic mask;
constructing a total loss function comprising a variational score distillation loss function, a geometric optimization loss function and a floating artifact loss function, wherein:
the variational score distillation loss function makes the expectation of the KL divergence between the probability distributions of the rendered image and the target image not higher than a first loss threshold; the geometric optimization loss function makes the expectation of the KL divergence between the probability distributions of the generated image and the target image not higher than a second loss threshold; and the floating artifact loss function makes the expectation of the loss value over the density map and the subject semantic mask not higher than a third loss threshold;
training a group of neural radiance fields through the training system based on the text prompt and the camera poses using the total loss function, obtaining a plurality of final neural radiance fields;
inputting a camera pose into one final neural radiance field randomly selected from the plurality of final neural radiance fields to obtain a rendered image of the three-dimensional object.
2. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein the geometric optimization loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the generated image at different diffusion steps, the generated image is made to approach the target image through the geometric optimization loss function so as to update the depth map, and the parameters of the neural radiance field are updated by updating the depth map.
3. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein the floating artifact loss function is constructed based on the density maps and subject semantic masks under different camera poses, the density map is updated through the floating artifact loss function, and the parameters of the neural radiance field are updated by updating the density map, so that floating artifacts of the rendered image are removed.
4. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein the variational score distillation loss function is constructed based on the expectation of the KL divergence between the probability distributions of the target image and the rendered image under different camera poses and at different diffusion steps, and the parameters of the neural radiance field are updated by making the rendered image approach the target image through the variational score distillation loss function.
5. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein when the training iteration round reaches a set round hyperparameter, the density map is updated through the floating artifact loss function, and the parameters of the neural radiance field are updated by updating the density map, so that floating artifacts of the three-dimensional object are removed.
6. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein fine-tuning the diffusion model based on the non-realistic picture set to obtain the pre-trained diffusion model comprises:
using the non-realistic pictures as a pre-training data set, adding an up/down-sampling multi-layer perceptron on a bypass of the basic diffusion model, and training the multi-layer perceptron on the pre-training data set through an image synthesis technique to obtain the pre-trained diffusion model.
7. The method for generating a three-dimensional object based on non-realistic pictures according to claim 6, wherein adding the up/down-sampling multi-layer perceptron on the bypass of the basic diffusion model comprises:
adding the up/down-sampling multi-layer perceptron on the bypass of the basic diffusion model using a low-rank matrix fine-tuning method, and, based on the text prompt, superposing the outputs of the trained basic diffusion model and of the up/down-sampling multi-layer perceptron to obtain the target image.
8. The method for generating a three-dimensional object based on a non-realistic picture according to claim 1, wherein one final neural radiance field is randomly sampled from the plurality of final neural radiance fields, the signed distance function values corresponding to the three-dimensional object are obtained from the randomly sampled final neural radiance field, and the geometric mesh of the three-dimensional object is generated using the Deep Marching Tetrahedra technique based on the signed distance function values.
CN202311070901.3A 2023-08-24 2023-08-24 Three-dimensional object generation method based on non-realistic picture Active CN116778061B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311070901.3A CN116778061B (en) 2023-08-24 2023-08-24 Three-dimensional object generation method based on non-realistic picture

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311070901.3A CN116778061B (en) 2023-08-24 2023-08-24 Three-dimensional object generation method based on non-realistic picture

Publications (2)

Publication Number Publication Date
CN116778061A CN116778061A (en) 2023-09-19
CN116778061B true CN116778061B (en) 2023-10-27

Family

ID=87986385

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311070901.3A Active CN116778061B (en) 2023-08-24 2023-08-24 Three-dimensional object generation method based on non-realistic picture

Country Status (1)

Country Link
CN (1) CN116778061B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117237542B (en) * 2023-11-10 2024-02-13 中国科学院自动化研究所 Three-dimensional human body model generation method and device based on text


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023080921A1 (en) * 2021-11-03 2023-05-11 Google Llc Neural radiance field generative modeling of object classes from single two-dimensional views
WO2023093186A1 (en) * 2022-06-15 2023-06-01 之江实验室 Neural radiation field-based method and apparatus for constructing pedestrian re-identification three-dimensional data set
CN115393410A (en) * 2022-07-18 2022-11-25 华东师范大学 Monocular view depth estimation method based on nerve radiation field and semantic segmentation
CN116563459A (en) * 2023-04-13 2023-08-08 北京航空航天大学 Text-driven immersive open scene neural rendering and mixing enhancement method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
3D semantic scene restoration network (三维语义场景复原网络); 林金花 (Lin Jinhua); 王延杰 (Wang Yanjie); Optics and Precision Engineering (光学精密工程), (05); full text *

Also Published As

Publication number Publication date
CN116778061A (en) 2023-09-19

Similar Documents

Publication Publication Date Title
CN108230329B (en) Semantic segmentation method based on multi-scale convolution neural network
CN116778061B (en) Three-dimensional object generation method based on non-realistic picture
CN111386536A (en) Semantically consistent image style conversion
CN111985376A (en) Remote sensing image ship contour extraction method based on deep learning
CN109583509B (en) Data generation method and device and electronic equipment
CN113888689A (en) Image rendering model training method, image rendering method and image rendering device
US20230281913A1 (en) Radiance Fields for Three-Dimensional Reconstruction and Novel View Synthesis in Large-Scale Environments
CN109461177B (en) Monocular image depth prediction method based on neural network
CN108898639A (en) A kind of Image Description Methods and system
US20230177822A1 (en) Large scene neural view synthesis
US11010948B2 (en) Agent navigation using visual inputs
CN115797571A (en) New visual angle synthesis method of 3D stylized scene
CN109741378A (en) Multimodal medical image registration method, apparatus, platform and medium based on MRF model
KR20200063368A (en) Unsupervised stereo matching apparatus and method using confidential correspondence consistency
KR101602593B1 (en) Method and arrangement for 3d model morphing
US11403807B2 (en) Learning hybrid (surface-based and volume-based) shape representation
Zhu et al. FSGS: Real-Time Few-shot View Synthesis using Gaussian Splatting
CN111161384B (en) Path guiding method of participation medium
CN117333637B (en) Modeling and rendering method, device and equipment for three-dimensional scene
Li et al. Is Synthetic Data From Diffusion Models Ready for Knowledge Distillation?
CN117095132B (en) Three-dimensional reconstruction method and system based on implicit function
CN106407932A (en) Handwritten number recognition method based on fractional calculus and generalized inverse neural network
KR20230167746A (en) Method and system for generating polygon meshes approximating surfaces using root-finding and iteration for mesh vertex positions
CN116363320A (en) Training of reconstruction model and three-dimensional model reconstruction method, device, equipment and medium
Xia et al. Vecfontsdf: Learning to reconstruct and synthesize high-quality vector fonts via signed distance functions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant