CN115082636B - Single image three-dimensional reconstruction method and device based on mixed Gaussian network - Google Patents


Info

Publication number
CN115082636B
Authority
CN
China
Prior art keywords
function
dimensional
value
human body
gaussian
Prior art date
Legal status
Active
Application number
CN202210792028.8A
Other languages
Chinese (zh)
Other versions
CN115082636A (en)
Inventor
吴连朋
于涛
刘烨斌
许瀚誉
朱家林
王宝云
于芝涛
Current Assignee
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Original Assignee
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Hisense Visual Technology Co Ltd, Juhaokan Technology Co Ltd filed Critical Tsinghua University
Priority to CN202210792028.8A priority Critical patent/CN115082636B/en
Publication of CN115082636A publication Critical patent/CN115082636A/en
Application granted granted Critical
Publication of CN115082636B publication Critical patent/CN115082636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 - Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The application relates to the technical field of three-dimensional reconstruction, and provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network. A trained mixed Gaussian network extracts the two-dimensional target features of each pixel point from a single color image containing the whole human body, and determines a continuous occupancy value function for each human-body pixel point on the corresponding three-dimensional projection ray from those features. A uniform discrete indicator function is then generated by trilinear sampling of the continuous occupancy value function, and the three-dimensional human body model is extracted from it. Because the mixed Gaussian network can extract more human-body detail features from a single color image, the method improves three-dimensional reconstruction efficiency on the basis of a single color image while ensuring the precision of the human body three-dimensional model.

Description

Single image three-dimensional reconstruction method and device based on mixed Gaussian network
Technical Field
The application relates to the technical field of three-dimensional reconstruction, and provides a single-image three-dimensional reconstruction method and device based on a hybrid Gaussian network.
Background
In computer vision, three-dimensional reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. It is widely applied in virtual fitting, 3D games, autonomous navigation and other scenarios, and particularly in holographic communication scenarios based on Virtual Reality (VR) and Augmented Reality (AR), where a reconstructed three-dimensional human body model brings an immersive interactive experience to the user.
Multi-view human body three-dimensional reconstruction requires calibrating the multi-view cameras; the camera layout is strictly constrained and the calibration process is cumbersome, which seriously affects reconstruction efficiency. Holographic communication scenarios have demanding real-time requirements, so improving three-dimensional reconstruction efficiency through single-view human body reconstruction is a current research hotspot.
Disclosure of Invention
The application provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network, which are used for improving the efficiency and the precision of three-dimensional reconstruction.
In one aspect, the application provides a single image three-dimensional reconstruction method based on a hybrid Gaussian network, comprising the following steps:
acquiring a single color image containing the whole body of a human body;
extracting the two-dimensional target features of each pixel point from the color image by adopting a pre-trained mixed Gaussian network, and calculating the mixed Gaussian function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target features of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding pixel point is located inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
In another aspect, the present application provides a reconstruction device, including a processor, a memory, a display screen, and a communication interface, where the communication interface, the display screen, the memory, and the processor are connected by a bus;
The memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, a single color image containing the whole body of the human body captured by a camera, and storing the image in the data storage unit;
extracting the two-dimensional target features of each pixel point from the color image by adopting a pre-trained mixed Gaussian network, and calculating the mixed Gaussian function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target features of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding pixel point is located inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain a three-dimensional human body model, which is displayed on the display screen.
In another aspect, an embodiment of the present application provides a computer readable storage medium, where computer executable instructions are stored, where the computer executable instructions are configured to cause a computer device to perform the single image three-dimensional reconstruction method based on a hybrid gaussian network provided by the embodiment of the present application.
According to the single-image three-dimensional reconstruction method and device based on the mixed Gaussian network, the pre-trained mixed Gaussian network extracts the two-dimensional target feature of each pixel point from a single color image containing the whole human body, generates the mixed Gaussian function value corresponding to each pixel point from those features, and obtains the continuous occupancy value function of each human-body pixel point on the corresponding three-dimensional projection ray from the mixed Gaussian parameters of each pixel point. The continuous occupancy value function is then trilinearly sampled in three-dimensional space to generate a uniform discrete indicator function, and the surface of the human body model is extracted from that function to obtain the three-dimensional human body model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall architecture diagram of a single image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
fig. 2 is a block diagram of the HRNet network provided in an embodiment of the present application;
fig. 3 is a flowchart of a training method of a mixed gaussian network according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for each round of iterative training of a Gaussian mixture network according to an embodiment of the application;
FIG. 5 is a flowchart of a method for determining a target loss value of a Gaussian mixture network according to an embodiment of the application;
Fig. 6 is a flowchart of a single image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for extracting features through a hybrid Gaussian network according to an embodiment of the application;
FIG. 8 is a flowchart of a method for calculating a Gaussian mixture function value of each pixel point on a three-dimensional projection ray according to an embodiment of the present application;
FIG. 9 is a schematic diagram showing pixel points located inside and outside a human body model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of densely sampling occupancy values according to an embodiment of the present application;
FIG. 11 is a schematic diagram showing a change of continuous occupancy values on each three-dimensional projection ray according to an embodiment of the present application;
FIG. 12 is a graph showing a comparison of continuous variation of occupancy values using a mixed Gaussian function and a densely sampled version provided by an embodiment of the application;
FIG. 13 is a flow chart of a method for generating a uniform discrete indicator function according to an embodiment of the present application;
FIG. 14 is a schematic diagram of an optimization process for a Gaussian mixture function graph according to an embodiment of the application;
FIG. 15 is a hardware configuration diagram of a reconstruction device according to an embodiment of the present application;
Fig. 16 is a functional block diagram of a reconstruction device according to an embodiment of the present application.
Detailed Description
With the continuous development of VR/AR holographic communication technology, how to directly reconstruct a corresponding three-dimensional model of a human body by using images becomes an emerging research direction.
Multi-view human body three-dimensional reconstruction requires calibrating the multi-view cameras; the camera layout is strictly constrained and the calibration process is cumbersome, which seriously affects reconstruction efficiency. Holographic communication scenarios have demanding real-time requirements, so single-view three-dimensional human body reconstruction is mostly adopted.
Currently, popular single-view three-dimensional human body reconstruction methods are mainly based on parameterized human body models. A commonly used parameterized human body model is SMPL (A Skinned Multi-Person Linear Model), which contains 72 parameters describing body pose and 10 parameters describing body shape. When reconstructing with a parameterized model, the two-dimensional joint positions are estimated from a single image, and the SMPL parameters are optimized by minimizing the projection distance between the three-dimensional joints and the two-dimensional joints, thereby obtaining the human body model. However, the parameterized model reconstructed in this way has limited expressive power for the geometric details of the human surface, and the detailed textures of clothing on the human surface cannot be reconstructed well.
In view of this, the embodiment of the application exploits the strong learning capability of neural networks and provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network. The mixed Gaussian network is built from a convolutional neural network and a fully connected neural network, and is trained in advance on training data. During reconstruction, a single image containing the whole human body is input into the trained network to obtain the mixed Gaussian function of each pixel point in the image; a continuous indicator function on each projection ray is generated from the pixel-wise mixed Gaussian parameters; the continuous indicator function is trilinearly sampled in three-dimensional space to generate a uniform discrete indicator function; and the surface of the three-dimensional human body model is extracted from the uniform discrete indicator function to obtain the model. Because the mixed Gaussian function directly describes how the occupancy value of a pixel point changes along its projection ray, a small number of parameters suffice to describe the continuous variation of the occupancy value in the ray direction, which improves reconstruction efficiency. In addition, the mixed Gaussian network can extract more human-body detail features from a single image, so the accuracy of the human body three-dimensional model is ensured while the efficiency of single-image reconstruction is improved.
Referring to fig. 1, the overall architecture diagram of the single-image three-dimensional reconstruction method based on the mixed Gaussian network provided by the embodiment of the application: a single color image containing the whole body of a human body, captured by a camera, is input into the mixed Gaussian network; the network outputs the mixed Gaussian function value of each pixel point, and a three-dimensional human body model is generated from the human body parameter map described by those values.
In an embodiment of the application, the mixed Gaussian network is built from a convolutional neural network (Convolutional Neural Networks, CNN) and a fully connected neural network (Fully Connected Neural Network, FCNN). To meet the coding requirements of high-quality two-dimensional images in human body three-dimensional reconstruction, the convolutional part of the mixed Gaussian network adopts an HRNet with 3 scales, where the highest-resolution scale comprises 8 convolutional residual modules; the fully connected part adopts a 5-layer MLP (Multilayer Perceptron) whose intermediate feature dimensions are {256, 512, 256, 256}.
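As a concrete illustration of the fully connected part described above, the following NumPy sketch builds a 5-layer MLP with intermediate feature dimensions {256, 512, 256, 256}. The input dimension (the fused feature size) and the output dimension (here 12, e.g. 4 Gaussian components with a weight, mean and standard deviation each) are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def make_mlp(in_dim, hidden=(256, 512, 256, 256), out_dim=12, seed=0):
    """Initialise He-scaled weights for a 5-layer MLP whose intermediate
    feature dimensions are {256, 512, 256, 256}, as stated in the text."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, out_dim)
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(layers, x):
    """Forward pass: ReLU on every layer except the output layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x
```

A batch of per-pixel fused feature vectors goes in, one parameter vector per pixel comes out; the actual output parameterization used by the patent is not specified at this point in the text.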
Referring to fig. 2, the block diagram of the HRNet network provided by the embodiment of the present application: to obtain a high-resolution Feature Map, HRNet first reduces resolution (downsampling) and then raises it again (upsampling). It connects feature maps of different resolutions in parallel and, on that basis, adds interaction (fusion) between the feature maps of different resolutions, ensuring sufficient fusion of local and global features during convolution.
In the embodiment of the present application, before three-dimensional reconstruction, a mixed gaussian network needs to be trained, and the training process of the mixed gaussian network is shown in fig. 3, and mainly includes the following steps:
S301: a training dataset is acquired.
In an alternative embodiment, in S301 a training dataset is generated by rendering high-quality three-dimensional body scan data with a physically based rendering method (Physics-Based Rendering), so that the training dataset contains human body images rendered from three-dimensional body scans in different poses and clothing. Specifically, for each human subject the rendering viewpoint is rotated around the main axis of the three-dimensional human model, and during one full rotation (i.e. 360 degrees) one high-quality human body image is rendered every 2 degrees, 180 high-quality images in total. The rendering interval can be set according to actual requirements.
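The viewpoint schedule described above (one rendering every 2 degrees over one full rotation, 180 images in total) can be sketched as:

```python
def render_viewpoints(step_deg=2.0):
    """Azimuth angles (in degrees) for one full 360-degree rotation of the
    rendering viewpoint around the model's main axis, one rendering every
    `step_deg` degrees (configurable, per the text)."""
    n = int(round(360.0 / step_deg))
    return [i * step_deg for i in range(n)]
```

With the default 2-degree interval this yields the 180 viewpoints mentioned in the text; changing `step_deg` changes the dataset density accordingly.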
In another alternative embodiment, in S301, a human body image is collected from an existing human body data set and network resources to form a training data set.
S302: and (3) aiming at each human body image, calculating the real indication function of each sampling point on each three-dimensional projection ray inside and outside the model by adopting a sampling method based on the pixel projection rays, and fitting the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray according to the real indication function.
In the embodiment of the application, during imaging, rays in three-dimensional space irradiate the human body uniformly and form projection rays. Therefore, in S302 several sampling points are set on each three-dimensional projection ray, and for each human body image the true indicator function of every sampling point on every ray (inside or outside the model) is calculated with the pixel-projection-ray sampling method. The sampled value indicates whether the sampling point lies inside or outside the model (for example, the value is 1 when the point is inside the model and 0 when it is outside). From the true indicator function of the sampling points, the true value of the mixed Gaussian function of the corresponding three-dimensional projection ray is fitted; the true mixed Gaussian function value of each ray and the true indicator function of each sampling point then serve as supervision values for training the mixed Gaussian network.
In implementation, in S302 the starting positions and projection directions of the three-dimensional projection rays in three-dimensional space are first generated from the extrinsic parameters of the camera that captured the human body image and the image coordinates of each pixel point. The intersection points of the current three-dimensional projection ray with the three-dimensional human body model delimit the sampling interval, within which the ray is sampled uniformly (for example, a sampling interval 2 m long with 1000 sampling points), yielding at least one sampling point. A geometric method then computes the true indicator function value of every sampling point (the value is 1 if the point is inside the three-dimensional human body model and 0 if it is outside), giving the true indicator function of the current three-dimensional projection ray. Finally, a probability-distribution fitting method fits the true mixed Gaussian function value corresponding to the true indicator function of each three-dimensional projection ray.
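The per-ray sampling and fitting steps can be sketched as follows. Both the inside/outside test (`inside_fn`) and the single-Gaussian moment-matching fit are simplified stand-ins of my own: a real implementation would test points against the scanned mesh and fit a full Gaussian mixture, so this is an illustrative sketch only.

```python
import numpy as np

def sample_ray_indicator(origin, direction, t_near, t_far, n_samples, inside_fn):
    """Uniformly sample a projection ray on [t_near, t_far] and evaluate the
    true indicator function: 1 inside the body model, 0 outside.
    `inside_fn` stands in for the geometric inside/outside test on the mesh."""
    t = np.linspace(t_near, t_far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    occ = np.array([1.0 if inside_fn(p) else 0.0 for p in pts])
    return t, occ

def fit_gaussian_by_moments(t, occ):
    """Fit one Gaussian to the occupied segment by moment matching --
    a minimal stand-in for the probability-distribution fitting step."""
    w = occ / occ.sum()
    mu = float((w * t).sum())
    sigma = float(np.sqrt((w * (t - mu) ** 2).sum()))
    return mu, sigma
```

For a ray crossing a unit sphere (a toy stand-in for the body), the fitted mean lands at the centre of the occupied segment, which is the qualitative behaviour the supervision values encode.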
S303: inputting a plurality of human body images, corresponding true indication functions and true values of the Gaussian mixture functions into the Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration.
Wherein, each iteration process is shown in fig. 4, and mainly comprises the following steps:
s3031: and determining predictive functions and predictive values of the Gaussian mixture functions corresponding to the three-dimensional projection rays of each human body image by adopting a Gaussian mixture network to be trained.
In S3031, after the human body images are input into the mixed Gaussian network to be trained, for each image the predicted indicator function value of every sampling point (inside or outside the model) on each of the image's three-dimensional projection rays is determined; the predicted indicator function of each ray follows from the values of its sampling points, and the predicted mixed Gaussian function value of the ray is then determined from its predicted indicator function.
S3032: and determining a target loss value of the Gaussian mixture network according to each predictive indirection function and the Gaussian mixture function predictive value and the corresponding true indirection function and the Gaussian mixture function true value.
In S3032, the target loss of the mixed Gaussian network comprises a negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the mixed Gaussian function. The determination of the target loss value is shown in fig. 5 and mainly includes the following steps:
S3032_1: and determining the negative log likelihood loss according to the probability distribution of the mixed Gaussian function predicted value corresponding to each three-dimensional projection ray.
In S3032_1, each human body image corresponds to a plurality of three-dimensional projection rays; the probability distribution of the predicted mixed Gaussian function value is determined for each ray, and the negative log-likelihood loss is determined from the probability distributions of all the rays. The negative log-likelihood loss is formulated as follows:
L_log = -log f_GMM(x)    (Equation 1)
S3032_2: and determining the mean square error of the oscillometric function according to the predicted oscillometric function and the real oscillometric function of each three-dimensional projection ray.
In S3032_2, the mean square error of the indicator function is calculated as follows:
L_Occupancy = ||Occupancy_infer(x) - Occupancy_GT(x)||^2    (Equation 2)
Where L_Occupancy denotes the mean square error of the indicator function, Occupancy_infer(x) denotes the predicted indicator value of sampling point x on the three-dimensional projection ray (i.e. the value of x on the corresponding predicted indicator function), and Occupancy_GT(x) denotes the true indicator value of sampling point x (i.e. the value of x on the corresponding true indicator function).
S3032_3: and determining the mean square error of the Gaussian mixture function according to the predicted value of the Gaussian mixture function and the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray.
In S3032_3, the mean square error of the mixed Gaussian function is calculated as follows:
L_GMM = ||f_GMM^pred(x) - f_GMM^GT(x)||^2    (Equation 3)
Where L_GMM denotes the mean square error of the mixed Gaussian function, f_GMM^pred(x) denotes the mixed Gaussian function value of the three-dimensional projection ray inferred by the mixed Gaussian network to be trained, and f_GMM^GT(x) denotes the true mixed Gaussian function value of the three-dimensional projection ray.
S3032_4: and determining a target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the oscillography function and the mean square error of the Gaussian mixture function.
In S3032_4, the target loss value is calculated as follows:
Loss = L_log + L_Occupancy + L_GMM    (Equation 4)
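Assuming the mixture density along a ray is a standard weighted sum of one-dimensional Gaussians, the three loss terms and their sum can be sketched as below. Averaging over sample points and the `1e-12` stabilizer inside the logarithm are implementation choices of this sketch, not part of the application.

```python
import numpy as np

def gmm_pdf(x, weights, means, sigmas):
    """Mixture-of-Gaussians density f_GMM(x) along a projection ray
    (weights, means, sigmas are 1-D arrays, one entry per component)."""
    x = np.asarray(x, dtype=float)[..., None]
    comp = (weights * np.exp(-0.5 * ((x - means) / sigmas) ** 2)
            / (sigmas * np.sqrt(2.0 * np.pi)))
    return comp.sum(axis=-1)

def target_loss(x, occ_pred, occ_gt, gmm_pred, gmm_gt):
    """Target loss = negative log-likelihood + indicator MSE + GMM MSE."""
    f_pred = gmm_pdf(x, *gmm_pred)
    l_log = float(-np.log(f_pred + 1e-12).mean())                 # negative log-likelihood
    l_occ = float(np.mean((occ_pred - occ_gt) ** 2))              # indicator-function MSE
    l_gmm = float(np.mean((f_pred - gmm_pdf(x, *gmm_gt)) ** 2))   # mixed-Gaussian MSE
    return l_log + l_occ + l_gmm
```

When the predicted and true quantities coincide, the two MSE terms vanish and only the negative log-likelihood term remains, which matches the role of each term in the total loss.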
S3033: and adjusting parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
In S3033, the target loss value of each iteration is compared with the preset range; if the target loss value falls within the preset range or the iteration cap is reached, adjustment of the mixed Gaussian network parameters stops, and the network parameters with the minimum target loss value become the final parameters of the trained mixed Gaussian network.
Optionally, in S3033, an Adam optimizer may be used to perform optimization adjustment on parameters of the gaussian mixture network.
The mixed Gaussian network in the embodiment of the application can be deployed on servers used for holographic communication, including but not limited to micro servers, cloud servers and server clusters, and also on interactive clients such as notebook computers, desktop computers, smartphones, tablets, VR glasses and AR glasses. The server and the client are collectively referred to as the reconstruction device.
Based on the trained mixed gaussian network, fig. 6 illustrates a flowchart of a single image three-dimensional reconstruction method based on the mixed gaussian network, where the flowchart is executed by a reconstruction device and mainly includes the following steps:
s601: a single color image is acquired that contains the whole body of the human body.
In an alternative embodiment, since the detail features on the front of the human body are rich, in S601 the camera is placed in front of the human body to capture a single color image containing the whole body.
S602: and extracting the two-dimensional target characteristics of each pixel point from a single color image by adopting a pre-trained Gaussian mixture network, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point.
From the foregoing embodiments, it can be seen that the Gaussian mixture network is constructed from a convolutional neural network and a fully connected neural network, and features of different layers can be extracted from a single color image through the two neural networks. The specific feature extraction process, shown in fig. 7, mainly comprises the following steps:
S6021: and extracting feature graphs of different scales corresponding to the color images based on the convolutional neural network part in the mixed Gaussian network.
In an alternative embodiment, in S6021, the convolutional neural network part is configured with 3 scales, and feature maps at the 3 scales are obtained by performing convolution processing on the single color image.
It should be noted that, in the embodiment of the present application, the number of different scales is not limited, for example, 5 scales may be set.
S6022: and fusing the feature images with different scales to obtain fusion feature vectors corresponding to each pixel point.
In S6022, feature maps of different scales contain different amounts of human body information: small-scale (low-resolution) feature maps carry rich deep semantic information about the human body, while large-scale (high-resolution) feature maps carry stronger human body geometric information. Fusing the feature maps of different scales therefore yields, for each pixel point in the single color image, a fusion feature vector containing rich human body detail information (such as human body geometric information and deep semantic information).
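The fusion step can be sketched with plain NumPy: build a toy pyramid by average pooling, bring every scale back to full resolution, and concatenate per pixel. The pooling factors, channel count, and nearest-neighbour upsampling are illustrative assumptions, not details from the patent.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an (H, W, C) feature map by factor k (H, W divisible by k)."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsample an (H, W, C) feature map by factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def fuse_multiscale(feat, factors=(1, 2, 4)):
    """Concatenate feature maps from several scales into one per-pixel vector."""
    maps = [upsample(avg_pool(feat, k), k) if k > 1 else feat for k in factors]
    return np.concatenate(maps, axis=-1)   # (H, W, C * len(factors))

feat = np.random.rand(8, 8, 16)   # stand-in for a CNN feature map
fused = fuse_multiscale(feat)     # one fusion feature vector per pixel
```

Each pixel of `fused` now mixes high-resolution geometric detail with low-resolution semantic context, which is the intent of S6022.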
S6023: and extracting the mapping position of the image coordinates of each pixel point in the frequency domain based on the fully connected neural network part in the mixed Gaussian network.
In S6023, the fully-connected neural network part includes an input layer, a hidden layer, and an output layer. After the input layer obtains a single color image, frequency domain mapping is carried out on each pixel point, and the mapping position of the image coordinate of each pixel point in the frequency domain is extracted, wherein the frequency domain mapping formula is as follows:
Z_pos = γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^(L-1) πp), cos(2^(L-1) πp)) equation 5
Where p is the two-dimensional image coordinate of each pixel point, p = (u, v); u and v represent the coordinates along the length (lateral resolution) and width (longitudinal resolution) directions of the color image, respectively; and L is the coding dimension of the frequency-domain mapping position. Optionally, L = 32.
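Equation 5 is the familiar sinusoidal frequency encoding. A minimal NumPy version is sketched below; normalizing u and v into [-1, 1] before encoding is an assumption, not something the patent states.

```python
import numpy as np

def freq_encode(p, L=32):
    """Map 2-D coordinates p (shape (N, 2), values assumed in [-1, 1]) to
    (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p))."""
    freqs = (2.0 ** np.arange(L)) * np.pi           # 2^k * pi, k = 0..L-1
    angles = p[:, None, :] * freqs[None, :, None]   # (N, L, 2)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return enc.reshape(p.shape[0], -1)              # (N, 2 coords * 2 fns * L)

uv = np.array([[0.25, -0.5]])   # one pixel's normalized (u, v)
z_pos = freq_encode(uv)         # with L = 32: 2 * 2 * 32 = 128 dimensions
```

This encoding lets the fully connected part resolve high-frequency spatial detail that raw (u, v) coordinates would blur.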
S6024: and splicing the mapping position of each pixel point and the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
In S6024, the input layer of the fully connected neural network part receives, in addition to the mapping position of the image coordinates of each pixel point in the frequency domain, the fusion feature vector Z_Img corresponding to each pixel point. The mapping position of each pixel point and the fusion feature vector of the corresponding pixel point are spliced through the hidden layer of the fully connected network part to obtain the spliced two-dimensional target feature Z_MLP = cat(Z_Img, Z_pos).
S6025: and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point.
In S6025, after the two-dimensional target feature is obtained, the Gaussian mixture function value of each pixel point is regressed through the output layer of the fully connected network part, and the regression formula is expressed as follows:
P_GMM = f_MLP(Z_Img, Z_pos) equation 6
The specific determination process of the mixture gaussian function value is shown in fig. 8, and mainly comprises the following steps:
S6025_1: based on a fully connected neural network part in the mixed Gaussian network, gaussian processing is carried out on the two-dimensional target characteristics of each pixel point, and the mean value and the variance of two Gaussian functions are respectively obtained.
The output layer of the fully connected network part comprises two gaussian functions, and in S6025_1, the two-dimensional target feature of each pixel point is subjected to gaussian processing through the output layer of the fully connected neural network part, so that the mean value and the variance of the two gaussian functions are respectively obtained.
S6025_2: and determining the target parameters of the mixed Gaussian function according to the mean and the variance of the two Gaussian functions.
In s6025_2, the target parameters of the mixture gaussian function are determined according to the mean and variance of the two gaussian functions, and the formula is as follows:
P_GMM = {μ1, σ1, μ2, σ2, ω, t} equation 7
Wherein μ1 and σ1 are the mean and variance of the first Gaussian function, μ2 and σ2 are the mean and variance of the second Gaussian function, ω is the mixing weight of the Gaussian mixture function, and t is the truncation parameter of the indicator function.
S6025_3: and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
In S6025_3, once the target parameters of the Gaussian mixture function are known, the Gaussian mixture function value of each pixel point on its corresponding three-dimensional projection ray can be calculated.
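The patent does not reproduce the evaluation formula here, but a two-component mixture with the parameters of Equation 7 would ordinarily be evaluated as ω·N(z; μ1, σ1²) + (1 − ω)·N(z; μ2, σ2²) along the ray depth z. The sketch below implements exactly that; treating the weights as ω and 1 − ω, and leaving the truncation parameter t out, are assumptions.

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    """Univariate normal density N(z; mu, sigma^2)."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gmm_value(z, mu1, sigma1, mu2, sigma2, omega):
    """Two-component Gaussian mixture value along the ray depth z."""
    return omega * gaussian_pdf(z, mu1, sigma1) + (1 - omega) * gaussian_pdf(z, mu2, sigma2)

# Evaluate the mixture densely along one projection ray.
z = np.linspace(0.0, 4.0, 2001)
vals = gmm_value(z, mu1=1.0, sigma1=0.1, mu2=3.0, sigma2=0.1, omega=0.5)
```

The two modes at μ1 and μ2 mark where the ray crosses the body surface, which is why a handful of parameters suffices to describe the whole ray.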
S603: and obtaining a continuous occupation value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model.
In the embodiment of the present application, the Gaussian mixture function value of each pixel point is used as a continuous analytic expression of the indicator function of that pixel point inside and outside the human body model. During the analytic expression, an occupation value (Occupancy) can be used to represent whether the corresponding pixel point is located inside or outside the human body model, as shown in fig. 9: when the pixel point is located inside the human body model, the occupation value is 1, and when the pixel point is located outside the human body model, the occupation value is 0.
The conventional human body three-dimensional reconstruction method based on a single image needs to perform dense sampling on the occupied value in the three-dimensional space, as shown in fig. 10, circles represent sampling points, and thus the efficiency of human body three-dimensional reconstruction is seriously reduced.
From the perspective of projection of a single image, along the direction of the three-dimensional projection ray corresponding to each pixel point, the change of the occupation value is piecewise discrete, as shown in fig. 11. If a dense sampling method is adopted to estimate the occupation value function (i.e. the indicator function) on each three-dimensional projection ray, performance is wasted; the traditional occupation value estimation method based on aligning pixel points with two-dimensional target features therefore still has limitations in representing the three-dimensional human model.
In order to solve the above problem, in S603 the Gaussian mixture function value corresponding to each pixel point is converted into the indicator function value (i.e. occupation value) representing whether the pixel point lies inside or outside the human body model, so as to obtain a continuous occupation value function on each three-dimensional projection ray.
Referring to fig. 12, a comparison chart of continuous change of occupancy values expressed by using a mixed gaussian function and a traditional dense sampling manner is provided for an embodiment of the present application, as shown in fig. 12, the change of occupancy values of pixel points on corresponding three-dimensional projection rays is directly described by using the mixed gaussian function, and a small number of parameters (such as a mean value and a variance) can be directly used for describing the continuous change of occupancy values of three-dimensional projection ray directions (z directions), thereby improving reconstruction and reasoning efficiency.
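The patent does not reproduce the conversion formula, but one common way to turn two Gaussians centred near the entry point (μ1) and exit point (μ2) of the body surface into a continuous 0 → 1 → 0 occupancy profile is the difference of their cumulative distribution functions, clipped to [0, 1]. The sketch below is that construction; it is an assumption about the conversion, not the patent's exact formula.

```python
import math

def gauss_cdf(z, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def occupancy(z, mu1, sigma1, mu2, sigma2):
    """Smooth occupancy along a ray assumed to enter the body near mu1
    and leave it near mu2 (CDF-difference construction, clipped to [0, 1])."""
    o = gauss_cdf(z, mu1, sigma1) - gauss_cdf(z, mu2, sigma2)
    return min(max(o, 0.0), 1.0)

inside = occupancy(2.0, mu1=1.0, sigma1=0.05, mu2=3.0, sigma2=0.05)   # between the surfaces
outside = occupancy(4.5, mu1=1.0, sigma1=0.05, mu2=3.0, sigma2=0.05)  # past the exit point
```

The variances control how sharply the occupation value transitions at the surface, matching the compact description of fig. 12.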
S604: and performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function.
In S604, the uniform discrete indicator function is used to characterize the three-dimensional geometric surface of the human body. The generation process of the uniform discrete indicator function is shown in fig. 13 and mainly includes the following steps:
S6041: image coordinates of the discrete and uniform sampling points projected on the color image are obtained.
Since the color image is generated by the perspective projection relationship, when performing the tri-linear interpolation, it is necessary to project the discrete and uniform sampling points onto the color image, and obtain the image (pixel) coordinates of the sampling points corresponding to the color image.
S6042: for each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray.
In S6042, for each sampling point, at least one projection ray adjacent to the sampling point (i.e., the distance from the sampling point to each three-dimensional projection ray is smaller than a set threshold) is determined according to the image coordinates corresponding to the sampling point, and a perpendicular is drawn to the determined at least one projection ray through the sampling point, so as to obtain an intersection point of the sampling point and the at least one three-dimensional projection ray.
S6043: and obtaining the occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of at least one intersection point to obtain the discrete occupation value of one sampling point.
In S6043, the continuous occupation value function on each three-dimensional projection ray is known, and for each sampling point, from the continuous occupation value function on at least one three-dimensional projection ray adjacent to the sampling point, the occupation value of the intersection point of the sampling point and at least one three-dimensional projection ray can be obtained, and the discrete occupation value of the sampling point is obtained by performing tri-linear interpolation on the occupation value of at least one intersection point.
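The tri-linear interpolation in S6043 can be sketched as standard interpolation over eight corner values of a cell. In this minimal NumPy version the corner layout and unit-cell coordinates are illustrative assumptions; in the method itself, the eight values would be occupation values taken at the intersections with neighbouring projection rays.

```python
import numpy as np

def trilinear(corner_vals, x, y, z):
    """Interpolate inside a unit cell.
    corner_vals[i][j][k] is the occupation value at corner (i, j, k), i/j/k in {0, 1};
    (x, y, z) are fractional coordinates in [0, 1]^3."""
    c = np.asarray(corner_vals, dtype=float)
    c = c[0] * (1 - x) + c[1] * x     # collapse the x axis -> (2, 2)
    c = c[0] * (1 - y) + c[1] * y     # collapse the y axis -> (2,)
    return c[0] * (1 - z) + c[1] * z  # collapse the z axis -> scalar

# Cell straddling the body surface: one half occupied, the other empty.
corners = np.zeros((2, 2, 2))
corners[1, :, :] = 1.0
center = trilinear(corners, 0.5, 0.5, 0.5)   # sampling point at the cell centre
```

A sampling point centred between occupied and empty corners gets an occupation value of 0.5, i.e. it lies on the interpolated surface.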
S6044: and generating a uniform discrete indicator function representing the geometrical surface of the human body according to the discrete occupation value of each sampling point.
S605: and extracting the geometrical surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
In an alternative embodiment, in S605 the Marching Cubes algorithm is used to extract the human body geometric surface from the uniform discrete indicator function to obtain the three-dimensional human body model.
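The Marching Cubes step can be sketched with scikit-image's implementation, which walks the discrete occupancy grid and emits a triangle mesh at a chosen iso-level. The sphere volume and grid size below are stand-ins for the human occupancy grid, and the availability of `skimage` is an assumption.

```python
import numpy as np
from skimage import measure

# Synthetic occupancy grid: 1 inside a sphere, 0 outside.
n = 32
zz, yy, xx = np.mgrid[:n, :n, :n]
occ = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2 + (zz - n / 2) ** 2)
       < (n / 4) ** 2).astype(float)

# Extract the iso-surface at occupation value 0.5.
verts, faces, normals, values = measure.marching_cubes(occ, level=0.5)
```

`verts` and `faces` together define the triangle mesh of the reconstructed surface, which in the method is the three-dimensional human body model.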
In some embodiments, in order to improve the accuracy of the three-dimensional human body model, as shown in fig. 14, after the human body parameter map described by the Gaussian mixture function value of each pixel point is obtained, a 2D image optimization method based on a Generative Adversarial Network (GAN) is used to refine the preliminarily reconstructed human body parameter map represented by the Gaussian mixture parameters, so as to obtain a finer reconstruction result of the three-dimensional human body geometric surface.
According to the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided by the embodiment of the application, the two-dimensional target feature of each pixel point is extracted from a single color image containing the whole body of a human body through the pre-trained Gaussian mixture network, the Gaussian mixture function value corresponding to each pixel point is generated according to the two-dimensional target feature, and the continuous occupation value function of each human body pixel point on the corresponding three-dimensional projection ray is obtained according to the Gaussian mixture parameters corresponding to each pixel point. Further, tri-linear sampling is performed on the continuous occupation value function in three-dimensional space to generate a uniform discrete indicator function, and the human body geometric surface is extracted from the uniform discrete indicator function to obtain the three-dimensional human body model. Because the Gaussian mixture network can extract rich human body detail features from a single color image, the method improves three-dimensional reconstruction efficiency based on a single color image while ensuring the precision of the three-dimensional human body model; and the human body parameter map preliminarily reconstructed with the Gaussian mixture function is optimized by a GAN network, further improving the reconstruction accuracy of the three-dimensional human body model.
Based on the same technical concept, an embodiment of the application provides a reconstruction device, which can be a client with an interaction function such as a notebook computer, a desktop computer, a smart phone, a tablet, VR glasses, or AR glasses, or a server for realizing the interaction process, including but not limited to a micro server, a cloud server, or a server cluster. The device can implement the steps of the single-image three-dimensional reconstruction method based on the Gaussian mixture network in the above embodiments and achieve the same technical effect.
Referring to fig. 15, the reconstruction device comprises a processor 1501, a memory 1502, a display screen 1503 and a communication interface 1504, said display screen 1503, said memory 1502 and said processor 1501 being connected by a bus 1505;
The memory 1502 includes a data storage unit and a program storage unit, the program storage unit storing a computer program, and the processor 1501 performs the following operations according to the computer program:
acquiring, through the communication interface 1504, a single color image containing the whole body of the human body collected by a camera, and storing the single color image in the data storage unit;
Extracting the two-dimensional target characteristics of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point;
According to the Gaussian mixture function value corresponding to each pixel point, a continuous occupation value function on each three-dimensional projection ray is obtained, and each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
Performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of a human body;
And extracting the geometric surface of the human body from the uniform discrete indicator function, obtaining a three-dimensional human body model, and displaying the three-dimensional human body model through the display screen 1503.
Optionally, the gaussian mixture network is built by a convolutional neural network and a fully-connected neural network, and the processor 1501 adopts a trained gaussian mixture network to extract the two-dimensional target feature of each pixel point from the color image, which specifically comprises the following steps:
Based on a convolutional neural network part in the Gaussian mixture network, extracting feature images with different scales corresponding to the color images, and fusing the feature images with different scales to obtain fused feature vectors corresponding to each pixel point;
and extracting the mapping position of the image coordinates of each pixel point in the frequency domain based on the fully connected neural network part in the Gaussian mixture network, and splicing the mapping position of each pixel point with the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
Optionally, the processor 1501 calculates, according to the two-dimensional target feature of each pixel, a gaussian mixture function value of each pixel on a corresponding three-dimensional projection line, where the specific operations are as follows:
based on a fully connected neural network part in the Gaussian mixture network, carrying out Gaussian processing on the two-dimensional target characteristics of each pixel point to respectively obtain the mean value and the variance of two Gaussian functions;
Determining target parameters of the Gaussian mixture function according to the mean value and the variance of the two Gaussian functions;
And calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
Optionally, the processor 1501 performs tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function, which specifically includes:
Acquiring image coordinates of the discrete and uniform sampling points projected on the color image;
For each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray;
acquiring an occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of the at least one intersection point to obtain a discrete occupation value of the sampling point;
And generating a uniform discrete indicator function according to the discrete occupation value of each sampling point.
Optionally, the processor 1501 trains the mixed gaussian network by:
acquiring a training data set, wherein the training data set comprises human body images obtained by rendering three-dimensional human body scanning data under different postures and clothes;
For each human body image, a sampling method based on pixel projection rays is adopted to calculate the real indicator functions, inside and outside the model, of the sampling points on each three-dimensional projection ray, and the Gaussian mixture function true value corresponding to the corresponding three-dimensional projection ray is fitted according to the real indicator functions of the sampling points;
Inputting a plurality of human body images and the corresponding real indicator function and Gaussian mixture function true values into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration, wherein each iteration process executes the following operations:
determining predicted indicator functions and Gaussian mixture function predicted values corresponding to a plurality of three-dimensional projection rays of each human body image by adopting the Gaussian mixture network to be trained;
Determining a target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value and the corresponding real indicator function and Gaussian mixture function true value;
And adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
Optionally, the processor 1501 determines the target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value, and the real indicator function and Gaussian mixture function true value, which specifically includes:
determining negative log-likelihood loss according to the probability distribution of the Gaussian mixture function predicted value corresponding to each three-dimensional projection ray;
Determining the mean square error of the indicator function according to the predicted indicator function and the real indicator function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture function according to the Gaussian mixture function predicted value and the Gaussian mixture function true value corresponding to each three-dimensional projection ray;
And determining the target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the Gaussian mixture function.
It should be noted that fig. 15 is only an example, providing the hardware necessary for the reconstruction device to implement the steps of the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided by the embodiment of the present application. Components not shown include common parts of interaction devices such as a speaker, a microphone, a power supply, and an audio processor.
The processor referred to in fig. 15 of the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), a general purpose processor, a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, transistor logic device, hardware component, or any combination thereof.
Referring to fig. 16, a functional block diagram of a reconstruction device according to an embodiment of the present application mainly includes an acquisition module 1601, a feature extraction module 1602, a function determination module 1603, and a model rendering module 1604, where:
An acquisition module 1601 for acquiring a single color image including the whole body of the human body;
The feature extraction module 1602 is configured to extract a two-dimensional target feature of each pixel from the color image by using a pre-trained gaussian mixture network, and calculate a gaussian mixture function value of each pixel on a corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel;
The function determining module 1603 is configured to obtain a continuous occupation value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, where each occupation value is used to represent whether the corresponding pixel point is located inside or outside the human body model; and to perform tri-linear sampling on each continuous occupation value function in three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of the human body;
the model rendering module 1604 is configured to extract the geometric surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
The specific implementation of each of the above functional modules refers to the foregoing embodiments, and will not be repeated here.
The embodiment of the application also provides a computer readable storage medium for storing instructions which, when executed, can complete the single-image three-dimensional reconstruction method based on the Gaussian mixture network in the previous embodiments.
The embodiment of the application also provides a computer program product for storing a computer program for executing the single image three-dimensional reconstruction method based on the mixed Gaussian network in the previous embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A single image three-dimensional reconstruction method based on a mixed Gaussian network is characterized by comprising the following steps:
acquiring a single color image containing the whole body of a human body;
Extracting feature images with different scales corresponding to the color images by adopting a convolutional neural network part in a pre-trained Gaussian mixture network, and fusing the feature images with different scales to obtain fused feature vectors corresponding to each pixel point in the color images;
Extracting the mapping position of the image coordinates of each pixel point in the frequency domain by adopting a fully connected neural network part in a pre-trained Gaussian mixture network, and splicing the mapping position of each pixel point with the fusion feature vector to obtain the two-dimensional target feature of the corresponding pixel point;
Carrying out Gaussian processing on the two-dimensional target features of each pixel point by adopting the fully-connected neural network part, respectively obtaining the mean value and the variance of two Gaussian functions, determining the target parameters of a Gaussian mixture function according to the mean value and the variance of the two Gaussian functions, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function;
According to the Gaussian mixture function value corresponding to each pixel point, a continuous occupation value function on each three-dimensional projection ray is obtained, and each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
Performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of a human body;
And extracting the geometrical surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
2. The method of claim 1, wherein the tri-linear sampling of each successive occupancy value function in three-dimensional space to generate a uniform discrete indicator function comprises:
Acquiring image coordinates of the discrete and uniform sampling points projected on the color image;
For each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray;
acquiring an occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of the at least one intersection point to obtain a discrete occupation value of the sampling point;
And generating a uniform discrete indicator function according to the discrete occupation value of each sampling point.
3. The method of claim 1 or 2, wherein the gaussian mixture network is trained by:
acquiring a training data set, wherein the training data set comprises human body images obtained by rendering three-dimensional human body scanning data under different postures and clothes;
For each human body image, a sampling method based on pixel projection rays is adopted to calculate the real indicator functions, inside and outside the model, of the sampling points on each three-dimensional projection ray, and the Gaussian mixture function true value corresponding to the corresponding three-dimensional projection ray is fitted according to the real indicator functions of the sampling points;
Inputting a plurality of human body images and the corresponding real indicator function and Gaussian mixture function true values into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration, wherein each iteration process executes the following operations:
determining predicted indicator functions and Gaussian mixture function predicted values corresponding to a plurality of three-dimensional projection rays of each human body image by adopting the Gaussian mixture network to be trained;
Determining a target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value and the corresponding real indicator function and Gaussian mixture function true value;
And adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
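Claim 3 fits a ground-truth Gaussian mixture to each ray but does not give the fitting rule. One plausible parameterization (our assumption, not stated in the patent) places one component at each surface crossing of the ray, i.e. the entry and exit depths recovered from the binary indicator samples:

```python
import numpy as np

def fit_ray_mixture(depths, indicator, sigma=0.05):
    """Fit a two-component 'true' Gaussian mixture to the binary indicator
    sampled along one projection ray: one component per surface crossing
    (entry and exit); sigma is an assumed fixed spread."""
    inside = indicator > 0.5
    if not inside.any():
        return None  # ray misses the body entirely
    idx = np.flatnonzero(inside)
    mu_in, mu_out = depths[idx[0]], depths[idx[-1]]
    # equal-weight mixture centred on the two crossings
    return {"means": (mu_in, mu_out), "sigmas": (sigma, sigma),
            "weights": (0.5, 0.5)}

depths = np.linspace(0.0, 2.0, 201)  # sample depths along the ray
indicator = ((depths > 0.8) & (depths < 1.3)).astype(float)
params = fit_ray_mixture(depths, indicator)
print(params["means"])  # entry/exit depths near 0.8 and 1.3
```

A body with concave geometry can cross a ray more than twice; the two-component form above matches the "two Gaussian functions" of claim 5 but would need extension for such rays.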
4. The method of claim 3, wherein determining the target loss value of the Gaussian mixture network from each predicted indicator function and predicted Gaussian mixture value and the corresponding true indicator function and true Gaussian mixture value comprises:
determining a negative log-likelihood loss from the probability distribution of the predicted Gaussian mixture values corresponding to each three-dimensional projection ray;
determining the mean square error of the indicator function from the predicted indicator function and the true indicator function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture function from the predicted Gaussian mixture value and the true Gaussian mixture value corresponding to each three-dimensional projection ray;
and determining the target loss value of the Gaussian mixture network from the negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the Gaussian mixture function.
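The three terms of claim 4 combine naturally as a weighted sum; the weights and the exact reduction are our assumptions, since the claim only names the terms. A sketch over per-sample arrays:

```python
import numpy as np

def target_loss(pred_ind, true_ind, pred_gmm, true_gmm,
                w_nll=1.0, w_ind=1.0, w_gmm=1.0):
    """Combine the three terms of claim 4 into one scalar target loss.

    pred_ind/true_ind: predicted / true indicator values per sample
    pred_gmm/true_gmm: predicted / true Gaussian mixture density values
    The relative weights are illustrative; the claim does not fix them.
    """
    eps = 1e-8
    nll = -np.mean(np.log(pred_gmm + eps))         # negative log-likelihood
    mse_ind = np.mean((pred_ind - true_ind) ** 2)  # indicator-function MSE
    mse_gmm = np.mean((pred_gmm - true_gmm) ** 2)  # mixture-function MSE
    return w_nll * nll + w_ind * mse_ind + w_gmm * mse_gmm

# perfect prediction -> loss close to zero
print(target_loss(np.ones(4), np.ones(4), np.ones(4), np.ones(4)))
```

Training (claim 3) would backpropagate this scalar through the network and stop once it falls within the preset range.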
5. A reconstruction device, characterized by comprising a processor, a memory, a display screen and a communication interface, wherein the communication interface, the display screen, the memory and the processor are connected by a bus;
The memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, a single color image containing the whole human body captured by a camera, and storing the single color image in the data storage unit;
extracting feature maps of different scales corresponding to the color image using the convolutional neural network part of a pre-trained Gaussian mixture network, and fusing the feature maps of different scales to obtain a fused feature vector for each pixel in the color image;
mapping the image coordinates of each pixel into the frequency domain using the fully connected neural network part of the pre-trained Gaussian mixture network, and concatenating each pixel's frequency-domain mapping with its fused feature vector to obtain the two-dimensional target feature of that pixel;
applying Gaussian processing, with the fully connected neural network part, to the two-dimensional target feature of each pixel to obtain the mean and variance of two Gaussian functions, determining the target parameters of a Gaussian mixture function from the means and variances of the two Gaussian functions, and calculating the Gaussian mixture value of each pixel on the corresponding three-dimensional projection ray from the target parameters of the Gaussian mixture function;
obtaining a continuous occupancy value function on each three-dimensional projection ray from the Gaussian mixture value corresponding to each pixel, wherein each occupancy value indicates whether the corresponding point lies inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain a three-dimensional human body model, and displaying the three-dimensional human body model on the display screen.
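The claim does not give a closed form for turning the two-component mixture into the continuous occupancy function along a ray. One consistent reading (our assumption): occupancy rises at the entry crossing and falls at the exit, i.e. the difference of the two components' cumulative distribution functions:

```python
import math

def occupancy(t, mu_in, mu_out, sigma):
    """Continuous occupancy value at depth t along one projection ray,
    built from Gaussian components at the entry/exit crossings."""
    cdf = lambda x, mu: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(t, mu_in) - cdf(t, mu_out)

# occupancy is close to 1 between the crossings and close to 0 outside
print(round(occupancy(1.0, 0.8, 1.3, 0.02), 3))
print(round(occupancy(0.0, 0.8, 1.3, 0.02), 3))
```

With this form the occupancy transitions smoothly through 0.5 exactly at the component means, which is what makes the later iso-surface extraction well defined.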
6. The reconstruction device of claim 5, wherein the processor trilinearly samples each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function, specifically by:
acquiring image coordinates of the discrete, uniformly distributed sampling points projected onto the color image;
for each sampling point, determining at least one adjacent three-dimensional projection ray from the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point with the at least one three-dimensional projection ray;
acquiring the occupancy value of the at least one intersection point from the continuous occupancy value function, and performing trilinear interpolation on the occupancy value of the at least one intersection point to obtain the discrete occupancy value of the sampling point;
and generating a uniform discrete indicator function from the discrete occupancy value of each sampling point.
CN202210792028.8A 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network Active CN115082636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792028.8A CN115082636B (en) 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network


Publications (2)

Publication Number Publication Date
CN115082636A (en) 2022-09-20
CN115082636B (en) 2024-05-17

Family

ID=83257445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792028.8A Active CN115082636B (en) 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network

Country Status (1)

Country Link
CN (1) CN115082636B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893696B (en) * 2024-03-15 2024-05-28 之江实验室 Three-dimensional human body data generation method and device, storage medium and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108665537A (en) * 2018-05-15 2018-10-16 清华大学 The three-dimensional rebuilding method and system of combined optimization human body figure and display model
CN111340944A (en) * 2020-02-26 2020-06-26 清华大学 Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN112330795A (en) * 2020-10-10 2021-02-05 清华大学 Human body three-dimensional reconstruction method and system based on single RGBD image
CN113506335A (en) * 2021-06-01 2021-10-15 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10825243B1 (en) * 2019-08-15 2020-11-03 Autodesk, Inc. Three-dimensional (3D) model creation and incremental model refinement from laser scans


Non-Patent Citations (2)

Title
DeepHuman: 3D Human Reconstruction from a Single Image; Zerong Zheng et al.; arXiv:1903.06473v2 [cs.CV]; 2019-03-28; full text *
Learning Occupancy Function from Point Clouds for Surface Reconstruction; Meng Jia et al.; arXiv:2010.11378v1 [cs.CV]; 2020-10-22; full text *


Similar Documents

Publication Publication Date Title
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN109754464B (en) Method and apparatus for generating information
CN112991358A (en) Method for generating style image, method, device, equipment and medium for training model
CN116051740A (en) Outdoor unbounded scene three-dimensional reconstruction method and system based on nerve radiation field
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN109191554A (en) A kind of super resolution image reconstruction method, device, terminal and storage medium
EP4036863A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
WO2023040609A1 (en) Three-dimensional model stylization method and apparatus, and electronic device and storage medium
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
CN113327318B (en) Image display method, image display device, electronic equipment and computer readable medium
JP2024004444A (en) Three-dimensional face reconstruction model training, three-dimensional face image generation method, and device
CN115082540B (en) Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN113129352A (en) Sparse light field reconstruction method and device
CN111243085B (en) Training method and device for image reconstruction network model and electronic equipment
CN112734910A (en) Real-time human face three-dimensional image reconstruction method and device based on RGB single image and electronic equipment
CN115082636B (en) Single image three-dimensional reconstruction method and device based on mixed Gaussian network
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN110827341A (en) Picture depth estimation method and device and storage medium
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN114758070A (en) Single-image three-dimensional human body fine reconstruction method based on cross-domain multitask
CN117095132B (en) Three-dimensional reconstruction method and system based on implicit function
CN112927348B (en) High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
CN115272608A (en) Human hand reconstruction method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant