CN115082636A - Single image three-dimensional reconstruction method and equipment based on hybrid Gaussian network - Google Patents


Info

Publication number
CN115082636A
Authority
CN
China
Prior art keywords: function, dimensional, value, gaussian mixture, gaussian
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210792028.8A
Other languages
Chinese (zh)
Other versions
CN115082636B (en)
Inventor
吴连朋
于涛
刘烨斌
许瀚誉
朱家林
王宝云
于芝涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Original Assignee
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University, Hisense Visual Technology Co Ltd, Juhaokan Technology Co Ltd filed Critical Tsinghua University
Priority to CN202210792028.8A priority Critical patent/CN115082636B/en
Publication of CN115082636A publication Critical patent/CN115082636A/en
Application granted granted Critical
Publication of CN115082636B publication Critical patent/CN115082636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T 17/10 Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/42 Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion of extracted features
    • G06V 10/82 Arrangements for image or video recognition or understanding using neural networks
    • G06V 40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V 40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06V 40/103 Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The application relates to the technical field of three-dimensional reconstruction and provides a single-image three-dimensional reconstruction method and device based on a hybrid Gaussian network. A trained Gaussian mixture network extracts a two-dimensional target feature for each pixel of a single color image containing the whole human body, and a continuous occupancy value function on each pixel's three-dimensional projection ray is determined from those features. Because the Gaussian mixture function directly describes how the occupancy value changes along the projection ray, the continuous variation of the occupancy value in the ray direction can be described with a small number of parameters, which improves three-dimensional reconstruction efficiency. The three-dimensional human body model is then generated from a uniform discrete indicative function obtained by trilinear sampling of the continuous occupancy value function. Since the Gaussian mixture network can extract rich human detail features from a single color image, the method improves reconstruction efficiency while ensuring the accuracy of the three-dimensional human body model.

Description

Single image three-dimensional reconstruction method and equipment based on hybrid Gaussian network
Technical Field
The application relates to the technical field of three-dimensional reconstruction, and provides a single-image three-dimensional reconstruction method and equipment based on a hybrid Gaussian network.
Background
In computer vision, three-dimensional reconstruction is the process of recovering three-dimensional information from single-view or multi-view images. It is widely applied in scenes such as virtual fitting, 3D games and autonomous navigation, and in particular in holographic communication scenes based on Virtual Reality (VR) and Augmented Reality (AR), where a reconstructed three-dimensional human body model brings an immersive interactive experience to the user.
Multi-view human body three-dimensional reconstruction requires calibrating the multi-view cameras, which imposes strict constraints on the camera layout; the calibration process is complex and seriously reduces reconstruction efficiency. Holographic communication scenes have strict real-time requirements, so single-view human body three-dimensional reconstruction is a current research hotspot for improving reconstruction efficiency.
Disclosure of Invention
The application provides a single-image three-dimensional reconstruction method and equipment based on a hybrid Gaussian network, which are used for improving the efficiency and the precision of three-dimensional reconstruction.
In one aspect, the present application provides a single image three-dimensional reconstruction method based on a hybrid gaussian network, including:
acquiring a single color image containing the whole body of a human body;
extracting the two-dimensional target feature of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding point is located inside or outside the human body model;
carrying out trilinear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of a human body;
and extracting the geometric surface of the human body from the uniform discrete indicative function to obtain the three-dimensional human body model.
In another aspect, the present application provides a reconstruction device, including a processor, a memory, a display screen, and a communication interface, where the communication interface, the display screen, the memory, and the processor are connected by a bus;
the memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring a single color image which is acquired by a camera and contains the whole body of the human body through the communication interface, and storing the single color image in the data storage unit;
extracting the two-dimensional target feature of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding point is located inside or outside the human body model;
carrying out trilinear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of a human body;
and extracting the geometric surface of the human body from the uniform discrete indicative function to obtain a three-dimensional human body model, and displaying through the display screen.
In another aspect, an embodiment of the present application provides a computer-readable storage medium, where computer-executable instructions are stored, and the computer-executable instructions are configured to cause a computer device to execute the hybrid gaussian network-based single-image three-dimensional reconstruction method provided in the embodiment of the present application.
According to the single-image three-dimensional reconstruction method and device based on the Gaussian mixture network, a pre-trained Gaussian mixture network extracts the two-dimensional target feature of each pixel point from a single color image containing the whole human body, generates the Gaussian mixture function corresponding to each pixel point from those features, and obtains the continuous occupancy value function on each pixel's three-dimensional projection ray from the corresponding Gaussian mixture parameters. Because the Gaussian mixture function directly describes the change of the occupancy value along the projection ray, the continuous variation of the occupancy value in the ray direction can be described with a small number of parameters, improving three-dimensional reconstruction efficiency. Further, the continuous occupancy value function is trilinearly sampled in three-dimensional space to generate a uniform discrete indicative function, and the surface of the human body model is extracted from it to obtain the three-dimensional human body model. Since the Gaussian mixture network can extract rich human detail features from a single color image, reconstruction efficiency is improved on the basis of a single color image while the accuracy of the three-dimensional human body model is ensured.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1 is an overall architecture diagram of a single image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
fig. 2 is a structural diagram of an HRNet network according to an embodiment of the present application;
fig. 3 is a flowchart of a method for training a gaussian mixture network according to an embodiment of the present disclosure;
fig. 4 is a flowchart of a method for each iteration training of a gaussian mixture network according to an embodiment of the present disclosure;
fig. 5 is a flowchart of a method for determining a target loss value of a gaussian mixture network according to an embodiment of the present disclosure;
fig. 6 is a flowchart of a single-image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for extracting features through a hybrid Gaussian network according to an embodiment of the present application;
fig. 8 is a flowchart of a method for calculating a gaussian function value of each pixel point on a three-dimensional projection ray according to an embodiment of the present application;
fig. 9 is a schematic diagram illustrating that pixel points are located inside and outside a human body model according to occupancy values provided in an embodiment of the present application;
FIG. 10 is a schematic diagram of dense sampling of occupancy values provided by an embodiment of the present application;
FIG. 11 is a schematic diagram illustrating a continuous occupancy value variation on each three-dimensional projection ray according to an embodiment of the present application;
FIG. 12 is a comparison graph of continuous variation of occupancy values expressed using a Gaussian mixture function and a dense sampling method according to an embodiment of the present application;
FIG. 13 is a flowchart of a method for generating a uniform discrete illustrative function according to an embodiment of the present application;
FIG. 14 is a schematic diagram of an optimization process for a Gaussian mixture function graph according to an embodiment of the present application;
fig. 15 is a hardware configuration diagram of a reconstruction device according to an embodiment of the present application;
fig. 16 is a functional block diagram of a reconstruction device according to an embodiment of the present application.
Detailed Description
With the continuous development of VR/AR holographic communication technology, how to directly reconstruct a corresponding human body three-dimensional model by using images becomes an emerging research direction.
As the multi-view-based human body three-dimensional reconstruction needs to calibrate the multi-view cameras, strict limitation requirements are imposed on the layout of the multi-view cameras, the calibration process is complicated, and the three-dimensional reconstruction efficiency is seriously influenced. In a holographic communication scene, the real-time requirement is high, so that three-dimensional human body reconstruction based on a single view is mostly adopted.
At present, popular single-view human body three-dimensional reconstruction methods are mainly based on parameterized human body models. A commonly used parameterized model is SMPL (Skinned Multi-Person Linear model), which contains 72 parameters describing the human body pose and 10 parameters describing the human body shape. During reconstruction, the two-dimensional joint positions are estimated from a single image, and the SMPL parameters are then obtained by minimizing the projection distance between the three-dimensional joints and the two-dimensional joints, yielding the human body model. However, the parameterized model reconstructed in this way has limited capability to represent the geometric details of the human body surface, and the detailed texture of clothing on the body surface cannot be reconstructed well.
In view of this, the embodiments of the present application exploit the strong learning capability of neural networks and provide a single-image three-dimensional reconstruction method and device based on a hybrid Gaussian network. The Gaussian mixture network is constructed from a convolutional neural network and a fully connected neural network and is trained in advance on training data. During reconstruction, a single image containing the whole human body is input into the trained network to obtain the Gaussian mixture function of each pixel point; a continuous indicative function on each projection ray is generated from the pixel-wise Gaussian mixture parameters; the continuous indicative function is trilinearly sampled in three-dimensional space to generate a uniform discrete indicative function; and the surface of the human body model is extracted from the uniform discrete indicative function to obtain the three-dimensional human body model. Because the Gaussian mixture function directly describes the change of the occupancy value of each pixel along its projection ray, the continuous variation of the occupancy value in the ray direction can be described with a small number of parameters, improving reconstruction efficiency. In addition, the Gaussian mixture network can extract rich human detail features from a single image, so reconstruction accuracy is ensured while efficiency is improved.
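The core idea above, describing the continuous occupancy along a projection ray with a handful of Gaussian mixture parameters instead of thousands of dense samples, can be sketched as follows. The exact mixture parameterisation is not published in this excerpt, so the weighted-sum-of-Gaussians form, the parameter names and the example depths are assumptions:

```python
import math

def gmm_occupancy(t, weights, means, sigmas):
    """Evaluate a Gaussian-mixture occupancy value at depth t along a ray.

    Hypothetical parameterisation: a plain weighted sum of Gaussians; the
    patent does not state the exact mixture form in this excerpt.
    """
    return sum(w * math.exp(-0.5 * ((t - m) / s) ** 2)
               for w, m, s in zip(weights, means, sigmas))

# One component roughly covering a body interior centred at depth 1.4 m:
# three numbers per component replace a dense list of per-depth samples.
weights, means, sigmas = [1.0], [1.4], [0.1]
inside = gmm_occupancy(1.4, weights, means, sigmas)   # at the component centre
outside = gmm_occupancy(3.0, weights, means, sigmas)  # far outside the body
```

With a single component, three parameters suffice to recover an occupancy value at any depth, which is the efficiency argument the description makes.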
Referring to fig. 1, for the overall architecture diagram of the single-image three-dimensional reconstruction method based on the hybrid gaussian network provided in the embodiment of the present application, a single color image including the whole body of a human body acquired by a camera is input into the hybrid gaussian network, a hybrid gaussian function value of each pixel point is output through the hybrid gaussian network, and a three-dimensional human body model is generated according to a human body parameter diagram described by the hybrid gaussian function value of each pixel point.
In the embodiment of the present application, the Gaussian mixture network is constructed from a Convolutional Neural Network (CNN) and a Fully Connected Neural Network (FCNN). To meet the encoding requirement of high-quality two-dimensional images in human body three-dimensional reconstruction, the convolutional part of the network adopts an HRNet with 3 network scales, where the highest-resolution scale contains 8 convolutional residual modules; the fully connected part adopts a 5-layer MLP (Multilayer Perceptron) whose intermediate feature dimensions are {256, 512, 256, 256}.
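A quick sanity check of the fully connected part described above: four intermediate widths between an input and an output layer give exactly five linear layers, matching the "5-layer MLP". The input and output dimensions (`in_dim`, `out_dim`) are hypothetical, since this excerpt does not state the pixel-feature width or the number of Gaussian mixture parameters produced:

```python
# Hypothetical input/output sizes; only the hidden widths come from the text.
in_dim, out_dim = 256, 9          # e.g. 3 Gaussian components x (weight, mean, sigma)
hidden = [256, 512, 256, 256]     # intermediate feature dimensions from the description

dims = [in_dim] + hidden + [out_dim]
layers = list(zip(dims[:-1], dims[1:]))       # (fan_in, fan_out) of each linear layer
n_params = sum(i * o + o for i, o in layers)  # weight matrices plus bias vectors
```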
Referring to fig. 2, for the structure diagram of the HRNet network provided in the embodiment of the present application, in order to obtain a Feature Map (Feature Map) with a high resolution scale, the HRNet network adopts a mode of first reducing resolution (i.e., downsampling) and then increasing resolution (i.e., upsampling), the HRNet network connects Feature maps (Feature Map) with different resolutions in parallel, and interaction (fusion) between Feature maps with different resolutions is added on the basis of parallel connection, so that sufficient fusion of local features and global features is ensured in a convolution process.
In the embodiment of the application, before performing three-dimensional reconstruction, a gaussian mixture network needs to be trained, and the training process of the gaussian mixture network is shown in fig. 3, which mainly includes the following steps:
s301: a training data set is acquired.
In an optional implementation of S301, a physically based rendering method (Physics-Based Rendering) is used to render high-quality three-dimensional human body scanning data and generate the training data set, so that the set contains human body images obtained by rendering the scans under different postures and clothing. Specifically, for each human subject the rendering viewpoint is rotated around the main axis of the three-dimensional human body model, and during one full rotation (360 degrees) one high-quality human body image is rendered every 2 degrees, 180 images in total. The rendering interval can be set according to actual requirements.
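The viewpoint schedule above can be written down directly; `step_deg` is simply the 2-degree interval mentioned in the text, and 360 / 2 yields the stated 180 views:

```python
# One rendered view every 2 degrees over a full 360-degree rotation.
step_deg = 2
view_angles = list(range(0, 360, step_deg))  # rendering azimuths in degrees
```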
Another optional implementation is that in S301, human body images are collected from the existing human body data set and network resources to form a training data set.
S302: and aiming at each human body image, calculating a real indicative function of each sampling point on each three-dimensional projection ray inside and outside the model by adopting a pixel projection ray-based sampling method, and fitting a mixed Gaussian function true value corresponding to each three-dimensional projection ray according to the real indicative function.
In the embodiment of the application, during imaging, rays in three-dimensional space uniformly irradiate the human body and form projection rays. Therefore, in S302, a plurality of sampling points are set on each three-dimensional projection ray, and for each human body image a sampling method based on pixel projection rays is used to calculate the true indicative function of each sampling point with respect to the model, where the value of the indicative function indicates whether the sampling point is inside or outside the model (for example, the value is 1 when the sampling point is inside the model and 0 when it is outside).
In step S302, the starting positions and projection directions of a plurality of three-dimensional projection rays in a three-dimensional space are generated according to the camera external parameters for acquiring the human body image and the image coordinates of each pixel point; taking the intersection point of the current three-dimensional projection ray and the three-dimensional human body model as the end point of a sampling interval, and uniformly sampling the current three-dimensional projection ray in the sampling interval (for example, assuming that the length of the sampling interval is 2m and the number of the set sampling points is 1000) to obtain at least one sampling point; then, calculating the real illustrative function values of all sampling points inside and outside the model by using a geometric method (for example, the sampling points are inside the three-dimensional human body model, the function values are 1, the sampling points are outside the three-dimensional human body model, and the function values are 0), so as to obtain the real illustrative function of the current three-dimensional projection ray; and finally, fitting out a mixed Gaussian function true value corresponding to the real indicative function of each three-dimensional projection ray by using a probability distribution fitting method.
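The per-ray sampling in S302 can be sketched as below. A sphere stands in for the scanned human mesh (the patent's actual inside/outside test is a geometric test against the three-dimensional human body model), and `sample_ray` and `indicator` are hypothetical helper names; the 2 m interval and 1000 samples mirror the example figures in the text:

```python
import math

def sample_ray(origin, direction, t_near, t_far, n_samples):
    """Uniformly sample 3-D points on a projection ray over [t_near, t_far]."""
    ts = [t_near + (t_far - t_near) * i / (n_samples - 1) for i in range(n_samples)]
    return [tuple(o + t * d for o, d in zip(origin, direction)) for t in ts]

def indicator(point, center=(0.0, 0.0, 2.0), radius=0.5):
    """Return 1 if the point is inside the proxy geometry, else 0.

    A sphere stands in for the three-dimensional human body model whose
    inside/outside test the patent performs geometrically on the mesh.
    """
    return 1 if math.dist(point, center) < radius else 0

# Ray from the camera origin through a pixel, sampled on a 2 m interval
# with 1000 points, matching the example interval and count in the text.
pts = sample_ray((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), t_near=1.0, t_far=3.0, n_samples=1000)
occ = [indicator(p) for p in pts]  # true indicative function values along the ray
```

The resulting 0/1 sequence is the "true indicative function" to which a Gaussian mixture is then fitted per ray.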
S303: inputting a plurality of human body images and corresponding real indicative functions and true values of the Gaussian mixture function into the Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one iteration.
Each iteration process is shown in fig. 4, and mainly includes the following steps:
s3031: and determining a predictive demonstrative function and a predictive value of the Gaussian mixture function corresponding to the three-dimensional projection rays of each human body image by adopting a Gaussian mixture network to be trained.
In S3031, after the human body images are input into the Gaussian mixture network under training, for each human body image the predicted indicative function values (inside/outside the model) of the sampling points on its three-dimensional projection rays are determined; the predicted indicative function of each ray is determined from the values of its sampling points; and the Gaussian mixture function predicted value of each ray is further determined from its predicted indicative function.
S3032: and determining a target loss value of the Gaussian mixture network according to the predicted values of each predictive representational function and the Gaussian mixture function and the corresponding true values of the real representational function and the Gaussian mixture function.
In S3032, the target loss value of the gaussian mixture network comprises the negative log-likelihood loss, the mean square error of the indicative function and the mean square error of the gaussian mixture function. The process of determining the target loss value is shown in fig. 5, and mainly includes the following steps:
s3032_ 1: and determining the negative log likelihood loss according to the probability distribution of the mixed Gaussian function predicted value corresponding to each three-dimensional projection ray.
In S3032_1, each human body image corresponds to a plurality of three-dimensional projection rays, for each three-dimensional projection ray, the probability distribution of the corresponding gaussian mixture function prediction value is determined, and the negative log likelihood loss is determined according to the probability distribution corresponding to the plurality of three-dimensional projection rays. Wherein the negative log-likelihood loss is formulated as follows:
L_log = -log f_GMM(x)   (Equation 1)

where f_GMM(x) denotes the value of the Gaussian mixture function predicted for the three-dimensional projection ray at the sampling point x.
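The negative log-likelihood term can be illustrated numerically. The one-dimensional mixture density below and its parameters are illustrative assumptions, not the patent's exact parameterisation; the loss is summed over sample points on a ray:

```python
import math

def gmm_pdf(x, weights, means, sigmas):
    """Density of a 1-D Gaussian mixture (weights assumed to sum to 1)."""
    return sum(w / (s * math.sqrt(2 * math.pi)) * math.exp(-0.5 * ((x - m) / s) ** 2)
               for w, m, s in zip(weights, means, sigmas))

def negative_log_likelihood(samples, weights, means, sigmas):
    """Sum of -log f_GMM(x) over sample points, as in Equation 1."""
    return -sum(math.log(gmm_pdf(x, weights, means, sigmas)) for x in samples)

# Two sample depths near the mixture centre under an illustrative 1-component mixture.
nll = negative_log_likelihood([1.4, 1.45], weights=[1.0], means=[1.4], sigmas=[0.1])
```

Samples close to high-density regions of the predicted mixture drive the loss down, which is the behaviour the training objective rewards.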
S3032_ 2: and determining the mean square error of the indicative function according to the predicted indicative function and the real indicative function of each three-dimensional projection ray.
In S3032_2, the mean square error of the indicative function is calculated as follows:
L_Occupancy = || Occupancy_infer(x) - Occupancy_GT(x) ||^2   (Equation 2)

where L_Occupancy denotes the mean square error of the indicative function, Occupancy_infer(x) denotes the predicted indicative function value (inside/outside the model) at the sampling point x on the three-dimensional projection ray (i.e. the value of x on the corresponding predicted indicative function), and Occupancy_GT(x) denotes the true indicative function value at x (i.e. the value of x on the corresponding true indicative function).
S3032_ 3: and determining the mean square error of the Gaussian mixture according to the predicted value and the true value of the Gaussian mixture corresponding to each three-dimensional projection ray.
In S3032_3, the mean square error of the gaussian mixture function is calculated as follows:
L_GMM = || f_GMM^infer - f_GMM^GT ||^2   (Equation 3)

where L_GMM denotes the mean square error of the Gaussian mixture function, f_GMM^infer denotes the Gaussian mixture function predicted value of the three-dimensional projection ray inferred by the Gaussian mixture network under training, and f_GMM^GT denotes the true value of the Gaussian mixture function of the three-dimensional projection ray.
S3032_ 4: and determining a target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the indicative function and the mean square error of the Gaussian mixture function.
In S3032_3, the mean square error of the gaussian mixture function is calculated as follows:
Loss=L log +L Occupancy +L GMM equation 4
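A minimal sketch of assembling the three loss terms into the target loss; all tensor shapes are collapsed to short lists and every number is illustrative only, not taken from the patent:

```python
def mse(pred, gt):
    """Mean square error between two equally long value/parameter lists."""
    return sum((p - g) ** 2 for p, g in zip(pred, gt)) / len(pred)

# Hypothetical per-ray quantities (illustrative values only).
occ_pred, occ_gt = [0.9, 0.8, 0.1], [1.0, 1.0, 0.0]        # indicative function samples
gmm_pred, gmm_gt = [0.9, 1.4, 0.12], [1.0, 1.4, 0.10]      # (weight, mean, sigma)
l_log = 0.05                                               # NLL term from Equation 1

loss = l_log + mse(occ_pred, occ_gt) + mse(gmm_pred, gmm_gt)  # Equation 4
```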
S3033: and adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
In S3033, the target loss value of each iteration is compared with a preset range, and if the target loss value is within the preset range or reaches the upper limit of the iteration times, the adjustment of the parameters of the hybrid gaussian network is stopped, and the network parameter with the minimum target loss value is used as the final parameter of the trained hybrid gaussian network.
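The stopping rule of S3033, stop once the loss enters the preset range or the iteration cap is reached, and keep the parameters of the best iteration, can be sketched as follows. The loop consumes precomputed loss values instead of running a real network and optimizer, and the function name and defaults are hypothetical:

```python
def train(loss_per_iter, target_range=(0.0, 0.01), max_iters=100):
    """Track the iteration with the smallest target loss and stop early.

    Stand-in loop: real training would compute each loss with the Gaussian
    mixture network and update its parameters (e.g. with Adam); here an
    iterable of precomputed loss values plays that role.
    """
    best_iter, best_loss = None, float("inf")
    for i, loss in enumerate(loss_per_iter):
        if loss < best_loss:
            best_iter, best_loss = i, loss   # remember the best parameters
        if target_range[0] <= loss <= target_range[1] or i + 1 >= max_iters:
            break                            # loss in range or iteration cap hit
    return best_iter, best_loss

best_iter, best_loss = train([0.8, 0.3, 0.12, 0.04, 0.009, 0.02])
```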
Optionally, in S3033, an Adam optimizer may be used to optimally adjust parameters of the gaussian mixture network.
The gaussian mixture network in the above embodiments of the present application may be deployed on a server for holographic communication, including but not limited to a micro server, a cloud server, a server cluster, and may also be deployed on clients such as a notebook computer, a desktop computer, a smart phone, a tablet, VR glasses, and AR glasses having an interactive function. The server and the client are collectively referred to as a reconstruction device.
Based on the trained Gaussian mixture network, fig. 6 exemplarily shows a flowchart of the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided in the embodiment of the present application. The flow is executed by a reconstruction device and mainly includes the following steps:
S601: a single color image containing the entire body of the human body is acquired.
In an alternative embodiment, since the front of the human body has abundant detail features, in S601 a camera is placed facing the front of the human body to capture a single color image containing the whole body of the human body.
S602: and extracting the two-dimensional target characteristic of each pixel point from the single color image by adopting a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target characteristic of each pixel point.
In this embodiment, the Gaussian mixture network is constructed from a convolutional neural network and a fully connected neural network, and features of different layers can be extracted from a single color image through these two neural networks. Referring to fig. 7, the specific feature extraction process mainly includes the following steps:
S6021: And extracting feature maps of different scales corresponding to the color image based on the convolutional neural network part in the Gaussian mixture network.
In an alternative embodiment, in S6021, the convolutional neural network part sets 3 scales, and obtains feature maps of 3 scales by performing convolution processing on a single color image.
It should be noted that, the number of different scales is not limited in the embodiment of the present application, and for example, 5 scales may also be set.
S6022: and fusing the feature maps with different scales to obtain a fused feature vector corresponding to each pixel point.
In S6022, feature maps of different scales carry different amounts of human body information: a small-scale (low-resolution) feature map contains rich deep semantic information of the human body, while a large-scale (high-resolution) feature map contains strong human body geometric information. After the feature maps of different scales are fused, a fused feature vector is obtained for each pixel point in the single color image, and this fused feature vector contains rich human body detail information (such as human body geometric information and deep semantic information).
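A minimal sketch of this fusion step, assuming nearest-neighbour upsampling of the smaller feature maps followed by channel-wise concatenation (the patent does not specify the fusion operator, so this is one plausible choice):

```python
import numpy as np

def fuse_feature_maps(maps, out_hw):
    # maps: list of (C_i, H_i, W_i) feature maps at different scales.
    # Upsample each to out_hw by nearest neighbour and concatenate along the
    # channel axis, yielding one fused feature vector per pixel point.
    H, W = out_hw
    fused = []
    for f in maps:
        c, h, w = f.shape
        ry, rx = H // h, W // w
        up = np.repeat(np.repeat(f, ry, axis=1), rx, axis=2)
        fused.append(up[:, :H, :W])
    return np.concatenate(fused, axis=0)  # (sum of C_i, H, W)
```

Each pixel of the output then holds a (sum of C_i)-dimensional fused feature vector combining geometric and deep semantic cues.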
S6023: and extracting the mapping position of the image coordinate of each pixel point in the frequency domain based on the fully-connected neural network part in the mixed Gaussian network.
In S6023, the fully-connected neural network portion includes an input layer, a hidden layer, and an output layer. After the input layer obtains a single color image, frequency domain mapping is performed on each pixel point, and the mapping position of the image coordinate of each pixel point in the frequency domain is extracted, wherein the frequency domain mapping formula is as follows:

Z_pos = γ(p) = (sin(2^0·πp), cos(2^0·πp), ..., sin(2^(L−1)·πp), cos(2^(L−1)·πp))   Equation 5

where p = (u, v) is the two-dimensional image coordinate of each pixel point, u and v respectively represent the coordinates in the length (horizontal resolution) and width (vertical resolution) directions of the color image, and L is the encoding dimension of the mapping position in the frequency domain. Optionally, L = 32.
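Equation 5 can be sketched as follows, applying the sin/cos encoding to each of the two coordinates so that a pixel receives a 4·L-dimensional code (the exact interleaving layout is an assumption of this sketch):

```python
import numpy as np

def freq_encode(p, L=32):
    # gamma(p): map 2-D pixel coordinates into a frequency-domain code made of
    # sin/cos pairs at frequencies 2^0*pi .. 2^(L-1)*pi, per coordinate.
    p = np.asarray(p, dtype=np.float64)        # (..., 2) with p = (u, v)
    freqs = (2.0 ** np.arange(L)) * np.pi      # 2^l * pi for l = 0..L-1
    angles = p[..., None] * freqs              # (..., 2, L)
    code = np.stack([np.sin(angles), np.cos(angles)], axis=-1)
    return code.reshape(*p.shape[:-1], -1)     # (..., 4*L)
```

For example, the origin p = (0, 0) maps to alternating zeros (sin terms) and ones (cos terms).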
S6024: and splicing the mapping position of each pixel point and the fusion characteristic vector of the corresponding pixel point to obtain the two-dimensional target characteristic.
In S6024, in addition to the mapping position Z_pos of the image coordinate of each pixel point in the frequency domain, the input layer of the fully connected neural network part also receives the fused feature vector Z_Img corresponding to each pixel point. The mapping position of each pixel point and the fused feature vector of the corresponding pixel point are spliced through the hidden layer of the fully connected network part to obtain the spliced two-dimensional target feature Z_MLP = cat(Z_Img, Z_pos).
S6025: and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point.
In S6025, after the two-dimensional target features are obtained, the Gaussian mixture function value of each pixel point is regressed through the output layer of the fully connected network part, and the regression formula is expressed as follows:

P_GMM = f_MLP(Z_Img, Z_pos)   Equation 6
Referring to fig. 8, a specific process for determining the gaussian mixture function value mainly includes the following steps:
S6025_1: Based on the fully connected neural network part in the Gaussian mixture network, Gaussian processing is carried out on the two-dimensional target features of each pixel point, and the mean value and the variance of two Gaussian functions are obtained respectively.
The output layer of the fully-connected network part comprises two Gaussian functions, and in S6025_1, the two-dimensional target characteristics of each pixel point are subjected to Gaussian processing through the output layer of the fully-connected neural network part to respectively obtain the mean value and the variance of the two Gaussian functions.
S6025_ 2: and determining target parameters of the mixed Gaussian function according to the mean value and the variance of the two Gaussian functions.
In S6025_2, the target parameters of the mixture gaussian function are determined according to the mean and variance of the two gaussian functions, and the formula is as follows:
P_GMM = {μ1, σ1, μ2, σ2, ω, t}   Equation 7

wherein μ1 and σ1 are respectively the mean and variance of the first Gaussian function, μ2 and σ2 are respectively the mean and variance of the second Gaussian function, ω is the mixing weight of the Gaussian mixture function, and t is the truncation parameter of the linear function.
S6025_ 3: and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameter of the Gaussian mixture function.
In S6025_3, once the target parameters of the Gaussian mixture function are known, the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray can be calculated, and the calculation formula is as follows:

P_GMM(z) = ω·G(z; μ1, σ1) + (1 − ω)·G(z; μ2, σ2)   Equation 8

where z is the depth along the three-dimensional projection ray and G(z; μ, σ) denotes a one-dimensional Gaussian function with mean μ and variance σ.
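Since the patent's equation image is not reproduced in this text, the sketch below assumes Equation 8 is a standard two-component one-dimensional mixture evaluated at depths z along the ray (σ1, σ2 treated as standard deviations):

```python
import numpy as np

def gmm_value(z, mu1, s1, mu2, s2, w):
    # Two-component 1-D Gaussian mixture evaluated at depths z along the
    # projection ray; w is the mixing weight of the first component.
    def gauss(x, mu, s):
        return np.exp(-0.5 * ((x - mu) / s) ** 2) / (s * np.sqrt(2.0 * np.pi))
    return w * gauss(z, mu1, s1) + (1.0 - w) * gauss(z, mu2, s2)
```

As a sanity check, the mixture is non-negative everywhere and integrates to one over the ray.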
S603: And obtaining a continuous occupation value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model.
In the embodiment of the present application, the Gaussian mixture function value of each pixel point is used as a continuous analytic expression of an indicative function of the pixel point inside and outside the human body model. In this analytic expression, an Occupancy value is used to represent whether the corresponding pixel point is located inside or outside the human body model; as shown in fig. 9, the occupancy value is 1 when the pixel point is located inside the human body model and 0 when it is located outside the human body model.
The traditional human body three-dimensional reconstruction method based on a single image needs to densely sample occupancy values in three-dimensional space, as shown in fig. 10, where each circle represents a sampling point; this seriously reduces the efficiency of human body three-dimensional reconstruction.
From the perspective of projection of a single image, the change of the occupancy value along the three-dimensional projection ray corresponding to each pixel point is piecewise discrete, as shown in fig. 11. If a dense sampling method is used to estimate the occupancy value function (i.e., the indicative function) of each three-dimensional projection ray, performance is wasted; therefore, the traditional occupancy value estimation method based on pixel-aligned two-dimensional target features still has limitations in representing the three-dimensional human body model.
In order to solve the above problem, in S603 the Gaussian mixture function value corresponding to each pixel point is converted into an indicative function value (i.e., an occupancy value) representing whether the pixel point is inside or outside the human body model, so as to obtain a continuous occupancy value function on each three-dimensional projection ray, where the conversion relationship is as follows:

O(z) = min(1, max(0, t·P_GMM(z)))   Equation 9
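The exact conversion in the patent's equation image is not reproduced in this text; the sketch below assumes a truncated linear mapping consistent with the truncation parameter t of Equation 7, clamping the scaled Gaussian mixture value to the occupancy range [0, 1]:

```python
import numpy as np

def occupancy_from_gmm(gmm_vals, t):
    # Truncated linear mapping from Gaussian mixture values to occupancy values
    # in [0, 1]; the exact conversion used by the patent is an assumption here.
    return np.clip(t * np.asarray(gmm_vals, dtype=np.float64), 0.0, 1.0)
```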
Fig. 12 is a comparison graph, provided in the embodiment of the present application, of the continuous change of occupancy values expressed by the Gaussian mixture function and by the traditional dense sampling mode. As shown in fig. 12, the Gaussian mixture function directly describes the change of the occupancy value of a pixel point on the corresponding three-dimensional projection ray, so the continuous change of the occupancy value in the three-dimensional projection ray direction (z direction) can be described directly with a small number of parameters (such as the mean and variance), thereby improving reconstruction and inference efficiency.
S604: and carrying out trilinear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicative function.
In S604, a uniform discrete indicative function is used to characterize the three-dimensional geometric surface of the human body. The process of generating the uniform discrete indicative function is shown in fig. 13 and mainly includes the following steps:
S6041: And acquiring image coordinates of discrete uniform sampling points projected on the color image.
Since the color image is generated by perspective projection, when performing trilinear interpolation it is necessary to project the uniformly distributed discrete sampling points onto the color image to obtain the corresponding image (pixel) coordinates of the sampling points on the color image.
S6042: and aiming at each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinate corresponding to the sampling point, and acquiring the intersection point of the sampling point and the at least one three-dimensional projection ray.
In S6042, for each sampling point, at least one three-dimensional projection ray adjacent to the sampling point (that is, whose distance to the sampling point is less than a set threshold) is determined according to the image coordinate corresponding to the sampling point, and a perpendicular line is drawn from the sampling point to each determined projection ray to obtain the intersection point of the sampling point and the at least one three-dimensional projection ray.
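The intersection point described here is the foot of the perpendicular dropped from the sampling point onto the projection ray, which has a closed form (clamping the ray parameter to non-negative values is an assumption of this sketch):

```python
import numpy as np

def foot_of_perpendicular(point, origin, direction):
    # Closest point on the ray origin + s*direction (s >= 0) to a 3-D sampling
    # point, i.e. the foot of the perpendicular from the point onto the ray.
    p = np.asarray(point, dtype=np.float64)
    o = np.asarray(origin, dtype=np.float64)
    d = np.asarray(direction, dtype=np.float64)
    d = d / np.linalg.norm(d)
    s = max(0.0, float(np.dot(p - o, d)))  # clamp so the foot lies on the ray
    return o + s * d
```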
S6043: and acquiring the occupation value of at least one intersection point from the continuous occupation value function, and performing trilinear interpolation on the occupation value of at least one intersection point to obtain a discrete occupation value of a sampling point.
In S6043, the continuous occupancy function on each three-dimensional projection ray is known, and for each sampling point, the occupancy value of the intersection point of the sampling point and at least one three-dimensional projection ray can be obtained from the continuous occupancy function on at least one three-dimensional projection ray adjacent to the sampling point, and the discrete occupancy value of the sampling point is obtained by performing trilinear interpolation on the occupancy value of at least one intersection point.
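A simplified sketch of blending the occupancy values at the intersection points on neighbouring rays into one discrete occupancy value for the sampling point; the patent specifies trilinear interpolation, and the inverse-distance weighting below is only an illustrative stand-in:

```python
import numpy as np

def interp_occupancy(sample, feet, occs):
    # sample: 3-D sampling point; feet: intersection points (feet of the
    # perpendiculars) on neighbouring rays; occs: occupancy values there.
    # Weight each intersection's occupancy by inverse distance to the sample.
    d = np.linalg.norm(np.asarray(feet, dtype=np.float64)
                       - np.asarray(sample, dtype=np.float64), axis=1)
    w = 1.0 / (d + 1e-8)
    w /= w.sum()
    return float(np.dot(w, occs))
```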
S6044: and generating a uniform discrete indicative function representing the geometric surface of the human body according to the discrete occupancy value of each sampling point.
S605: and extracting the geometric surface of the human body from the uniform discrete indicative function to obtain the three-dimensional human body model.
In an alternative embodiment, in S605, a Marching Cubes algorithm is used to extract the human body geometric surface from the uniform discrete indicative function to obtain the three-dimensional human body model.
In some embodiments, in order to improve the precision of the three-dimensional human body model, as shown in fig. 14, after the human body parameter map described by the Gaussian mixture function value of each pixel point is obtained, a 2D image optimization method based on a Generative Adversarial Network (GAN) is used to refine the preliminarily reconstructed human body parameter map represented by the Gaussian parameters, so as to obtain a more refined reconstruction of the three-dimensional human body geometric surface.
According to the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided in the embodiment of the present application, the two-dimensional target features of each pixel point are extracted from a single color image containing the whole body of the human body through the pre-trained Gaussian mixture network, and the Gaussian mixture function corresponding to each pixel point is generated according to the two-dimensional target features. The continuous occupancy value function of each pixel point on the corresponding three-dimensional projection ray is then obtained according to the Gaussian mixture parameters corresponding to each pixel point. Because the Gaussian mixture function directly describes the change of the occupancy value of a pixel point on the corresponding projection ray, the continuous change of the occupancy value in the ray direction can be described with a small number of parameters, the geometric surface information of the human body is obtained, and the three-dimensional reconstruction efficiency is improved. Further, trilinear sampling is performed on the continuous occupancy value functions in three-dimensional space to generate a uniform discrete indicative function, and the human body geometric surface is extracted from the uniform discrete indicative function to obtain the three-dimensional human body model. Since the Gaussian mixture network can extract more human body detail features from a single color image, the precision of the three-dimensional human body model is ensured while the three-dimensional reconstruction efficiency is improved. Moreover, the human body parameter map preliminarily reconstructed by using the Gaussian mixture function is optimized with a GAN network, which further improves the reconstruction precision of the three-dimensional human body model.
Based on the same technical concept, the embodiment of the present application provides a reconstruction device, which may be a client such as a notebook computer, a desktop computer, a smart phone, a tablet, VR glasses, and AR glasses having an interaction function, or a server for implementing an interaction process, including but not limited to a micro server, a cloud server, a server cluster, and the like, and the reconstruction device may implement the steps of the single image three-dimensional reconstruction method based on the hybrid gaussian network in the above embodiments, and may achieve the same technical effect.
Referring to fig. 15, the reconstruction device comprises a processor 1501, a memory 1502, a display screen 1503 and a communication interface 1504, the display screen 1503, the memory 1502 and the processor 1501 being connected by a bus 1505;
the memory 1502 includes a data storage unit and a program storage unit, the program storage unit storing a computer program, and the processor 1501 executes the following operations according to the computer program:
acquiring, through the communication interface 1504, a single color image containing the whole body of the human body captured by a camera, and storing the single color image in the data storage unit;
extracting the two-dimensional target feature of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point;
obtaining a continuous occupation value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
carrying out trilinear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of a human body;
the human body geometric surface is extracted from the uniform discrete indicative function to obtain a three-dimensional human body model, and the three-dimensional human body model is displayed through the display screen 1503.
Optionally, the gaussian mixture network is constructed by a convolutional neural network and a fully connected neural network, the processor 1501 adopts a pre-trained gaussian mixture network, and extracts a two-dimensional target feature of each pixel point from the color image, and the specific operation is as follows:
extracting feature maps of different scales corresponding to the color image based on a convolutional neural network part in the Gaussian mixture network, and fusing the feature maps of different scales to obtain a fused feature vector corresponding to each pixel point;
and extracting the mapping position of the image coordinate of each pixel point in the frequency domain based on the fully-connected neural network part in the Gaussian mixture network, and splicing the mapping position of each pixel point and the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
Optionally, the processor 1501 calculates a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point, and the specific operation is:
based on the fully connected neural network part in the Gaussian mixture network, performing Gaussian processing on the two-dimensional target feature of each pixel point to respectively obtain the mean value and the variance of two Gaussian functions;
determining target parameters of the mixed Gaussian function according to the mean value and the variance of the two Gaussian functions;
and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameter of the Gaussian mixture function.
Optionally, the processor 1501 performs trilinear sampling on each continuous occupancy value function in a three-dimensional space to generate a uniform discrete indicative function, and specifically performs the following operations:
acquiring image coordinates of discrete uniform sampling points projected on the color image;
aiming at each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinate corresponding to the sampling point, and acquiring the intersection point of the sampling point and the at least one three-dimensional projection ray;
acquiring the occupation value of at least one intersection point from the continuous occupation value function, and carrying out trilinear interpolation on the occupation value of the at least one intersection point to obtain the discrete occupation value of the sampling point;
and generating a uniform discrete indicative function according to the discrete occupancy value of each sampling point.
Optionally, the processor 1501 trains the gaussian mixture network by:
acquiring a training data set, wherein the training data set comprises human body images obtained by rendering three-dimensional human body scanning data under different postures and clothes;
aiming at each human body image, calculating real indicative functions of each sampling point on each three-dimensional projection ray inside and outside the model by adopting a pixel projection ray-based sampling method, and fitting out a mixed Gaussian function true value corresponding to the corresponding three-dimensional projection ray according to the real indicative functions of each sampling point;
inputting a plurality of human body images and corresponding real indicative functions and true values of the Gaussian mixture function into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one iteration, wherein the following operations are executed in each iteration process:
determining a predictive demonstrative function and a predictive value of a Gaussian mixture function corresponding to a plurality of three-dimensional projection rays of each human body image by adopting the Gaussian mixture network to be trained;
determining a target loss value of the Gaussian mixture network according to the predicted value of each predictive demonstrative function and the Gaussian mixture function and the true value of the corresponding real demonstrative function and the Gaussian mixture function;
and adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
Optionally, the processor 1501 determines the target loss value of the gaussian mixture network according to each predicted value of the predictive demonstrative function and the gaussian mixture function, and the corresponding true value of the real demonstrative function and the gaussian mixture function, and specifically operates as follows:
determining negative log likelihood loss according to the probability distribution of the mixed Gaussian function predicted value corresponding to each three-dimensional projection ray;
determining the mean square error of the indicative function according to the predicted indicative function and the real indicative function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture according to the predicted value and the true value of the Gaussian mixture corresponding to each three-dimensional projection ray;
and determining a target loss value of the Gaussian mixture network according to the negative log likelihood loss, the mean square error of the indicative function and the mean square error of the Gaussian mixture function.
It should be noted that fig. 15 is only an example and shows the hardware necessary for implementing the steps of the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided in the embodiment of the present application; although not shown, the reconstruction apparatus further includes common components of interaction apparatuses such as a speaker, a microphone, a power supply, and an audio processor.
The Processor referred to in fig. 15 in this embodiment may be a Central Processing Unit (CPU), a general purpose Processor, a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), an Application-specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof.
Referring to fig. 16, a functional structure diagram of a reconstruction apparatus provided in the embodiment of the present application is a functional structure diagram of a reconstruction apparatus, where the reconstruction apparatus mainly includes an obtaining module 1601, a feature extraction module 1602, a function determination module 1603, and a model rendering module 1604, where:
an acquiring module 1601, configured to acquire a single color image including a whole body of a human body;
a feature extraction module 1602, configured to extract a two-dimensional target feature of each pixel from the color image by using a pre-trained gaussian mixture network, and calculate a gaussian mixture function value of each pixel on a corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel;
a function determining module 1603, configured to obtain a continuous occupancy value function on each three-dimensional projection ray according to the gaussian mixture function value corresponding to each pixel point, where each occupancy value is used to represent whether the corresponding pixel point is located inside or outside the human body model; carrying out trilinear sampling on each continuous occupancy value function in a three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of the human body;
and a model rendering module 1604, configured to extract the geometric surface of the human body from the uniform discrete indicative function to obtain a three-dimensional human body model.
The specific implementation of each functional module is referred to the foregoing embodiments, and will not be described repeatedly here.
The embodiment of the present application further provides a computer-readable storage medium for storing instructions, and when the instructions are executed, the method for three-dimensional reconstruction of a single image based on a hybrid gaussian network in the foregoing embodiments may be completed.
The embodiment of the present application further provides a computer program product for storing a computer program, where the computer program is used to execute the hybrid gaussian network-based single image three-dimensional reconstruction method in the foregoing embodiments.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (10)

1. A single image three-dimensional reconstruction method based on a mixed Gaussian network is characterized by comprising the following steps:
acquiring a single color image containing the whole body of the human body;
extracting the two-dimensional target feature of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point;
obtaining a continuous occupation value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
carrying out trilinear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of a human body;
and extracting the geometric surface of the human body from the uniform discrete indicative function to obtain the three-dimensional human body model.
2. The method of claim 1, wherein the Gaussian mixture network is constructed by a convolutional neural network and a fully connected neural network, and the extracting the two-dimensional target feature of each pixel point from the color image by using the pre-trained Gaussian mixture network comprises:
extracting feature maps of different scales corresponding to the color image based on a convolutional neural network part in the Gaussian mixture network, and fusing the feature maps of different scales to obtain a fused feature vector corresponding to each pixel point;
and extracting the mapping position of the image coordinate of each pixel point in the frequency domain based on the fully-connected neural network part in the Gaussian mixture network, and splicing the mapping position of each pixel point and the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
3. The method of claim 1, wherein the calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point comprises:
performing Gaussian processing on the two-dimensional target feature of each pixel point based on the fully connected neural network part of the Gaussian mixture network, to obtain the mean and variance of each of two Gaussian functions;
determining target parameters of the Gaussian mixture function according to the means and variances of the two Gaussian functions;
and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
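Given the two predicted means and variances, evaluating the Gaussian mixture function along the ray is a weighted sum of two Gaussian densities. Equal component weights are an assumption here; the claim only fixes that two Gaussians parameterize the mixture:

```python
import math

def gaussian_pdf(z, mu, sigma):
    """Density of N(mu, sigma^2) at depth z."""
    return math.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mixture_value(z, params):
    """Two-component Gaussian mixture value at depth z on the ray;
    params is [(weight, mean, std), (weight, mean, std)]."""
    return sum(w * gaussian_pdf(z, mu, s) for w, mu, s in params)

# Equal weights are an illustrative assumption.
params = [(0.5, 1.0, 0.1), (0.5, 2.0, 0.1)]
peak = mixture_value(1.0, params)    # at a component mean
trough = mixture_value(1.5, params)  # midway between the means
```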
4. The method of claim 1, wherein the performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicative function comprises:
acquiring the image coordinates of discrete uniform sampling points projected onto the color image;
for each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring the intersection point of the sampling point with the at least one three-dimensional projection ray;
acquiring the occupancy value of at least one intersection point from the continuous occupancy value function, and performing trilinear interpolation on the occupancy value of the at least one intersection point to obtain the discrete occupancy value of the sampling point;
and generating a uniform discrete indicative function according to the discrete occupancy value of each sampling point.
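The trilinear interpolation step of claim 4 blends the occupancy values of the eight neighbors surrounding a sampling point. The sketch below assumes a unit cell with corner values already gathered from the per-ray occupancy functions; neighbor selection along the rays is simplified away:

```python
def trilinear(c, x, y, z):
    """Trilinear interpolation of 8 corner occupancy values c[i][j][k]
    (i, j, k in {0, 1}) at fractional position (x, y, z) in the cell."""
    c00 = c[0][0][0] * (1 - x) + c[1][0][0] * x
    c10 = c[0][1][0] * (1 - x) + c[1][1][0] * x
    c01 = c[0][0][1] * (1 - x) + c[1][0][1] * x
    c11 = c[0][1][1] * (1 - x) + c[1][1][1] * x
    c0 = c00 * (1 - y) + c10 * y
    c1 = c01 * (1 - y) + c11 * y
    return c0 * (1 - z) + c1 * z

# Corners: one face fully outside (0.0), the opposite face inside (1.0);
# the cell center averages the corner occupancies.
corners = [[[0.0, 0.0], [0.0, 0.0]], [[1.0, 1.0], [1.0, 1.0]]]
val = trilinear(corners, 0.5, 0.5, 0.5)
```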
5. The method of any one of claims 1-4, wherein the Gaussian mixture network is trained by:
acquiring a training data set, wherein the training data set comprises human body images rendered from three-dimensional human body scanning data under different postures and clothing;
for each human body image, calculating the real indicative function of each sampling point on each three-dimensional projection ray inside and outside the model by a pixel-projection-ray-based sampling method, and fitting the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray according to the real indicative functions of the sampling points;
inputting a plurality of human body images with the corresponding real indicative functions and true values of the Gaussian mixture function into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one iteration, wherein the following operations are performed in each iteration:
determining, with the Gaussian mixture network to be trained, the predicted indicative function and the predicted value of the Gaussian mixture function corresponding to each three-dimensional projection ray of each human body image;
determining a target loss value of the Gaussian mixture network according to each predicted indicative function and predicted value of the Gaussian mixture function and the corresponding real indicative function and true value of the Gaussian mixture function;
and adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value falls within a preset range.
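The iteration scheme of claim 5 (adjust parameters by the loss until it falls within a preset range) can be sketched as a generic loop with a toy scalar loss standing in for the network; the loss function and learning rate here are illustrative, not the patent's:

```python
def train_until_converged(loss_fn, param, lr=0.1, tol=1e-6, max_iters=1000):
    """Adjust a parameter by its loss gradient until the loss value
    falls within the preset range (here: below `tol`)."""
    loss = float("inf")
    for _ in range(max_iters):
        loss, grad = loss_fn(param)
        if loss < tol:
            break
        param -= lr * grad  # gradient-descent parameter adjustment
    return param, loss

# Toy stand-in for the network loss: squared error to a target scalar.
target = 3.0
p, final_loss = train_until_converged(
    lambda w: ((w - target) ** 2, 2.0 * (w - target)), 0.0)
```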
6. The method of claim 5, wherein the determining a target loss value of the Gaussian mixture network according to each predicted indicative function and predicted value of the Gaussian mixture function and the corresponding real indicative function and true value of the Gaussian mixture function comprises:
determining a negative log-likelihood loss according to the probability distribution of the predicted value of the Gaussian mixture function corresponding to each three-dimensional projection ray;
determining the mean square error of the indicative function according to the predicted indicative function and the real indicative function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture function according to the predicted value and the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray;
and determining the target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the indicative function, and the mean square error of the Gaussian mixture function.
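The three terms of claim 6 can be combined as a weighted sum. The weights `w_nll`, `w_ind`, `w_gmm` and the exact form of the likelihood term are assumptions; the claim only names the three components:

```python
import math

def target_loss(mix_probs, pred_ind, true_ind, pred_gmm, true_gmm,
                w_nll=1.0, w_ind=1.0, w_gmm=1.0):
    """Combine the three loss terms of claim 6: negative log-likelihood
    of the mixture probabilities at the true depth samples, MSE of the
    indicative function, and MSE of the mixture-function values."""
    nll = -sum(math.log(max(p, 1e-12)) for p in mix_probs) / len(mix_probs)
    mse_ind = sum((a - b) ** 2 for a, b in zip(pred_ind, true_ind)) / len(pred_ind)
    mse_gmm = sum((a - b) ** 2 for a, b in zip(pred_gmm, true_gmm)) / len(pred_gmm)
    return w_nll * nll + w_ind * mse_ind + w_gmm * mse_gmm

# Perfect MSE terms leave only the negative log-likelihood contribution.
loss = target_loss([0.9, 0.8], [1.0, 0.0], [1.0, 0.0], [0.5, 0.4], [0.5, 0.4])
```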
7. A reconstruction device comprising a processor, a memory, a display screen, and a communication interface, wherein the communication interface, the display screen, the memory, and the processor are connected by a bus;
the memory comprises a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, a single color image containing the whole human body captured by a camera, and storing the single color image in the data storage unit;
extracting the two-dimensional target feature of each pixel point from the color image by using a pre-trained Gaussian mixture network, and calculating a Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding pixel point is located inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicative function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicative function to obtain a three-dimensional human body model, and displaying the model on the display screen.
8. The reconstruction device of claim 7, wherein the Gaussian mixture network is constructed from a convolutional neural network and a fully connected neural network, and the processor extracts the two-dimensional target feature of each pixel point from the color image by using the pre-trained Gaussian mixture network by:
extracting feature maps of different scales corresponding to the color image based on the convolutional neural network part of the Gaussian mixture network, and fusing the feature maps of different scales to obtain a fused feature vector for each pixel point;
and extracting the mapping position of the image coordinates of each pixel point in the frequency domain based on the fully connected neural network part of the Gaussian mixture network, and concatenating the mapping position of each pixel point with the fused feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
9. The reconstruction device of claim 7, wherein the processor calculates the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel point by:
performing Gaussian processing on the two-dimensional target feature of each pixel point based on the fully connected neural network part of the Gaussian mixture network, to obtain the mean and variance of each of two Gaussian functions;
determining target parameters of the Gaussian mixture function according to the means and variances of the two Gaussian functions;
and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
10. The reconstruction device of claim 7, wherein the processor performs trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicative function by:
acquiring the image coordinates of discrete uniform sampling points projected onto the color image;
for each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring the intersection point of the sampling point with the at least one three-dimensional projection ray;
acquiring the occupancy value of at least one intersection point from the continuous occupancy value function, and performing trilinear interpolation on the occupancy value of the at least one intersection point to obtain the discrete occupancy value of the sampling point;
and generating a uniform discrete indicative function according to the discrete occupancy value of each sampling point.
CN202210792028.8A 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network Active CN115082636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792028.8A CN115082636B (en) 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network


Publications (2)

Publication Number Publication Date
CN115082636A true CN115082636A (en) 2022-09-20
CN115082636B CN115082636B (en) 2024-05-17

Family

ID=83257445

Country Status (1)

Country Link
CN (1) CN115082636B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665537A (en) * 2018-05-15 2018-10-16 清华大学 The three-dimensional rebuilding method and system of combined optimization human body figure and display model
CN111340944A (en) * 2020-02-26 2020-06-26 清华大学 Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN112330795A (en) * 2020-10-10 2021-02-05 清华大学 Human body three-dimensional reconstruction method and system based on single RGBD image
US20210134057A1 (en) * 2019-08-15 2021-05-06 Autodesk, Inc. Three-dimensional (3d) model creation and incremental model refinement from laser scans
CN113506335A (en) * 2021-06-01 2021-10-15 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras


Non-Patent Citations (2)

Title
MENG JIA et al.: "Learning Occupancy Function from Point Clouds for Surface Reconstruction", arXiv:2010.11378v1 [cs.CV], 22 October 2020 *
ZERONG ZHENG et al.: "DeepHuman: 3D Human Reconstruction from a Single Image", arXiv:1903.06473v2 [cs.CV], 28 March 2019 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN117893696A (en) * 2024-03-15 2024-04-16 之江实验室 Three-dimensional human body data generation method and device, storage medium and electronic equipment
CN117893696B (en) * 2024-03-15 2024-05-28 之江实验室 Three-dimensional human body data generation method and device, storage medium and electronic equipment


Similar Documents

Publication Publication Date Title
US11398059B2 (en) Processing 3D video content
CN111901598B (en) Video decoding and encoding method, device, medium and electronic equipment
CN112233212A (en) Portrait editing and composition
CN112991358A (en) Method for generating style image, method, device, equipment and medium for training model
CN116051740A (en) Outdoor unbounded scene three-dimensional reconstruction method and system based on nerve radiation field
CN109191554A (en) A kind of super resolution image reconstruction method, device, terminal and storage medium
CN111612878B (en) Method and device for making static photo into three-dimensional effect video
CN112019828B (en) Method for converting 2D (two-dimensional) video into 3D video
WO2023040609A1 (en) Three-dimensional model stylization method and apparatus, and electronic device and storage medium
WO2022205755A1 (en) Texture generation method and apparatus, device, and storage medium
CN114996814A (en) Furniture design system based on deep learning and three-dimensional reconstruction
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN115272608A (en) Human hand reconstruction method and equipment
CN115619933A (en) Three-dimensional face reconstruction method and system based on occlusion segmentation
CN114758070A (en) Single-image three-dimensional human body fine reconstruction method based on cross-domain multitask
CN115082636B (en) Single image three-dimensional reconstruction method and device based on mixed Gaussian network
CN115393480A (en) Speaker synthesis method, device and storage medium based on dynamic nerve texture
CN112927348B (en) High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera
US20230177771A1 (en) Method for performing volumetric reconstruction
US20240013477A1 (en) Point-based neural radiance field for three dimensional scene representation
CN117218246A (en) Training method and device for image generation model, electronic equipment and storage medium
CN114998514A (en) Virtual role generation method and equipment
CN113132706A (en) Controllable position virtual viewpoint generation method and device based on reverse mapping
CN113614791A (en) Dynamic three-dimensional imaging method
US11783501B2 (en) Method and apparatus for determining image depth information, electronic device, and media

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant