CN115082636B - Single image three-dimensional reconstruction method and device based on mixed Gaussian network - Google Patents


Info

Publication number
CN115082636B
Authority
CN
China
Prior art keywords
function
dimensional
value
human body
gaussian
Prior art date
Legal status
Active
Application number
CN202210792028.8A
Other languages
Chinese (zh)
Other versions
CN115082636A (en)
Inventor
吴连朋
于涛
刘烨斌
许瀚誉
朱家林
王宝云
于芝涛
Current Assignee
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Original Assignee
Tsinghua University
Hisense Visual Technology Co Ltd
Juhaokan Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Tsinghua University, Hisense Visual Technology Co Ltd, Juhaokan Technology Co Ltd filed Critical Tsinghua University
Priority to CN202210792028.8A priority Critical patent/CN115082636B/en
Publication of CN115082636A publication Critical patent/CN115082636A/en
Application granted granted Critical
Publication of CN115082636B publication Critical patent/CN115082636B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 - Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/10 - Constructive solid geometry [CSG] using solid primitives, e.g. cylinders, cubes
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/42 - Global feature extraction by analysis of the whole pattern, e.g. using frequency domain transformations or autocorrelation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/44 - Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103 - Static body considered as a whole, e.g. static pedestrian or occupant recognition


Abstract

The application relates to the technical field of three-dimensional reconstruction, and provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network. A trained mixed Gaussian network extracts the two-dimensional target features of each pixel point from a single color image containing the whole human body, and determines a continuous occupancy value function for each human-body pixel point on the corresponding three-dimensional projection ray from those features. A uniform discrete indicator function is then generated by trilinear sampling of the continuous occupancy value function, and the three-dimensional human body model is extracted from it. Because the mixed Gaussian network can extract more human-body detail features from a single color image, the method improves three-dimensional reconstruction efficiency on the basis of a single color image while ensuring the precision of the human body three-dimensional model.

Description

Single image three-dimensional reconstruction method and device based on mixed Gaussian network
Technical Field
The application relates to the technical field of three-dimensional reconstruction, and provides a single-image three-dimensional reconstruction method and device based on a hybrid Gaussian network.
Background
In computer vision, three-dimensional reconstruction refers to the process of recovering three-dimensional information from single-view or multi-view images. It is widely applied in virtual fitting, 3D games, autonomous navigation and other scenarios, and particularly in holographic communication scenarios based on Virtual Reality (VR) and Augmented Reality (AR), where a reconstructed three-dimensional human body model brings an immersive interactive experience to the user.
Multi-view human body three-dimensional reconstruction requires calibrating the multi-view cameras; the camera layout is strictly constrained and the calibration process is cumbersome, which seriously affects reconstruction efficiency. Holographic communication scenarios have demanding real-time requirements, so improving three-dimensional reconstruction efficiency through single-view human body reconstruction is a current research hotspot.
Disclosure of Invention
The application provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network, which are used for improving the efficiency and the precision of three-dimensional reconstruction.
In one aspect, the application provides a single image three-dimensional reconstruction method based on a hybrid Gaussian network, comprising the following steps:
acquiring a single color image containing the whole body of a human body;
extracting the two-dimensional target features of each pixel point from the color image by adopting a pre-trained mixed Gaussian network, and calculating the mixed Gaussian function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target features of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding pixel point is located inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
In another aspect, the present application provides a reconstruction device, including a processor, a memory, a display screen, and a communication interface, where the communication interface, the display screen, the memory, and the processor are connected by a bus;
The memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, a single color image containing the whole body of the human body captured by a camera, and storing the image in the data storage unit;
extracting the two-dimensional target features of each pixel point from the color image by adopting a pre-trained mixed Gaussian network, and calculating the mixed Gaussian function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target features of each pixel point;
obtaining a continuous occupancy value function on each three-dimensional projection ray according to the mixed Gaussian function value corresponding to each pixel point, wherein each occupancy value represents whether the corresponding pixel point is located inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain a three-dimensional human body model, which is displayed on the display screen.
In another aspect, an embodiment of the present application provides a computer readable storage medium, where computer executable instructions are stored, where the computer executable instructions are configured to cause a computer device to perform the single image three-dimensional reconstruction method based on a hybrid gaussian network provided by the embodiment of the present application.
According to the single-image three-dimensional reconstruction method and device based on the mixed Gaussian network, the pre-trained mixed Gaussian network extracts the two-dimensional target feature of each pixel point from a single color image containing the whole human body, generates the mixed Gaussian function value corresponding to each pixel point from those features, and obtains the continuous occupancy value function of each human-body pixel point on the corresponding three-dimensional projection ray from the mixed Gaussian parameters of each pixel point. The continuous occupancy value function is then trilinearly sampled in three-dimensional space to generate a uniform discrete indicator function, and the surface of the human body model is extracted from that function to obtain the three-dimensional human body model.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions of the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described, and it is obvious that the drawings in the description below are some embodiments of the application, and other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is an overall architecture diagram of a single image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
fig. 2 is a block diagram of the HRNet network provided in an embodiment of the present application;
fig. 3 is a flowchart of a training method of a mixed gaussian network according to an embodiment of the present application;
FIG. 4 is a flowchart of a method for each round of iterative training of a Gaussian mixture network according to an embodiment of the application;
FIG. 5 is a flowchart of a method for determining a target loss value of a Gaussian mixture network according to an embodiment of the application;
Fig. 6 is a flowchart of a single image three-dimensional reconstruction method based on a hybrid gaussian network according to an embodiment of the present application;
FIG. 7 is a flowchart of a method for extracting features through a hybrid Gaussian network according to an embodiment of the application;
FIG. 8 is a flowchart of a method for calculating a Gaussian mixture function value of each pixel point on a three-dimensional projection ray according to an embodiment of the present application;
FIG. 9 is a schematic diagram showing pixel points located inside and outside a human body model according to an embodiment of the present application;
FIG. 10 is a schematic diagram of densely sampling occupancy values according to an embodiment of the present application;
FIG. 11 is a schematic diagram showing a change of continuous occupancy values on each three-dimensional projection ray according to an embodiment of the present application;
FIG. 12 is a graph showing a comparison of continuous variation of occupancy values using a mixed Gaussian function and a densely sampled version provided by an embodiment of the application;
FIG. 13 is a flow chart of a method for generating a uniform discrete indicator function according to an embodiment of the present application;
FIG. 14 is a schematic diagram of an optimization process for a Gaussian mixture function graph according to an embodiment of the application;
FIG. 15 is a hardware configuration diagram of a reconstruction device according to an embodiment of the present application;
Fig. 16 is a functional block diagram of a reconstruction device according to an embodiment of the present application.
Detailed Description
With the continuous development of VR/AR holographic communication technology, how to directly reconstruct a corresponding three-dimensional model of a human body by using images becomes an emerging research direction.
Multi-view human body three-dimensional reconstruction requires calibrating the multi-view cameras; the camera layout is strictly constrained and the calibration process is cumbersome, which seriously affects reconstruction efficiency. Holographic communication scenarios have demanding real-time requirements, so single-view three-dimensional human body reconstruction is mostly adopted.
Currently, popular single-view three-dimensional human body reconstruction methods are mainly based on parameterized human body models. A commonly used parameterized human body model is SMPL (A Skinned Multi-Person Linear Model), which contains 72 parameters describing body pose and 10 parameters describing body shape. When reconstructing with a parameterized model, the two-dimensional joint positions are estimated from a single image, and the SMPL parameters are optimized by minimizing the projection distance between the three-dimensional joints and the two-dimensional joints, thereby obtaining the human body model. However, the parameterized model reconstructed in this way has limited expressive power for the geometric details of the human surface, and the detailed textures of clothing on the human surface cannot be reconstructed well.
In view of this, the embodiment of the application exploits the strong learning capability of neural networks and provides a single-image three-dimensional reconstruction method and device based on a mixed Gaussian network. The mixed Gaussian network is built from a convolutional neural network and a fully connected neural network, and is trained in advance on training data. During reconstruction, a single image containing the whole human body is input into the trained network to obtain the mixed Gaussian function of each pixel point in the image; a continuous indicator function on each projection ray is generated from the pixel-wise mixed Gaussian parameters; the continuous indicator function is trilinearly sampled in three-dimensional space to generate a uniform discrete indicator function; and the surface of the three-dimensional human body model is extracted from the uniform discrete indicator function to obtain the model. Because the mixed Gaussian function directly describes how the occupancy value of a pixel point changes along its projection ray, a small number of parameters suffice to describe the continuous variation of the occupancy value in the ray direction, which improves reconstruction efficiency. In addition, the mixed Gaussian network can extract more human-body detail features from a single image, so the accuracy of the human body three-dimensional model is ensured while the efficiency of single-image reconstruction is improved.
Referring to fig. 1, the overall architecture diagram of the single-image three-dimensional reconstruction method based on the mixed Gaussian network provided by the embodiment of the application: a single color image containing the whole body of a human body, captured by a camera, is input into the mixed Gaussian network; the network outputs the mixed Gaussian function value of each pixel point, and a three-dimensional human body model is generated from the human body parameter map described by those values.
In an embodiment of the application, the mixed Gaussian network is built from a convolutional neural network (Convolutional Neural Networks, CNN) and a fully connected neural network (Fully Connected Neural Network, FCNN). To meet the coding requirements of high-quality two-dimensional images in human body three-dimensional reconstruction, the convolutional part of the mixed Gaussian network adopts an HRNet with 3 scales, where the highest-resolution scale comprises 8 convolutional residual modules; the fully connected part adopts a 5-layer MLP (Multilayer Perceptron) whose intermediate feature dimensions are {256, 512, 256, 256}.
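As a concrete illustration of the fully connected part described above, the following NumPy sketch builds a 5-layer MLP with intermediate feature dimensions {256, 512, 256, 256}. The input dimension (the fused feature size) and the output dimension (here 12, e.g. 4 Gaussian components with a weight, mean and standard deviation each) are illustrative assumptions, not values fixed by the application.

```python
import numpy as np

def make_mlp(in_dim, hidden=(256, 512, 256, 256), out_dim=12, seed=0):
    """Initialise He-scaled weights for a 5-layer MLP whose intermediate
    feature dimensions are {256, 512, 256, 256}, as stated in the text."""
    rng = np.random.default_rng(seed)
    dims = (in_dim, *hidden, out_dim)
    return [(rng.standard_normal((a, b)) * np.sqrt(2.0 / a), np.zeros(b))
            for a, b in zip(dims[:-1], dims[1:])]

def mlp_forward(layers, x):
    """Forward pass: ReLU on every layer except the output layer."""
    for i, (W, b) in enumerate(layers):
        x = x @ W + b
        if i < len(layers) - 1:
            x = np.maximum(x, 0.0)
    return x
```

A batch of per-pixel fused feature vectors goes in, one parameter vector per pixel comes out; the actual output parameterization used by the patent is not specified at this point in the text.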
Referring to fig. 2, the block diagram of the HRNet network provided by the embodiment of the present application: to obtain a high-resolution Feature Map, HRNet first reduces resolution (downsampling) and then raises it again (upsampling). It connects feature maps of different resolutions in parallel and, on that basis, adds interaction (fusion) between the feature maps of different resolutions, ensuring sufficient fusion of local and global features during convolution.
In the embodiment of the present application, before three-dimensional reconstruction, a mixed gaussian network needs to be trained, and the training process of the mixed gaussian network is shown in fig. 3, and mainly includes the following steps:
S301: a training dataset is acquired.
In an alternative embodiment, in S301 a training dataset is generated by rendering high-quality three-dimensional body scan data with a physically based rendering method (Physics-Based Rendering), so that the training dataset contains human body images rendered from three-dimensional body scans in different poses and clothing. Specifically, for each human subject the rendering viewpoint is rotated around the main axis of the three-dimensional human model, and during one full rotation (i.e. 360 degrees) one high-quality human body image is rendered every 2 degrees, 180 high-quality images in total. The rendering interval can be set according to actual requirements.
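The viewpoint schedule described above (one rendering every 2 degrees over one full rotation, 180 images in total) can be sketched as:

```python
def render_viewpoints(step_deg=2.0):
    """Azimuth angles (in degrees) for one full 360-degree rotation of the
    rendering viewpoint around the model's main axis, one rendering every
    `step_deg` degrees (configurable, per the text)."""
    n = int(round(360.0 / step_deg))
    return [i * step_deg for i in range(n)]
```

With the default 2-degree interval this yields the 180 viewpoints mentioned in the text; changing `step_deg` changes the dataset density accordingly.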
In another alternative embodiment, in S301, a human body image is collected from an existing human body data set and network resources to form a training data set.
S302: and (3) aiming at each human body image, calculating the real indication function of each sampling point on each three-dimensional projection ray inside and outside the model by adopting a sampling method based on the pixel projection rays, and fitting the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray according to the real indication function.
In the embodiment of the application, during imaging, rays in three-dimensional space irradiate the human body uniformly and form projection rays. Therefore, in S302 several sampling points are set on each three-dimensional projection ray, and for each human body image the true indicator function of every sampling point on every ray (inside or outside the model) is calculated with the pixel-projection-ray sampling method. The sampled value indicates whether the sampling point lies inside or outside the model (for example, the value is 1 when the point is inside the model and 0 when it is outside). From the true indicator function of the sampling points, the true value of the mixed Gaussian function of the corresponding three-dimensional projection ray is fitted; the true mixed Gaussian function value of each ray and the true indicator function of each sampling point then serve as supervision values for training the mixed Gaussian network.
In implementation, in S302 the starting positions and projection directions of the three-dimensional projection rays in three-dimensional space are first generated from the extrinsic parameters of the camera that captured the human body image and the image coordinates of each pixel point. The intersection points of the current three-dimensional projection ray with the three-dimensional human body model delimit the sampling interval, within which the ray is sampled uniformly (for example, a sampling interval 2 m long with 1000 sampling points), yielding at least one sampling point. A geometric method then computes the true indicator function value of every sampling point (the value is 1 if the point is inside the three-dimensional human body model and 0 if it is outside), giving the true indicator function of the current three-dimensional projection ray. Finally, a probability-distribution fitting method fits the true mixed Gaussian function value corresponding to the true indicator function of each three-dimensional projection ray.
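The per-ray sampling and fitting steps can be sketched as follows. Both the inside/outside test (`inside_fn`) and the single-Gaussian moment-matching fit are simplified stand-ins of my own: a real implementation would test points against the scanned mesh and fit a full Gaussian mixture, so this is an illustrative sketch only.

```python
import numpy as np

def sample_ray_indicator(origin, direction, t_near, t_far, n_samples, inside_fn):
    """Uniformly sample a projection ray on [t_near, t_far] and evaluate the
    true indicator function: 1 inside the body model, 0 outside.
    `inside_fn` stands in for the geometric inside/outside test on the mesh."""
    t = np.linspace(t_near, t_far, n_samples)
    pts = origin[None, :] + t[:, None] * direction[None, :]
    occ = np.array([1.0 if inside_fn(p) else 0.0 for p in pts])
    return t, occ

def fit_gaussian_by_moments(t, occ):
    """Fit one Gaussian to the occupied segment by moment matching --
    a minimal stand-in for the probability-distribution fitting step."""
    w = occ / occ.sum()
    mu = float((w * t).sum())
    sigma = float(np.sqrt((w * (t - mu) ** 2).sum()))
    return mu, sigma
```

For a ray crossing a unit sphere (a toy stand-in for the body), the fitted mean lands at the centre of the occupied segment, which is the qualitative behaviour the supervision values encode.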
S303: inputting a plurality of human body images, corresponding true indication functions and true values of the Gaussian mixture functions into the Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration.
Wherein, each iteration process is shown in fig. 4, and mainly comprises the following steps:
s3031: and determining predictive functions and predictive values of the Gaussian mixture functions corresponding to the three-dimensional projection rays of each human body image by adopting a Gaussian mixture network to be trained.
In S3031, after the human body images are input into the mixed Gaussian network to be trained, for each image the predicted indicator function value of every sampling point (inside or outside the model) on each of the image's three-dimensional projection rays is determined; the predicted indicator function of each ray follows from the values of its sampling points, and the predicted mixed Gaussian function value of the ray is then determined from its predicted indicator function.
S3032: and determining a target loss value of the Gaussian mixture network according to each predictive indirection function and the Gaussian mixture function predictive value and the corresponding true indirection function and the Gaussian mixture function true value.
In S3032, the target loss of the mixed Gaussian network comprises a negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the mixed Gaussian function. The determination of the target loss value is shown in fig. 5 and mainly includes the following steps:
S3032_1: and determining the negative log likelihood loss according to the probability distribution of the mixed Gaussian function predicted value corresponding to each three-dimensional projection ray.
In S3032_1, each human body image corresponds to a plurality of three-dimensional projection rays; the probability distribution of the predicted mixed Gaussian function value is determined for each ray, and the negative log-likelihood loss is determined from the probability distributions of all the rays. The negative log-likelihood loss is formulated as follows:
L_log = -log f_GMM(x)    (Equation 1)
S3032_2: and determining the mean square error of the oscillometric function according to the predicted oscillometric function and the real oscillometric function of each three-dimensional projection ray.
In S3032_2, the mean square error of the indicator function is calculated as follows:
L_Occupancy = ||Occupancy_infer(x) - Occupancy_GT(x)||^2    (Equation 2)
Where L_Occupancy denotes the mean square error of the indicator function, Occupancy_infer(x) denotes the predicted indicator value of sampling point x on the three-dimensional projection ray (i.e. the value of x on the corresponding predicted indicator function), and Occupancy_GT(x) denotes the true indicator value of sampling point x (i.e. the value of x on the corresponding true indicator function).
S3032_3: and determining the mean square error of the Gaussian mixture function according to the predicted value of the Gaussian mixture function and the true value of the Gaussian mixture function corresponding to each three-dimensional projection ray.
In S3032_3, the mean square error of the mixed Gaussian function is calculated as follows:
L_GMM = ||f_GMM^pred(x) - f_GMM^GT(x)||^2    (Equation 3)
Where L_GMM denotes the mean square error of the mixed Gaussian function, f_GMM^pred(x) denotes the mixed Gaussian function value of the three-dimensional projection ray inferred by the mixed Gaussian network to be trained, and f_GMM^GT(x) denotes the true mixed Gaussian function value of the three-dimensional projection ray.
S3032_4: and determining a target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the oscillography function and the mean square error of the Gaussian mixture function.
In S3032_4, the target loss value is calculated as follows:
Loss = L_log + L_Occupancy + L_GMM    (Equation 4)
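Assuming the mixture density along a ray is a standard weighted sum of one-dimensional Gaussians, the three loss terms and their sum can be sketched as below. Averaging over sample points and the `1e-12` stabilizer inside the logarithm are implementation choices of this sketch, not part of the application.

```python
import numpy as np

def gmm_pdf(x, weights, means, sigmas):
    """Mixture-of-Gaussians density f_GMM(x) along a projection ray
    (weights, means, sigmas are 1-D arrays, one entry per component)."""
    x = np.asarray(x, dtype=float)[..., None]
    comp = (weights * np.exp(-0.5 * ((x - means) / sigmas) ** 2)
            / (sigmas * np.sqrt(2.0 * np.pi)))
    return comp.sum(axis=-1)

def target_loss(x, occ_pred, occ_gt, gmm_pred, gmm_gt):
    """Target loss = negative log-likelihood + indicator MSE + GMM MSE."""
    f_pred = gmm_pdf(x, *gmm_pred)
    l_log = float(-np.log(f_pred + 1e-12).mean())                 # negative log-likelihood
    l_occ = float(np.mean((occ_pred - occ_gt) ** 2))              # indicator-function MSE
    l_gmm = float(np.mean((f_pred - gmm_pdf(x, *gmm_gt)) ** 2))   # mixed-Gaussian MSE
    return l_log + l_occ + l_gmm
```

When the predicted and true quantities coincide, the two MSE terms vanish and only the negative log-likelihood term remains, which matches the role of each term in the total loss.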
S3033: and adjusting parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
In S3033, the target loss value of each iteration is compared with the preset range; if the target loss value falls within the preset range or the iteration cap is reached, adjustment of the mixed Gaussian network parameters stops, and the network parameters with the minimum target loss value become the final parameters of the trained mixed Gaussian network.
Optionally, in S3033, an Adam optimizer may be used to perform optimization adjustment on parameters of the gaussian mixture network.
The mixed Gaussian network in the embodiment of the application can be deployed on servers used for holographic communication, including but not limited to micro servers, cloud servers and server clusters, and also on interactive clients such as notebook computers, desktop computers, smartphones, tablets, VR glasses and AR glasses. The server and the client are collectively referred to as the reconstruction device.
Based on the trained mixed gaussian network, fig. 6 illustrates a flowchart of a single image three-dimensional reconstruction method based on the mixed gaussian network, where the flowchart is executed by a reconstruction device and mainly includes the following steps:
s601: a single color image is acquired that contains the whole body of the human body.
In an alternative embodiment, since the detail features on the front of the human body are rich, in S601 the camera is placed in front of the human body to capture a single color image containing the whole body.
S602: and extracting the two-dimensional target characteristics of each pixel point from a single color image by adopting a pre-trained Gaussian mixture network, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point.
From the foregoing embodiments, it can be seen that the Gaussian mixture network is constructed from a convolutional neural network and a fully connected neural network, and features of different layers can be extracted from a single color image through the two neural networks. The specific feature extraction process, shown in fig. 7, mainly comprises the following steps:
S6021: and extracting feature graphs of different scales corresponding to the color images based on the convolutional neural network part in the mixed Gaussian network.
In an alternative embodiment, in S6021, the convolutional neural network part is configured with 3 scales, and feature maps at the 3 scales are obtained by performing convolution processing on the single color image.
It should be noted that, in the embodiment of the present application, the number of different scales is not limited, for example, 5 scales may be set.
S6022: and fusing the feature images with different scales to obtain fusion feature vectors corresponding to each pixel point.
In S6022, feature maps of different scales contain different amounts of human body information: small-scale (low-resolution) feature maps carry rich deep semantic information about the human body, while large-scale (high-resolution) feature maps carry stronger human body geometric information. Fusing the feature maps of different scales therefore yields, for each pixel point in the single color image, a fusion feature vector containing rich human body detail information (such as human body geometric information and deep semantic information).
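The fusion step can be sketched with plain NumPy: build a toy pyramid by average pooling, bring every scale back to full resolution, and concatenate per pixel. The pooling factors, channel count, and nearest-neighbour upsampling are illustrative assumptions, not details from the patent.

```python
import numpy as np

def avg_pool(x, k):
    """Average-pool an (H, W, C) feature map by factor k (H, W divisible by k)."""
    h, w, c = x.shape
    return x.reshape(h // k, k, w // k, k, c).mean(axis=(1, 3))

def upsample(x, k):
    """Nearest-neighbour upsample an (H, W, C) feature map by factor k."""
    return x.repeat(k, axis=0).repeat(k, axis=1)

def fuse_multiscale(feat, factors=(1, 2, 4)):
    """Concatenate feature maps from several scales into one per-pixel vector."""
    maps = [upsample(avg_pool(feat, k), k) if k > 1 else feat for k in factors]
    return np.concatenate(maps, axis=-1)   # (H, W, C * len(factors))

feat = np.random.rand(8, 8, 16)   # stand-in for a CNN feature map
fused = fuse_multiscale(feat)     # one fusion feature vector per pixel
```

Each pixel of `fused` now mixes high-resolution geometric detail with low-resolution semantic context, which is the intent of S6022.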
S6023: and extracting the mapping position of the image coordinates of each pixel point in the frequency domain based on the fully connected neural network part in the mixed Gaussian network.
In S6023, the fully-connected neural network part includes an input layer, a hidden layer, and an output layer. After the input layer obtains a single color image, frequency domain mapping is carried out on each pixel point, and the mapping position of the image coordinate of each pixel point in the frequency domain is extracted, wherein the frequency domain mapping formula is as follows:
Z_pos = γ(p) = (sin(2^0 πp), cos(2^0 πp), ..., sin(2^(L-1) πp), cos(2^(L-1) πp)) equation 5
Where p is the two-dimensional image coordinate of each pixel point, p = (u, v); u and v represent the coordinates along the length (lateral resolution) and width (longitudinal resolution) directions of the color image, respectively; and L is the coding dimension of the frequency-domain mapping position. Optionally, L = 32.
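Equation 5 is the familiar sinusoidal frequency encoding. A minimal NumPy version is sketched below; normalizing u and v into [-1, 1] before encoding is an assumption, not something the patent states.

```python
import numpy as np

def freq_encode(p, L=32):
    """Map 2-D coordinates p (shape (N, 2), values assumed in [-1, 1]) to
    (sin(2^0 pi p), cos(2^0 pi p), ..., sin(2^(L-1) pi p), cos(2^(L-1) pi p))."""
    freqs = (2.0 ** np.arange(L)) * np.pi           # 2^k * pi, k = 0..L-1
    angles = p[:, None, :] * freqs[None, :, None]   # (N, L, 2)
    enc = np.concatenate([np.sin(angles), np.cos(angles)], axis=1)
    return enc.reshape(p.shape[0], -1)              # (N, 2 coords * 2 fns * L)

uv = np.array([[0.25, -0.5]])   # one pixel's normalized (u, v)
z_pos = freq_encode(uv)         # with L = 32: 2 * 2 * 32 = 128 dimensions
```

This encoding lets the fully connected part resolve high-frequency spatial detail that raw (u, v) coordinates would blur.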
S6024: and splicing the mapping position of each pixel point and the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
In S6024, the input layer of the fully connected neural network part receives, in addition to the mapping position of the image coordinates of each pixel point in the frequency domain, the fusion feature vector Z_Img corresponding to each pixel point. The mapping position of each pixel point and the fusion feature vector of the corresponding pixel point are spliced through the hidden layer of the fully connected network part to obtain the spliced two-dimensional target feature Z_MLP = cat(Z_Img, Z_pos).
S6025: and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point.
In S6025, after the two-dimensional target feature is obtained, the Gaussian mixture function value of each pixel point is regressed through the output layer of the fully connected network part, and the regression formula is expressed as follows:
P_GMM = f_MLP(Z_Img, Z_pos) equation 6
The specific determination process of the mixture gaussian function value is shown in fig. 8, and mainly comprises the following steps:
S6025_1: based on a fully connected neural network part in the mixed Gaussian network, gaussian processing is carried out on the two-dimensional target characteristics of each pixel point, and the mean value and the variance of two Gaussian functions are respectively obtained.
The output layer of the fully connected network part comprises two gaussian functions, and in S6025_1, the two-dimensional target feature of each pixel point is subjected to gaussian processing through the output layer of the fully connected neural network part, so that the mean value and the variance of the two gaussian functions are respectively obtained.
S6025_2: and determining the target parameters of the mixed Gaussian function according to the mean and the variance of the two Gaussian functions.
In s6025_2, the target parameters of the mixture gaussian function are determined according to the mean and variance of the two gaussian functions, and the formula is as follows:
P_GMM = {μ1, σ1, μ2, σ2, ω, t} equation 7
Wherein μ1 and σ1 are the mean and variance of the first Gaussian function, μ2 and σ2 are the mean and variance of the second Gaussian function, ω is the mixing weight of the Gaussian mixture function, and t is the truncation parameter of the indicator function.
S6025_3: and calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
In S6025_3, once the target parameters of the Gaussian mixture function are known, the Gaussian mixture function value of each pixel point on its corresponding three-dimensional projection ray can be calculated.
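The patent does not reproduce the evaluation formula here, but a two-component mixture with the parameters of Equation 7 would ordinarily be evaluated as ω·N(z; μ1, σ1²) + (1 − ω)·N(z; μ2, σ2²) along the ray depth z. The sketch below implements exactly that; treating the weights as ω and 1 − ω, and leaving the truncation parameter t out, are assumptions.

```python
import numpy as np

def gaussian_pdf(z, mu, sigma):
    """Univariate normal density N(z; mu, sigma^2)."""
    return np.exp(-0.5 * ((z - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

def gmm_value(z, mu1, sigma1, mu2, sigma2, omega):
    """Two-component Gaussian mixture value along the ray depth z."""
    return omega * gaussian_pdf(z, mu1, sigma1) + (1 - omega) * gaussian_pdf(z, mu2, sigma2)

# Evaluate the mixture densely along one projection ray.
z = np.linspace(0.0, 4.0, 2001)
vals = gmm_value(z, mu1=1.0, sigma1=0.1, mu2=3.0, sigma2=0.1, omega=0.5)
```

The two modes at μ1 and μ2 mark where the ray crosses the body surface, which is why a handful of parameters suffices to describe the whole ray.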
S603: and obtaining a continuous occupation value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, wherein each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model.
In the embodiment of the present application, the Gaussian mixture function value of each pixel point is used as a continuous analytic expression of the indicator function of that pixel point inside and outside the human body model. During the analytic expression, an occupation value (Occupancy) can be used to represent whether the corresponding pixel point is located inside or outside the human body model, as shown in fig. 9: when the pixel point is located inside the human body model, the occupation value is 1, and when the pixel point is located outside the human body model, the occupation value is 0.
The conventional human body three-dimensional reconstruction method based on a single image needs to perform dense sampling on the occupied value in the three-dimensional space, as shown in fig. 10, circles represent sampling points, and thus the efficiency of human body three-dimensional reconstruction is seriously reduced.
From the perspective of projection of a single image, along the direction of the three-dimensional projection ray corresponding to each pixel point, the change of the occupation value is piecewise discrete, as shown in fig. 11. If a dense sampling method is adopted to estimate the occupation value function (i.e. the indicator function) on each three-dimensional projection ray, performance is wasted; the traditional occupation value estimation method based on aligning pixel points with two-dimensional target features therefore still has limitations in representing the three-dimensional human model.
In order to solve the above problem, in S603 the Gaussian mixture function value corresponding to each pixel point is converted into the indicator function value (i.e. occupation value) representing whether the pixel point lies inside or outside the human body model, so as to obtain a continuous occupation value function on each three-dimensional projection ray.
Referring to fig. 12, a comparison chart of continuous change of occupancy values expressed by using a mixed gaussian function and a traditional dense sampling manner is provided for an embodiment of the present application, as shown in fig. 12, the change of occupancy values of pixel points on corresponding three-dimensional projection rays is directly described by using the mixed gaussian function, and a small number of parameters (such as a mean value and a variance) can be directly used for describing the continuous change of occupancy values of three-dimensional projection ray directions (z directions), thereby improving reconstruction and reasoning efficiency.
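The patent does not reproduce the conversion formula, but one common way to turn two Gaussians centred near the entry point (μ1) and exit point (μ2) of the body surface into a continuous 0 → 1 → 0 occupancy profile is the difference of their cumulative distribution functions, clipped to [0, 1]. The sketch below is that construction; it is an assumption about the conversion, not the patent's exact formula.

```python
import math

def gauss_cdf(z, mu, sigma):
    """Cumulative distribution function of N(mu, sigma^2)."""
    return 0.5 * (1.0 + math.erf((z - mu) / (sigma * math.sqrt(2.0))))

def occupancy(z, mu1, sigma1, mu2, sigma2):
    """Smooth occupancy along a ray assumed to enter the body near mu1
    and leave it near mu2 (CDF-difference construction, clipped to [0, 1])."""
    o = gauss_cdf(z, mu1, sigma1) - gauss_cdf(z, mu2, sigma2)
    return min(max(o, 0.0), 1.0)

inside = occupancy(2.0, mu1=1.0, sigma1=0.05, mu2=3.0, sigma2=0.05)   # between the surfaces
outside = occupancy(4.5, mu1=1.0, sigma1=0.05, mu2=3.0, sigma2=0.05)  # past the exit point
```

The variances control how sharply the occupation value transitions at the surface, matching the compact description of fig. 12.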
S604: and performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function.
In S604, the uniform discrete indicator function is used to characterize the three-dimensional geometric surface of the human body. The generation process of the uniform discrete indicator function is shown in fig. 13 and mainly includes the following steps:
S6041: image coordinates of the discrete and uniform sampling points projected on the color image are obtained.
Since the color image is generated by the perspective projection relationship, when performing the tri-linear interpolation, it is necessary to project the discrete and uniform sampling points onto the color image, and obtain the image (pixel) coordinates of the sampling points corresponding to the color image.
S6042: for each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray.
In S6042, for each sampling point, at least one projection ray adjacent to the sampling point (i.e., the distance from the sampling point to each three-dimensional projection ray is smaller than a set threshold) is determined according to the image coordinates corresponding to the sampling point, and a perpendicular is drawn to the determined at least one projection ray through the sampling point, so as to obtain an intersection point of the sampling point and the at least one three-dimensional projection ray.
S6043: and obtaining the occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of at least one intersection point to obtain the discrete occupation value of one sampling point.
In S6043, the continuous occupation value function on each three-dimensional projection ray is known, and for each sampling point, from the continuous occupation value function on at least one three-dimensional projection ray adjacent to the sampling point, the occupation value of the intersection point of the sampling point and at least one three-dimensional projection ray can be obtained, and the discrete occupation value of the sampling point is obtained by performing tri-linear interpolation on the occupation value of at least one intersection point.
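The tri-linear interpolation in S6043 can be sketched as standard interpolation over eight corner values of a cell. In this minimal NumPy version the corner layout and unit-cell coordinates are illustrative assumptions; in the method itself, the eight values would be occupation values taken at the intersections with neighbouring projection rays.

```python
import numpy as np

def trilinear(corner_vals, x, y, z):
    """Interpolate inside a unit cell.
    corner_vals[i][j][k] is the occupation value at corner (i, j, k), i/j/k in {0, 1};
    (x, y, z) are fractional coordinates in [0, 1]^3."""
    c = np.asarray(corner_vals, dtype=float)
    c = c[0] * (1 - x) + c[1] * x     # collapse the x axis -> (2, 2)
    c = c[0] * (1 - y) + c[1] * y     # collapse the y axis -> (2,)
    return c[0] * (1 - z) + c[1] * z  # collapse the z axis -> scalar

# Cell straddling the body surface: one half occupied, the other empty.
corners = np.zeros((2, 2, 2))
corners[1, :, :] = 1.0
center = trilinear(corners, 0.5, 0.5, 0.5)   # sampling point at the cell centre
```

A sampling point centred between occupied and empty corners gets an occupation value of 0.5, i.e. it lies on the interpolated surface.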
S6044: and generating a uniform discrete indicator function representing the geometrical surface of the human body according to the discrete occupation value of each sampling point.
S605: and extracting the geometrical surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
In an alternative embodiment, in S605 the Marching Cubes algorithm is used to extract the human body geometric surface from the uniform discrete indicator function to obtain the three-dimensional human body model.
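The Marching Cubes step can be sketched with scikit-image's implementation, which walks the discrete occupancy grid and emits a triangle mesh at a chosen iso-level. The sphere volume and grid size below are stand-ins for the human occupancy grid, and the availability of `skimage` is an assumption.

```python
import numpy as np
from skimage import measure

# Synthetic occupancy grid: 1 inside a sphere, 0 outside.
n = 32
zz, yy, xx = np.mgrid[:n, :n, :n]
occ = (((xx - n / 2) ** 2 + (yy - n / 2) ** 2 + (zz - n / 2) ** 2)
       < (n / 4) ** 2).astype(float)

# Extract the iso-surface at occupation value 0.5.
verts, faces, normals, values = measure.marching_cubes(occ, level=0.5)
```

`verts` and `faces` together define the triangle mesh of the reconstructed surface, which in the method is the three-dimensional human body model.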
In some embodiments, in order to improve the accuracy of the three-dimensional human body model, as shown in fig. 14, after the human body parameter map described by the Gaussian mixture function value of each pixel point is obtained, a 2D image optimization method based on a Generative Adversarial Network (GAN) is used to refine the preliminarily reconstructed human body parameter map represented by the Gaussian mixture parameters, so as to obtain a finer reconstruction result of the three-dimensional human body geometric surface.
According to the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided by the embodiment of the application, the two-dimensional target feature of each pixel point is extracted from a single color image containing the whole body of a human body through the pre-trained Gaussian mixture network, the Gaussian mixture function value corresponding to each pixel point is generated according to the two-dimensional target feature, and the continuous occupation value function of each human body pixel point on the corresponding three-dimensional projection ray is obtained according to the Gaussian mixture parameters corresponding to each pixel point. Further, tri-linear sampling is performed on the continuous occupation value function in three-dimensional space to generate a uniform discrete indicator function, and the human body geometric surface is extracted from the uniform discrete indicator function to obtain the three-dimensional human body model. Because the Gaussian mixture network can extract rich human body detail features from a single color image, the method improves three-dimensional reconstruction efficiency based on a single color image while ensuring the precision of the three-dimensional human body model; and the human body parameter map preliminarily reconstructed with the Gaussian mixture function is optimized by a GAN network, further improving the reconstruction accuracy of the three-dimensional human body model.
Based on the same technical concept, an embodiment of the application provides a reconstruction device, which can be a client with an interaction function such as a notebook computer, a desktop computer, a smart phone, a tablet, VR glasses, or AR glasses, or a server for realizing the interaction process, including but not limited to a micro server, a cloud server, or a server cluster. The device can implement the steps of the single-image three-dimensional reconstruction method based on the Gaussian mixture network in the above embodiments and achieve the same technical effect.
Referring to fig. 15, the reconstruction device comprises a processor 1501, a memory 1502, a display screen 1503 and a communication interface 1504, said display screen 1503, said memory 1502 and said processor 1501 being connected by a bus 1505;
The memory 1502 includes a data storage unit and a program storage unit, the program storage unit storing a computer program, and the processor 1501 performs the following operations according to the computer program:
acquiring, through the communication interface 1504, a single color image containing the whole body of the human body collected by a camera, and storing the single color image in the data storage unit;
Extracting the two-dimensional target characteristics of each pixel point from the color image by adopting a pre-trained Gaussian mixture network, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the two-dimensional target characteristics of each pixel point;
According to the Gaussian mixture function value corresponding to each pixel point, a continuous occupation value function on each three-dimensional projection ray is obtained, and each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
Performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of a human body;
And extracting the geometric surface of the human body from the uniform discrete indicator function, obtaining a three-dimensional human body model, and displaying the three-dimensional human body model through the display screen 1503.
Optionally, the gaussian mixture network is built by a convolutional neural network and a fully-connected neural network, and the processor 1501 adopts a trained gaussian mixture network to extract the two-dimensional target feature of each pixel point from the color image, which specifically comprises the following steps:
Based on a convolutional neural network part in the Gaussian mixture network, extracting feature images with different scales corresponding to the color images, and fusing the feature images with different scales to obtain fused feature vectors corresponding to each pixel point;
and extracting the mapping position of the image coordinates of each pixel point in the frequency domain based on the fully connected neural network part in the Gaussian mixture network, and splicing the mapping position of each pixel point with the fusion feature vector of the corresponding pixel point to obtain the two-dimensional target feature.
Optionally, the processor 1501 calculates, according to the two-dimensional target feature of each pixel, a gaussian mixture function value of each pixel on a corresponding three-dimensional projection line, where the specific operations are as follows:
based on a fully connected neural network part in the Gaussian mixture network, carrying out Gaussian processing on the two-dimensional target characteristics of each pixel point to respectively obtain the mean value and the variance of two Gaussian functions;
Determining target parameters of the Gaussian mixture function according to the mean value and the variance of the two Gaussian functions;
And calculating the Gaussian mixture function value of each pixel point on the corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function.
Optionally, the processor 1501 performs tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function, which specifically includes:
Acquiring image coordinates of the discrete and uniform sampling points projected on the color image;
For each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray;
acquiring an occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of the at least one intersection point to obtain a discrete occupation value of the sampling point;
And generating a uniform discrete indicator function according to the discrete occupation value of each sampling point.
Optionally, the processor 1501 trains the mixed gaussian network by:
acquiring a training data set, wherein the training data set comprises human body images obtained by rendering three-dimensional human body scanning data under different postures and clothes;
For each human body image, a sampling method based on pixel projection rays is adopted to calculate the real indicator functions, inside and outside the model, of the sampling points on each three-dimensional projection ray, and the Gaussian mixture function true value corresponding to the corresponding three-dimensional projection ray is fitted according to the real indicator functions of the sampling points;
Inputting a plurality of human body images and the corresponding real indicator function and Gaussian mixture function true values into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration, wherein each iteration process executes the following operations:
determining predicted indicator functions and Gaussian mixture function predicted values corresponding to a plurality of three-dimensional projection rays of each human body image by adopting the Gaussian mixture network to be trained;
Determining a target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value and the corresponding real indicator function and Gaussian mixture function true value;
And adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
Optionally, the processor 1501 determines the target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value, and the real indicator function and Gaussian mixture function true value, which specifically includes:
determining negative log-likelihood loss according to the probability distribution of the Gaussian mixture function predicted value corresponding to each three-dimensional projection ray;
Determining the mean square error of the indicator function according to the predicted indicator function and the real indicator function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture function according to the Gaussian mixture function predicted value and the Gaussian mixture function true value corresponding to each three-dimensional projection ray;
And determining the target loss value of the Gaussian mixture network according to the negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the Gaussian mixture function.
It should be noted that fig. 15 is only an example, providing the hardware necessary for the reconstruction device to implement the steps of the single-image three-dimensional reconstruction method based on the Gaussian mixture network provided by the embodiment of the present application. Components not shown include common parts of interaction devices such as a speaker, a microphone, a power supply, and an audio processor.
The processor referred to in fig. 15 of the embodiments of the present application may be a central processing unit (Central Processing Unit, CPU), a general purpose processor, a graphics processor (Graphics Processing Unit, GPU), a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application-Specific Integrated Circuit, ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or other programmable logic device, transistor logic device, hardware component, or any combination thereof.
Referring to fig. 16, a functional block diagram of a reconstruction device according to an embodiment of the present application mainly includes an acquisition module 1601, a feature extraction module 1602, a function determination module 1603, and a model rendering module 1604, where:
An acquisition module 1601 for acquiring a single color image including the whole body of the human body;
The feature extraction module 1602 is configured to extract a two-dimensional target feature of each pixel from the color image by using a pre-trained gaussian mixture network, and calculate a gaussian mixture function value of each pixel on a corresponding three-dimensional projection ray according to the two-dimensional target feature of each pixel;
The function determining module 1603 is configured to obtain a continuous occupation value function on each three-dimensional projection ray according to the Gaussian mixture function value corresponding to each pixel point, where each occupation value is used to represent whether the corresponding pixel point is located inside or outside the human body model; and to perform tri-linear sampling on each continuous occupation value function in three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of the human body;
the model rendering module 1604 is configured to extract the geometric surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
The specific implementation of each of the above functional modules refers to the foregoing embodiments, and will not be repeated here.
The embodiment of the application also provides a computer readable storage medium for storing instructions which, when executed, can complete the single-image three-dimensional reconstruction method based on the Gaussian mixture network in the previous embodiments.
The embodiment of the application also provides a computer program product for storing a computer program for executing the single image three-dimensional reconstruction method based on the mixed Gaussian network in the previous embodiment.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A single image three-dimensional reconstruction method based on a mixed Gaussian network is characterized by comprising the following steps:
acquiring a single color image containing the whole body of a human body;
Extracting feature images with different scales corresponding to the color images by adopting a convolutional neural network part in a pre-trained Gaussian mixture network, and fusing the feature images with different scales to obtain fused feature vectors corresponding to each pixel point in the color images;
Extracting the mapping position of the image coordinates of each pixel point in the frequency domain by adopting a fully connected neural network part in a pre-trained Gaussian mixture network, and splicing the mapping position of each pixel point with the fusion feature vector to obtain the two-dimensional target feature of the corresponding pixel point;
Carrying out Gaussian processing on the two-dimensional target features of each pixel point by adopting the fully-connected neural network part, respectively obtaining the mean value and the variance of two Gaussian functions, determining the target parameters of a Gaussian mixture function according to the mean value and the variance of the two Gaussian functions, and calculating the Gaussian mixture function value of each pixel point on a corresponding three-dimensional projection ray according to the target parameters of the Gaussian mixture function;
According to the Gaussian mixture function value corresponding to each pixel point, a continuous occupation value function on each three-dimensional projection ray is obtained, and each occupation value is used for representing whether the corresponding pixel point is positioned inside or outside the human body model;
Performing tri-linear sampling on each continuous occupation value function in a three-dimensional space to generate a uniform discrete indicator function representing the geometrical surface of a human body;
And extracting the geometrical surface of the human body from the uniform discrete indicator function to obtain the three-dimensional human body model.
2. The method of claim 1, wherein the tri-linear sampling of each successive occupancy value function in three-dimensional space to generate a uniform discrete indicator function comprises:
Acquiring image coordinates of the discrete and uniform sampling points projected on the color image;
For each sampling point, determining at least one adjacent three-dimensional projection ray according to the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point and the at least one three-dimensional projection ray;
acquiring an occupation value of at least one intersection point from the continuous occupation value function, and performing tri-linear interpolation on the occupation value of the at least one intersection point to obtain a discrete occupation value of the sampling point;
And generating a uniform discrete indicator function according to the discrete occupation value of each sampling point.
3. The method of claim 1 or 2, wherein the gaussian mixture network is trained by:
acquiring a training data set, wherein the training data set comprises human body images obtained by rendering three-dimensional human body scanning data under different postures and clothes;
For each human body image, a sampling method based on pixel projection rays is adopted to calculate the real indicator functions, inside and outside the model, of the sampling points on each three-dimensional projection ray, and the Gaussian mixture function true value corresponding to the corresponding three-dimensional projection ray is fitted according to the real indicator functions of the sampling points;
Inputting a plurality of human body images and the corresponding real indicator function and Gaussian mixture function true values into a Gaussian mixture network to be trained, and obtaining the trained Gaussian mixture network through at least one round of iteration, wherein each iteration process executes the following operations:
determining predicted indicator functions and Gaussian mixture function predicted values corresponding to a plurality of three-dimensional projection rays of each human body image by adopting the Gaussian mixture network to be trained;
Determining a target loss value of the Gaussian mixture network according to each predicted indicator function and Gaussian mixture function predicted value and the corresponding real indicator function and Gaussian mixture function true value;
And adjusting the parameters of the Gaussian mixture network to be trained according to the target loss value until the target loss value is within a preset range.
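Claim 3 fits a ground-truth Gaussian mixture to each ray but does not give the fitting rule. One plausible parameterization (our assumption, not stated in the patent) places one component at each surface crossing of the ray, i.e. the entry and exit depths recovered from the binary indicator samples:

```python
import numpy as np

def fit_ray_mixture(depths, indicator, sigma=0.05):
    """Fit a two-component 'true' Gaussian mixture to the binary indicator
    sampled along one projection ray: one component per surface crossing
    (entry and exit); sigma is an assumed fixed spread."""
    inside = indicator > 0.5
    if not inside.any():
        return None  # ray misses the body entirely
    idx = np.flatnonzero(inside)
    mu_in, mu_out = depths[idx[0]], depths[idx[-1]]
    # equal-weight mixture centred on the two crossings
    return {"means": (mu_in, mu_out), "sigmas": (sigma, sigma),
            "weights": (0.5, 0.5)}

depths = np.linspace(0.0, 2.0, 201)  # sample depths along the ray
indicator = ((depths > 0.8) & (depths < 1.3)).astype(float)
params = fit_ray_mixture(depths, indicator)
print(params["means"])  # entry/exit depths near 0.8 and 1.3
```

A body with concave geometry can cross a ray more than twice; the two-component form above matches the "two Gaussian functions" of claim 5 but would need extension for such rays.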
4. The method of claim 3, wherein determining the target loss value of the Gaussian mixture network from each predicted indicator function and predicted Gaussian mixture value and the corresponding true indicator function and true Gaussian mixture value comprises:
determining a negative log-likelihood loss from the probability distribution of the predicted Gaussian mixture values corresponding to each three-dimensional projection ray;
determining the mean square error of the indicator function from the predicted indicator function and the true indicator function of each three-dimensional projection ray;
determining the mean square error of the Gaussian mixture function from the predicted Gaussian mixture value and the true Gaussian mixture value corresponding to each three-dimensional projection ray;
and determining the target loss value of the Gaussian mixture network from the negative log-likelihood loss, the mean square error of the indicator function, and the mean square error of the Gaussian mixture function.
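The three terms of claim 4 combine naturally as a weighted sum; the weights and the exact reduction are our assumptions, since the claim only names the terms. A sketch over per-sample arrays:

```python
import numpy as np

def target_loss(pred_ind, true_ind, pred_gmm, true_gmm,
                w_nll=1.0, w_ind=1.0, w_gmm=1.0):
    """Combine the three terms of claim 4 into one scalar target loss.

    pred_ind/true_ind: predicted / true indicator values per sample
    pred_gmm/true_gmm: predicted / true Gaussian mixture density values
    The relative weights are illustrative; the claim does not fix them.
    """
    eps = 1e-8
    nll = -np.mean(np.log(pred_gmm + eps))         # negative log-likelihood
    mse_ind = np.mean((pred_ind - true_ind) ** 2)  # indicator-function MSE
    mse_gmm = np.mean((pred_gmm - true_gmm) ** 2)  # mixture-function MSE
    return w_nll * nll + w_ind * mse_ind + w_gmm * mse_gmm

# perfect prediction -> loss close to zero
print(target_loss(np.ones(4), np.ones(4), np.ones(4), np.ones(4)))
```

Training (claim 3) would backpropagate this scalar through the network and stop once it falls within the preset range.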
5. A reconstruction device, characterized by comprising a processor, a memory, a display screen and a communication interface, wherein the communication interface, the display screen, the memory and the processor are connected by a bus;
The memory includes a data storage unit and a program storage unit, the program storage unit stores a computer program, and the processor performs the following operations according to the computer program:
acquiring, through the communication interface, a single color image containing the whole human body captured by a camera, and storing the single color image in the data storage unit;
extracting feature maps of different scales corresponding to the color image using the convolutional neural network part of a pre-trained Gaussian mixture network, and fusing the feature maps of different scales to obtain a fused feature vector for each pixel in the color image;
mapping the image coordinates of each pixel into the frequency domain using the fully connected neural network part of the pre-trained Gaussian mixture network, and concatenating each pixel's frequency-domain mapping with its fused feature vector to obtain the two-dimensional target feature of that pixel;
applying Gaussian processing, with the fully connected neural network part, to the two-dimensional target feature of each pixel to obtain the mean and variance of two Gaussian functions, determining the target parameters of a Gaussian mixture function from the means and variances of the two Gaussian functions, and calculating the Gaussian mixture value of each pixel on the corresponding three-dimensional projection ray from the target parameters of the Gaussian mixture function;
obtaining a continuous occupancy value function on each three-dimensional projection ray from the Gaussian mixture value corresponding to each pixel, wherein each occupancy value indicates whether the corresponding point lies inside or outside the human body model;
performing trilinear sampling on each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function representing the geometric surface of the human body;
and extracting the geometric surface of the human body from the uniform discrete indicator function to obtain a three-dimensional human body model, and displaying the three-dimensional human body model on the display screen.
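The claim does not give a closed form for turning the two-component mixture into the continuous occupancy function along a ray. One consistent reading (our assumption): occupancy rises at the entry crossing and falls at the exit, i.e. the difference of the two components' cumulative distribution functions:

```python
import math

def occupancy(t, mu_in, mu_out, sigma):
    """Continuous occupancy value at depth t along one projection ray,
    built from Gaussian components at the entry/exit crossings."""
    cdf = lambda x, mu: 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))
    return cdf(t, mu_in) - cdf(t, mu_out)

# occupancy is close to 1 between the crossings and close to 0 outside
print(round(occupancy(1.0, 0.8, 1.3, 0.02), 3))
print(round(occupancy(0.0, 0.8, 1.3, 0.02), 3))
```

With this form the occupancy transitions smoothly through 0.5 exactly at the component means, which is what makes the later iso-surface extraction well defined.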
6. The reconstruction device of claim 5, wherein the processor trilinearly samples each continuous occupancy value function in three-dimensional space to generate a uniform discrete indicator function, specifically by:
acquiring image coordinates of the discrete, uniformly distributed sampling points projected onto the color image;
for each sampling point, determining at least one adjacent three-dimensional projection ray from the image coordinates corresponding to the sampling point, and acquiring an intersection point of the sampling point with the at least one three-dimensional projection ray;
acquiring the occupancy value of the at least one intersection point from the continuous occupancy value function, and performing trilinear interpolation on the occupancy value of the at least one intersection point to obtain the discrete occupancy value of the sampling point;
and generating a uniform discrete indicator function from the discrete occupancy value of each sampling point.
CN202210792028.8A 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network Active CN115082636B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792028.8A CN115082636B (en) 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network


Publications (2)

Publication Number Publication Date
CN115082636A (en) 2022-09-20
CN115082636B (en) 2024-05-17

Family

ID=83257445

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792028.8A Active CN115082636B (en) 2022-07-05 2022-07-05 Single image three-dimensional reconstruction method and device based on mixed Gaussian network

Country Status (1)

Country Link
CN (1) CN115082636B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117893696B (en) * 2024-03-15 2024-05-28 之江实验室 Three-dimensional human body data generation method and device, storage medium and electronic equipment

Citations (4)

Publication number Priority date Publication date Assignee Title
CN108665537A (en) * 2018-05-15 2018-10-16 清华大学 The three-dimensional rebuilding method and system of combined optimization human body figure and display model
CN111340944A (en) * 2020-02-26 2020-06-26 清华大学 Single-image human body three-dimensional reconstruction method based on implicit function and human body template
CN112330795A (en) * 2020-10-10 2021-02-05 清华大学 Human body three-dimensional reconstruction method and system based on single RGBD image
CN113506335A (en) * 2021-06-01 2021-10-15 清华大学 Real-time human body holographic reconstruction method and device based on multiple RGBD cameras

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US10825243B1 (en) * 2019-08-15 2020-11-03 Autodesk, Inc. Three-dimensional (3D) model creation and incremental model refinement from laser scans


Non-Patent Citations (2)

Title
DeepHuman: 3D Human Reconstruction from a Single Image; Zerong Zheng et al.; arXiv:1903.06473v2 [cs.CV]; 2019-03-28; full text *
Learning Occupancy Function from Point Clouds for Surface Reconstruction; Meng Jia et al.; arXiv:2010.11378v1 [cs.CV]; 2020-10-22; full text *


Similar Documents

Publication Publication Date Title
CN109003325B (en) Three-dimensional reconstruction method, medium, device and computing equipment
CN109754464B (en) Method and apparatus for generating information
CN112991358A (en) Method for generating style image, method, device, equipment and medium for training model
CN116051740A (en) Outdoor unbounded scene three-dimensional reconstruction method and system based on nerve radiation field
US20240046557A1 (en) Method, device, and non-transitory computer-readable storage medium for reconstructing a three-dimensional model
CN109191554A (en) A kind of super resolution image reconstruction method, device, terminal and storage medium
EP4036863A1 (en) Human body model reconstruction method and reconstruction system, and storage medium
WO2023040609A1 (en) Three-dimensional model stylization method and apparatus, and electronic device and storage medium
CN112927362A (en) Map reconstruction method and device, computer readable medium and electronic device
CN113327318B (en) Image display method, image display device, electronic equipment and computer readable medium
JP2024004444A (en) Three-dimensional face reconstruction model training, three-dimensional face image generation method, and device
CN115082540B (en) Multi-view depth estimation method and device suitable for unmanned aerial vehicle platform
CN113129352A (en) Sparse light field reconstruction method and device
CN111243085B (en) Training method and device for image reconstruction network model and electronic equipment
CN112734910A (en) Real-time human face three-dimensional image reconstruction method and device based on RGB single image and electronic equipment
CN115082636B (en) Single image three-dimensional reconstruction method and device based on mixed Gaussian network
CN115115805A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN110827341A (en) Picture depth estimation method and device and storage medium
CN115222917A (en) Training method, device and equipment for three-dimensional reconstruction model and storage medium
CN114758070A (en) Single-image three-dimensional human body fine reconstruction method based on cross-domain multitask
CN117095132B (en) Three-dimensional reconstruction method and system based on implicit function
CN112927348B (en) High-resolution human body three-dimensional reconstruction method based on multi-viewpoint RGBD camera
CN116385667B (en) Reconstruction method of three-dimensional model, training method and device of texture reconstruction model
CN116863078A (en) Three-dimensional human body model reconstruction method, three-dimensional human body model reconstruction device, electronic equipment and readable medium
CN115272608A (en) Human hand reconstruction method and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant