CN111402397A - TOF depth data optimization method and device based on unsupervised data

Info

Publication number: CN111402397A
Application number: CN202010128595.4A
Authority: CN
Inventors: 刘烨斌; 赵笑晨; 王立祯; 于涛; 戴琼海
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2020-07-10
Anticipated expiration: 2040-02-28
Also published as: CN111402397B

Abstract

The invention discloses a TOF depth data optimization method and device based on unsupervised data. The method comprises the following steps: acquiring a noise-free three-dimensional human body model database; determining the independent variables of the simulated noise; adding noise in the longitudinal direction and the transverse direction respectively; rendering the original three-dimensional models and the noise-added three-dimensional models, and building an encoder-decoder network; designing an energy function and a supervision network, inputting the network output into the supervision network, and constraining the feature maps extracted layer by layer by the convolutional layers of the supervision network to approximate one another; and performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and then taking a human body depth picture acquired in a real scene as input to obtain a human body mesh model that contains geometric details and is free of noise. The method uses a generative adversarial network with only unsupervised three-dimensional scan data, and improves the quality of the depth data while preserving geometric details.

Description

TOF depth data optimization method and device based on unsupervised data

Technical Field

The invention relates to the technical field of three-dimensional reconstruction in computer vision, in particular to a method and a device for optimizing TOF depth data based on unsupervised data.

Background

With the continuous development of three-dimensional reconstruction technology in the field of computer vision, three-dimensional reconstruction of the human body has become a research hotspot in that field. Using depth cameras to capture depth images, which provide additional depth information for three-dimensional reconstruction, is currently becoming an important research direction.

Current consumer-grade monocular depth cameras based on the TOF (time-of-flight) principle calculate the distance between the measured object and the camera by continuously emitting light pulses (typically invisible light) toward the object, receiving the pulses reflected back from it, and measuring their round-trip flight time. Although the Microsoft KinectV2 improves considerably on earlier devices in accuracy and depth image quality, its results still contain significant noise because of the limited precision of the device's sensor and the TOF imaging principle itself.
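As an illustration only, the round-trip relation described above (distance equals the speed of light multiplied by half the flight time) can be sketched as follows. The function names are hypothetical and are not part of the disclosure; the sketch ignores sensor effects and only shows the arithmetic.

```python
# Speed of light in vacuum in metres per second (exact SI value).
SPEED_OF_LIGHT = 299_792_458.0


def tof_distance(round_trip_seconds: float) -> float:
    """Distance to the object from the measured round-trip time of a light pulse."""
    if round_trip_seconds < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the object and back, so only half the path counts.
    return SPEED_OF_LIGHT * round_trip_seconds / 2.0


def round_trip_time(distance_m: float) -> float:
    """Inverse relation: flight time of a pulse for an object at distance_m."""
    return 2.0 * distance_m / SPEED_OF_LIGHT


# An object 2 m away returns the pulse after roughly 13 nanoseconds.
example_time = round_trip_time(2.0)
example_distance = tof_distance(example_time)
```

The very short flight time for a typical indoor distance suggests why small timing errors in the sensor translate into visible depth noise.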

At present, mainstream denoising of depth pictures mainly filters out high-frequency noise with a specially designed low-pass filter. However, when a depth picture of a person is acquired, geometric details such as facial features and clothing wrinkles are present, and a low-pass filter removes this fine geometric information along with the noise. A deep neural network, on the other hand, can improve the quality of the depth data while preserving geometric details, but because actually acquired depth data lack matching ground-truth depth pictures, it is difficult to construct supervised real data for training.
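The loss of fine geometry under low-pass filtering can be illustrated with a small numerical sketch. The depth profile, the wrinkle size and the moving-average filter below are assumptions chosen for illustration; they are not taken from the disclosure.

```python
import numpy as np


def box_low_pass(profile, width):
    """Moving-average (box) low-pass filter; borders are padded by replication."""
    pad = width // 2
    padded = np.pad(profile, pad, mode="edge")
    kernel = np.ones(width) / width
    return np.convolve(padded, kernel, mode="valid")


# Hypothetical depth profile along one image row: a flat surface 2.0 m away
# carrying a 3-pixel-wide, 1 cm wrinkle.
profile = np.full(101, 2.0)
profile[49:52] += 0.01

smoothed = box_low_pass(profile, width=9)

# Height of the wrinkle before and after filtering.
detail_before = profile.max() - profile.min()
detail_after = smoothed.max() - smoothed.min()
```

With a 9-pixel window, the 3-pixel wrinkle keeps only one third of its height, so the same filter that suppresses high-frequency noise also flattens the detail.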

Disclosure of Invention

The present invention is directed to solving, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to propose a TOF depth data optimization method based on unsupervised data, which can use a generative adversarial network with only unsupervised three-dimensional scan data and improve the quality of the depth data while preserving geometric details.

Another object of the invention is to propose a TOF depth data optimization device based on unsupervised data.

To achieve the above objects, an embodiment of the invention provides a TOF depth data optimization method based on unsupervised data, comprising the following steps: acquiring a noise-free three-dimensional human body model database; analyzing the noise distribution pattern of the captured pictures, and determining the independent variables of the simulated noise to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system; adding noise in the longitudinal direction and the transverse direction respectively; rendering the original three-dimensional models and the noise-added three-dimensional models under different viewing angles, different illumination and different backgrounds, using the resulting pictures as the raw data for network training, and building an encoder-decoder network on the PyTorch deep learning platform; designing an energy function with a 3D loss based on the L1 norm as the main constraint, designing a supervision network, inputting the normal gradient map of the network output and the normal gradient map of the noise-free model into the supervision network, and constraining the feature maps that the supervision network extracts convolutional layer by convolutional layer to approximate each other; and performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and taking a human body depth picture captured in a real scene by a Microsoft KinectV2 depth camera as input to obtain a human body mesh model that contains geometric details and is free of noise.

According to the TOF depth data optimization method based on unsupervised data of the embodiment of the invention, for a database of detailed three-dimensional human models, the noise distribution of a TOF-principle depth camera is simulated mathematically, a generative adversarial neural network is built using the depth data obtained by rendering, and the quality of noisy depth maps is improved; a depth picture of a person in a real scene, captured by a Microsoft KinectV2 depth camera, is used as input, and the network outputs a human body mesh model containing geometric details. Thus a generative adversarial network can be used with only unsupervised three-dimensional scan data, improving the quality of the depth data while preserving geometric details.

In addition, the TOF depth data optimization method based on unsupervised data according to the above embodiment of the invention may also have the following additional technical features:

Further, in an embodiment of the present invention, in the longitudinal direction, noise center points are determined by uniform sampling along the X-axis and Y-axis directions within the effective depth region of the whole picture, and the noise intensity is calculated from the distance between the corresponding object surface point and the depth camera and from the normal direction in the camera coordinate system, so as to add Gaussian-distributed noise; in the transverse direction, small-scale random erosion and dilation are applied to the human body mask to simulate the ragged edges that appear in the depth picture in regions where the depth changes sharply at object edges.

Further, in an embodiment of the present invention, acquiring the noise-free three-dimensional human body model database includes: acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

Further, in one embodiment of the present invention, the human body mesh model is obtained with a generative adversarial neural network structure trained on unsupervised data.

Further, in an embodiment of the present invention, the method further includes: obtaining a realistic corresponding TOF depth picture by a simulation method; and constructing a generative adversarial network structure with a supervision network, using the original data and the simulated data to drive the network and using the feature maps of each layer of the supervision network as constraints.

To achieve the above objects, another embodiment of the invention provides a TOF depth data optimization device based on unsupervised data, comprising an acquisition module, an analysis module, an adding module, a rendering module, a design module and an iterative regression optimization module. The acquisition module is used for acquiring a noise-free three-dimensional human body model database; the analysis module is used for analyzing the noise distribution pattern of the captured pictures and determining the independent variables of the simulated noise to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system; the adding module is used for adding noise in the longitudinal direction and the transverse direction respectively; the rendering module is used for rendering the original three-dimensional models and the noise-added three-dimensional models under different viewing angles, different illumination and different backgrounds, using the resulting pictures as the raw data for network training, and building an encoder-decoder network on the PyTorch deep learning platform; the design module is used for designing an energy function with a 3D loss based on the L1 norm as the main constraint, designing a supervision network, inputting the normal gradient map of the network output and the normal gradient map of the noise-free model into the supervision network, and constraining the feature maps that the supervision network extracts convolutional layer by convolutional layer to approximate each other; and the iterative regression optimization module is used for performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and for taking a human body depth picture captured in a real scene by a Microsoft KinectV2 depth camera as input to obtain a human body mesh model that contains geometric details and is free of noise.

According to the TOF depth data optimization device based on unsupervised data of the embodiment of the invention, for a database of detailed three-dimensional human models, the noise distribution of a TOF-principle depth camera is simulated mathematically, a generative adversarial neural network is built using the depth data obtained by rendering, and the quality of noisy depth maps is improved; a depth picture of a person in a real scene, captured by a Microsoft KinectV2 depth camera, is used as input, and the network outputs a human body mesh model containing geometric details. Thus a generative adversarial network can be used with only unsupervised three-dimensional scan data, improving the quality of the depth data while preserving geometric details.

In addition, the TOF depth data optimizing device based on unsupervised data according to the above embodiment of the invention may also have the following additional technical features:

Further, in an embodiment of the present invention, the acquisition module is further used for acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

Further, in an embodiment of the present invention, the device further includes: a construction module, used for obtaining a realistic corresponding TOF depth picture by a simulation method, and for constructing a generative adversarial network structure with a supervision network, using the original data and the simulated data to drive the network and using the feature maps of each layer of the supervision network as constraints.

Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.

Drawings

The foregoing and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:

FIG. 1 is a flow chart of a method of unsupervised data based TOF depth data optimization according to an embodiment of the present disclosure;

FIG. 2 is a schematic structural diagram of an apparatus for optimizing TOF depth data based on unsupervised data according to an embodiment of the invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary and intended to explain the invention, and are not to be construed as limiting the invention.

The TOF depth data optimization method and device based on unsupervised data according to embodiments of the invention are described below with reference to the drawings; the method is described first.

FIG. 1 is a flow chart of a method of unsupervised data based TOF depth data optimization according to an embodiment of the present invention.

Depth data obtained with TOF depth cameras are widely used in three-dimensional reconstruction in computer vision, and the quality of the depth data directly affects the reconstruction result. As shown in FIG. 1, the TOF depth data optimization method based on unsupervised data comprises the following steps:

In step S101, a noise-free three-dimensional human body model database is acquired.

In one embodiment of the present invention, acquiring the noise-free three-dimensional human body model database includes: acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

It can be appreciated that the noise-free three-dimensional human body model database is acquired by a scanning device in a uniformly illuminated indoor environment.

In step S102, the noise distribution pattern of the captured pictures is analyzed, and the independent variables of the simulated noise are determined to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system.

It can be appreciated that the noise distribution pattern of pictures captured by the TOF-based Microsoft KinectV2 depth camera is analyzed, and the independent variables of the simulated noise are determined to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system.
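The two independent variables named in step S102 can be computed per pixel from a rendered depth map. The following is a minimal sketch under stated assumptions: a pinhole camera model with hypothetical intrinsics `fx`, `fy`, `cx`, `cy`, and normals estimated by finite differences; the disclosure does not specify how these quantities are obtained.

```python
import numpy as np


def noise_variables(depth, fx, fy, cx, cy):
    """Return (distance, normal) per pixel in the camera coordinate system.

    depth is an H x W map of z values in metres; fx, fy, cx, cy are
    assumed pinhole intrinsics (hypothetical, not from the disclosure).
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    # Back-project every pixel to a 3D surface point.
    points = np.stack(
        [(u - cx) * depth / fx, (v - cy) * depth / fy, depth], axis=-1
    )
    # First variable: distance of the surface point from the camera centre.
    distance = np.linalg.norm(points, axis=-1)
    # Second variable: surface normal from the cross product of the two
    # finite-difference tangent vectors.
    tangent_u = np.gradient(points, axis=1)
    tangent_v = np.gradient(points, axis=0)
    normal = np.cross(tangent_u, tangent_v)
    normal /= np.linalg.norm(normal, axis=-1, keepdims=True) + 1e-12
    # Orient all normals toward the camera (negative z).
    normal[normal[..., 2] > 0] *= -1
    return distance, normal


# A fronto-parallel plane 2 m from the camera.
plane = np.full((5, 5), 2.0)
plane_distance, plane_normal = noise_variables(plane, fx=2.0, fy=2.0, cx=2.0, cy=2.0)
```

For the flat plane, every normal points straight back at the camera and the distance is smallest at the principal point, which matches the geometric meaning of the two variables.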

In step S103, noise is added in the longitudinal direction and the transverse direction, respectively.

In one embodiment of the invention, in the longitudinal direction, noise center points are determined by uniform sampling along the X-axis and Y-axis directions within the effective depth region of the whole picture, and the noise intensity is calculated from the distance between the corresponding object surface point and the depth camera and from the normal direction in the camera coordinate system, so as to add Gaussian-distributed noise; in the transverse direction, small-scale random erosion and dilation are applied to the human body mask to simulate the ragged edges that appear in the depth picture in regions where the depth changes sharply at object edges.
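A minimal sketch of the two noise components of step S103 is given below. The disclosure does not give the noise-intensity formula, so the expression for `sigma`, its coefficients, the sampling step and the one-pixel boundary flip are all assumptions made for illustration; noise is added only at the sampled center points, and the depth map is assumed to be zero outside the mask.

```python
import numpy as np


def _neighbourhood(img, fill):
    """Stack the nine 3x3-shifted copies of img (out-of-image pixels = fill)."""
    padded = np.pad(img, 1, mode="constant", constant_values=fill)
    h, w = img.shape
    return np.stack(
        [padded[dy:dy + h, dx:dx + w] for dy in range(3) for dx in range(3)]
    )


def dilate(mask):
    return _neighbourhood(mask, False).any(axis=0)


def erode(mask):
    return _neighbourhood(mask, False).all(axis=0)


def longitudinal_noise(depth, normal_z, mask, rng, step=4,
                       base=0.001, k_dist=0.002, k_angle=0.002):
    """Gaussian depth noise at uniformly sampled centre points inside the mask."""
    noisy = depth.copy()
    ys, xs = np.meshgrid(np.arange(0, depth.shape[0], step),
                         np.arange(0, depth.shape[1], step), indexing="ij")
    ys, xs = ys.ravel(), xs.ravel()
    keep = mask[ys, xs]
    ys, xs = ys[keep], xs[keep]
    # Hypothetical intensity model: grows with distance and with grazing angle.
    cos_theta = np.clip(np.abs(normal_z[ys, xs]), 0.1, 1.0)
    sigma = base + k_dist * depth[ys, xs] ** 2 + k_angle * (1.0 / cos_theta - 1.0)
    noisy[ys, xs] += rng.normal(0.0, sigma)
    return noisy


def transverse_noise(depth, mask, rng, flip_prob=0.5):
    """Randomly erode or dilate the mask boundary by one pixel."""
    flip = rng.random(mask.shape) < flip_prob
    new_mask = mask.copy()
    new_mask[dilate(mask) & ~mask & flip] = True   # grow into the background
    new_mask[mask & ~erode(mask) & flip] = False   # shrink the body edge
    # Newly covered pixels take the largest depth in their 3x3 neighbourhood.
    nearest = _neighbourhood(depth, 0.0).max(axis=0)
    new_depth = np.where(new_mask, np.where(mask, depth, nearest), 0.0)
    return new_depth, new_mask
```

A typical use would apply `longitudinal_noise` first and then `transverse_noise` to the result, passing a seeded `numpy.random.default_rng` so that the simulated pictures are reproducible.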

In step S104, the original three-dimensional models and the noise-added three-dimensional models are rendered under different viewing angles, different illumination and different backgrounds, the resulting pictures are used as the raw data for network training, and an encoder-decoder network is built on the PyTorch deep learning platform.

In step S105, an energy function is designed with a 3D loss based on the L1 norm as the main constraint, and a supervision network is designed; the normal gradient map of the network output and the normal gradient map of the noise-free model are input into the supervision network, and the feature maps that the supervision network extracts convolutional layer by convolutional layer are constrained to approximate each other.

Designing the supervision network in this way enhances the learning capability of the network: the normal gradient map of the network output and the normal gradient map of the noise-free model are both input into the supervision network, and the feature maps extracted at each convolutional layer are constrained to be close to each other.
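The structure of the energy function in step S105 can be sketched numerically as an L1 depth term plus a layer-by-layer feature term on normal gradient maps. This is a NumPy sketch under stated assumptions, not the disclosed implementation: the supervision network is replaced by a hypothetical stack of fixed 3x3 convolutions with ReLU, the normal gradient map is taken to be the gradient magnitude of the normal map, and `feat_weight` is an arbitrary balancing factor.

```python
import numpy as np


def l1_depth_loss(pred, target, mask):
    """Main constraint: mean absolute depth error over valid pixels."""
    return float(np.abs(pred - target)[mask].mean())


def normal_map(depth):
    """Unit normals estimated from finite differences of the depth map."""
    dz_dy, dz_dx = np.gradient(depth)
    n = np.stack([-dz_dx, -dz_dy, np.ones_like(depth)], axis=-1)
    return n / np.linalg.norm(n, axis=-1, keepdims=True)


def normal_gradient_map(depth):
    """Magnitude of the spatial gradient of the normal map (assumed definition)."""
    gy, gx = np.gradient(normal_map(depth), axis=(0, 1))
    return np.sqrt((gx ** 2 + gy ** 2).sum(axis=-1))


def conv3x3_relu(x, kernel):
    """One 3x3 convolution followed by ReLU; borders are padded by replication."""
    padded = np.pad(x, 1, mode="edge")
    h, w = x.shape
    out = sum(kernel[dy, dx] * padded[dy:dy + h, dx:dx + w]
              for dy in range(3) for dx in range(3))
    return np.maximum(out, 0.0)


def layerwise_features(x, kernels):
    """Feature map after every layer of the stand-in supervision network."""
    features = []
    for kernel in kernels:
        x = conv3x3_relu(x, kernel)
        features.append(x)
    return features


def energy(pred, clean, mask, kernels, feat_weight=0.1):
    """L1 depth term plus the layer-by-layer feature approximation term."""
    data_term = l1_depth_loss(pred, clean, mask)
    f_pred = layerwise_features(normal_gradient_map(pred), kernels)
    f_clean = layerwise_features(normal_gradient_map(clean), kernels)
    feat_term = sum(float(np.abs(a - b).mean()) for a, b in zip(f_pred, f_clean))
    return data_term + feat_weight * feat_term
```

The energy is zero when the network output equals the noise-free depth and grows both with the raw depth error and with differences in the layer-wise features of the two normal gradient maps.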

In step S106, iterative regression optimization is performed on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and a human body depth picture captured in a real scene by the Microsoft KinectV2 depth camera is taken as input to obtain a human body mesh model that contains geometric details and is free of noise.

It can be understood that the parameter weights of the generator and the discriminator of the generative adversarial neural network are iteratively optimized, using the constructed human body model database and a reasonable energy function, until the weights have basically converged; with a depth picture of a human body captured in a real scene by the Microsoft KinectV2 depth camera as input, the network can then output a human body mesh model that contains geometric details and is free of noise.
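The phrase "until the weights have basically converged" can be made concrete with a stopping rule on the relative change of the weights between iterations. The sketch below alternates a discriminator update and a generator update and stops when that change falls below a tolerance. The tolerance, the update order and the toy update functions are assumptions for illustration; they stand in for real gradient steps and are not part of the disclosure.

```python
import numpy as np


def relative_change(old_weights, new_weights):
    """Norm of the weight update divided by the norm of the old weights."""
    num = sum(float(np.sum((n - o) ** 2)) for o, n in zip(old_weights, new_weights))
    den = sum(float(np.sum(o ** 2)) for o in old_weights)
    return (num / (den + 1e-12)) ** 0.5


def train_until_converged(gen_w, disc_w, gen_step, disc_step,
                          tol=1e-4, max_iters=10_000):
    """Alternate discriminator and generator updates until the weights settle."""
    iteration = 0
    for iteration in range(1, max_iters + 1):
        new_disc = disc_step(gen_w, disc_w)
        new_gen = gen_step(gen_w, new_disc)
        change = relative_change(gen_w + disc_w, new_gen + new_disc)
        gen_w, disc_w = new_gen, new_disc
        if change < tol:
            break
    return gen_w, disc_w, iteration


# Toy stand-ins for the real updates: each step moves the weights a fixed
# fraction of the way toward a fixed point (1.0 and -1.0 respectively).
def toy_gen_step(gen_w, disc_w):
    return [w - 0.1 * (w - 1.0) for w in gen_w]


def toy_disc_step(gen_w, disc_w):
    return [w - 0.1 * (w + 1.0) for w in disc_w]


final_gen, final_disc, iters_used = train_until_converged(
    [np.zeros(4)], [np.zeros(3)], toy_gen_step, toy_disc_step
)
```

In an actual training run the two step functions would be replaced by optimizer updates driven by the energy function and the adversarial loss.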

It can be understood that the embodiment of the present invention adopts a generative adversarial neural network structure driven by unsupervised data, which involves the following steps:

A. The initial data are noise-free three-dimensional scanned human body models, and realistic corresponding TOF depth pictures are obtained by a simulation method.

B. The original data and the simulated data are used to drive the network, a generative adversarial network structure with a supervision network is constructed, and the feature maps of each layer of the supervision network are used as constraints; this effectively promotes convergence of the network training and refinement of the results.

The TOF depth data optimization method based on unsupervised data is further explained below through a specific embodiment:

1. Training data collection. A database containing 800 detailed three-dimensional human models is built using three-dimensional scanning equipment; rendering under different viewing angles and different illumination conditions yields an original image data set of about 8000 images, and a simulation method generates the TOF noise picture corresponding to each original picture. These serve as the raw data for training the generative adversarial neural network. The neural network structure built on the PyTorch deep learning platform is trained until the weights in the network have basically converged, finally yielding depth pictures of optimized quality.

2. Testing and use. The RGBD pictures captured by the Microsoft KinectV2 depth camera are used as input to the trained network model, and the generator part of the network outputs human body depth pictures containing geometric details.
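One common way to turn an optimized depth picture into a mesh is to back-project every valid pixel and connect neighbouring pixels into triangles. The sketch below shows this grid triangulation as an illustration only; the disclosure does not state how the mesh is formed, and the pinhole intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical parameters.

```python
import numpy as np


def depth_to_mesh(depth, fx, fy, cx, cy):
    """Back-project valid pixels to vertices and triangulate the pixel grid.

    Pixels with depth 0 are treated as background and produce no vertex.
    """
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]
    valid = depth > 0
    # Map each valid pixel to its vertex index (row-major order), -1 otherwise.
    index = -np.ones((h, w), dtype=int)
    index[valid] = np.arange(valid.sum())
    z = depth[valid]
    vertices = np.stack(
        [(u[valid] - cx) * z / fx, (v[valid] - cy) * z / fy, z], axis=1
    )
    faces = []
    for r in range(h - 1):
        for c in range(w - 1):
            a, b = index[r, c], index[r, c + 1]
            d, e = index[r + 1, c], index[r + 1, c + 1]
            # Split each 2x2 pixel block into two triangles where all
            # three corners are valid.
            if min(a, b, d) >= 0:
                faces.append((a, d, b))
            if min(b, d, e) >= 0:
                faces.append((b, d, e))
    return vertices, np.array(faces, dtype=int).reshape(-1, 3)


# A 3x3 patch at 1 m gives 9 vertices and 8 triangles.
patch_vertices, patch_faces = depth_to_mesh(
    np.ones((3, 3)), fx=1.0, fy=1.0, cx=1.0, cy=1.0
)
```

A practical pipeline would usually also discard triangles that span a large depth jump, so that the body is not connected to the background across its silhouette.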

In summary, in the TOF depth data optimization method based on unsupervised data provided by the embodiment of the invention, for a database of detailed three-dimensional human models, the noise distribution of a TOF-principle depth camera is simulated mathematically, a generative adversarial neural network is built using the depth data obtained by rendering, and the quality of noisy depth maps is improved; a depth picture of a person in a real scene, captured by a Microsoft KinectV2 depth camera, is used as input, and the network outputs a human body mesh model containing geometric details. Thus a generative adversarial network can be used with only unsupervised three-dimensional scan data, improving the quality of the depth data while preserving geometric details.

Next, a TOF depth data optimization apparatus based on unsupervised data proposed according to an embodiment of the present invention is described with reference to the drawings.

Fig. 2 is a schematic structural diagram of an unsupervised data-based TOF depth data optimization device according to an embodiment of the invention.

As shown in FIG. 2, the TOF depth data optimization device 10 based on unsupervised data includes: an acquisition module 100, an analysis module 200, an adding module 300, a rendering module 400, a design module 500, and an iterative regression optimization module 600.

The acquisition module 100 is used for acquiring a noise-free three-dimensional human body model database. The analysis module 200 is used for analyzing the noise distribution pattern of the captured pictures and determining the independent variables of the simulated noise to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system. The adding module 300 is used for adding noise in the longitudinal direction and the transverse direction respectively. The rendering module 400 is used for rendering the original three-dimensional models and the noise-added three-dimensional models under different viewing angles, different illumination and different backgrounds, using the resulting pictures as the raw data for network training, and building an encoder-decoder network on the PyTorch deep learning platform. The design module 500 is used for designing an energy function with a 3D loss based on the L1 norm as the main constraint, designing a supervision network, inputting the normal gradient map of the network output and the normal gradient map of the noise-free model into the supervision network, and constraining the feature maps that the supervision network extracts convolutional layer by convolutional layer to approximate each other. The iterative regression optimization module 600 is used for performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and for taking a human body depth picture captured in a real scene by a Microsoft KinectV2 depth camera as input to obtain a human body mesh model that contains geometric details and is free of noise.

Further, in one embodiment of the present invention, in the longitudinal direction, noise center points are determined by uniform sampling along the X-axis and Y-axis directions within the effective depth region of the whole picture, and the noise intensity is calculated from the distance between the corresponding object surface point and the depth camera and from the normal direction in the camera coordinate system, so as to add Gaussian-distributed noise; in the transverse direction, small-scale random erosion and dilation are applied to the human body mask to simulate the ragged edges that appear in the depth picture in regions where the depth changes sharply at object edges.

Further, in an embodiment of the present invention, the acquisition module 100 is further used for acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

It should be noted that the foregoing explanation of the embodiment of the TOF depth data optimization method based on unsupervised data is also applicable to the TOF depth data optimization device based on unsupervised data of the embodiment, and is not repeated here.

According to the TOF depth data optimization device based on unsupervised data provided by the embodiment of the invention, for a database of detailed three-dimensional human models, the noise distribution of a TOF-principle depth camera is simulated mathematically, a generative adversarial neural network is built using the depth data obtained by rendering, and the quality of noisy depth maps is improved; a depth picture of a person in a real scene, captured by a Microsoft KinectV2 depth camera, is used as input, and the network outputs a human body mesh model containing geometric details. Thus a generative adversarial network can be used with only unsupervised three-dimensional scan data, improving the quality of the depth data while preserving geometric details.

Furthermore, the terms "first" and "second" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality" means at least two, e.g., two, three, etc., unless specifically limited otherwise.

In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate. Also, a first feature being "on," "over," or "above" a second feature may mean that the first feature is directly or obliquely above the second feature, or may simply indicate that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely below the second feature, or may simply indicate that the first feature is at a lower level than the second feature.

In the description herein, references to the terms "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such references do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, those skilled in the art may combine the various embodiments or examples described in this specification, and the features of different embodiments or examples, provided they do not contradict one another.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. A TOF depth data optimization method based on unsupervised data is characterized by comprising the following steps:

acquiring a human body three-dimensional model database without noise;

analyzing a noise distribution mode of the acquired picture, and determining independent variables of the simulated noise as the distance between the surface point of the object and the depth camera and the normal direction in a camera coordinate system;

adding noise in the longitudinal direction and the transverse direction respectively;

rendering the original three-dimensional models and the noise-added three-dimensional models under different viewing angles, different illumination and different backgrounds, using the resulting pictures as the raw data for network training, and building an encoder-decoder network on the PyTorch deep learning platform;

designing an energy function with a 3D loss based on the L1 norm as the main constraint, designing a supervision network, inputting the normal gradient map of the network output and the normal gradient map of the noise-free model into the supervision network, and constraining the feature maps that the supervision network extracts convolutional layer by convolutional layer to approximate each other; and

performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and taking a human body depth picture captured in a real scene by a Microsoft KinectV2 depth camera as input to obtain a human body mesh model that contains geometric details and is free of noise.

2. The method according to claim 1, wherein, in the longitudinal direction, noise center points are determined by uniform sampling along the X-axis and Y-axis directions within the effective depth region of the whole picture, and the noise intensity is calculated from the distance between the corresponding object surface point and the depth camera and from the normal direction in the camera coordinate system, so as to add Gaussian-distributed noise; and, in the transverse direction, small-scale random erosion and dilation are applied to the human body mask to simulate the ragged edges that appear in the depth picture in regions where the depth changes sharply at object edges.

3. The method of claim 1, wherein obtaining a noise-free three-dimensional model database of a human body comprises:

acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

4. The method of claim 1, wherein the human body mesh model is obtained with a generative adversarial neural network structure trained on unsupervised data.

5. The method of claim 4, further comprising:

obtaining a realistic corresponding TOF depth picture by a simulation method; and

constructing a generative adversarial network structure with a supervision network, using the original data and the simulated data to drive the network and using the feature maps of each layer of the supervision network as constraints.

6. An unsupervised data based TOF depth data optimization apparatus, comprising:

the acquisition module is used for acquiring a human body three-dimensional model database without noise;

the analysis module is used for analyzing the noise distribution pattern of the captured pictures and determining the independent variables of the simulated noise to be the distance between the object surface point and the depth camera and the normal direction in the camera coordinate system;

the adding module is used for adding noise in the longitudinal direction and the transverse direction respectively;

the rendering module is used for rendering the original three-dimensional models and the noise-added three-dimensional models under different viewing angles, different illumination and different backgrounds, using the resulting pictures as the raw data for network training, and building an encoder-decoder network on the PyTorch deep learning platform;

the design module is used for designing an energy function with a 3D loss based on the L1 norm as the main constraint, designing a supervision network, inputting the normal gradient map of the network output and the normal gradient map of the noise-free model into the supervision network, and constraining the feature maps that the supervision network extracts convolutional layer by convolutional layer to approximate each other; and

the iterative regression optimization module is used for performing iterative regression optimization on the parameter weights of the generator and the discriminator of the generative adversarial neural network, using the three-dimensional human body model database and the energy function, until the weights converge, and for taking a human body depth picture captured in a real scene by a Microsoft KinectV2 depth camera as input to obtain a human body mesh model that contains geometric details and is free of noise.

7. The apparatus of claim 6, wherein, in the longitudinal direction, noise center points are determined by uniform sampling along the X-axis and Y-axis directions within the effective depth region of the whole picture, and the noise intensity is calculated from the distance between the corresponding object surface point and the depth camera and from the normal direction in the camera coordinate system, so as to add Gaussian-distributed noise; and, in the transverse direction, small-scale random erosion and dilation are applied to the human body mask to simulate the ragged edges that appear in the depth picture in regions where the depth changes sharply at object edges.

8. The apparatus of claim 6, wherein the acquisition module is further used for acquiring the three-dimensional model data with a scanning device in a uniformly illuminated indoor environment.

9. The apparatus of claim 6, wherein the human body mesh model is obtained with a generative adversarial neural network structure trained on unsupervised data.

10. The apparatus of claim 9, further comprising:

a construction module, used for obtaining a realistic corresponding TOF depth picture by a simulation method, and for constructing a generative adversarial network structure with a supervision network, using the original data and the simulated data to drive the network and using the feature maps of each layer of the supervision network as constraints.