CN116309095A - Multi-view ToF depth measurement denoising method combined with RGB picture - Google Patents

Multi-view ToF depth measurement denoising method combined with RGB picture

Info

Publication number: CN116309095A
Authority: CN (China)
Prior art keywords: formula, representing, point, tof, rgb
Legal status: Pending (assumed; not a legal conclusion)
Application number: CN202211547453.7A
Other languages: Chinese (zh)
Inventors: 张越一 (Zhang Yueyi), 常文杰 (Chang Wenjie), 熊志伟 (Xiong Zhiwei)
Current Assignee: University of Science and Technology of China (USTC)
Original Assignee: University of Science and Technology of China (USTC)
Priority/filing date: 2022-12-05
Publication date: 2023-06-23
Application filed by University of Science and Technology of China (USTC)
Priority to CN202211547453.7A

Classifications

    • G Physics
    • G06 Computing; Calculating or Counting
    • G06T Image data processing or generation, in general
    • G06T5/00 Image enhancement or restoration
    • G06T5/70 Denoising; Smoothing
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10024 Color image
    • Y General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02 Technologies or applications for mitigation or adaptation against climate change
    • Y02T Climate change mitigation technologies related to transportation
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a multi-view ToF depth measurement denoising method combined with RGB pictures, comprising the following steps: 1) acquire imaging results of the measured scene from multiple viewpoints with an RGB-D camera; 2) compute the camera ray corresponding to each pixel in the imaging results and sample 3D coordinates along each ray; 3) predict a density value, a radiance value, an infrared intensity value and a normal direction for each coordinate point with a neural network; 4) render the network predictions to obtain the imaging result of each camera ray under multipath interference; 5) construct a loss function from the rendered and the acquired imaging results to train the network; 6) use the trained network to generate depth measurements free of the influence of multipath interference. By combining multi-view imaging results with RGB pictures, the invention removes the noise caused by multipath interference in the ToF imaging process, obtains more accurate depth measurement data, and overcomes the need for a large amount of real depth data as supervision.

Description

Multi-view ToF depth measurement denoising method combined with RGB picture
Technical Field
The invention belongs to the field of computer vision, and in particular relates to a method for removing the noise caused by multipath interference in the imaging process of a ToF camera by means of multi-view RGB-D pictures.
Background
In recent years, RGB-D camera modules based on Time-of-Flight (ToF) sensing have found wide use in mobile devices, providing a reliable way to measure depth data. Compared with structured-light cameras or binocular imaging systems, ToF cameras provide more accurate depth data at short range.
ToF devices compute scene depth by emitting modulated infrared light into the scene and taking measurements at different phase shifts on the sensor. However, ToF devices are subject to multipath interference (MPI): the signal at a single pixel is a superposition of light arriving along multiple reflection paths, which causes errors in the acquired depth information and thereby narrows the range of applications of ToF cameras. To suppress the MPI effect as much as possible, most previous work improves the accuracy of the acquired signal with additional measures, such as encoding the probe light signal or using multiple modulation frequencies with different phase shifts; the errors due to multipath effects can be eliminated in this way, but it requires hardware modifications (e.g., modifying the built-in infrared emitter, or using a sensor that can receive multiple modulation frequencies), or multiple scans with the same standard ToF camera.
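For concreteness, a minimal sketch of how a two-component phase measurement maps to depth under the standard continuous-wave model; the function name is hypothetical, and the 4π round-trip convention matches the phase formulas used later in the description:

```python
import numpy as np

def phase_to_depth(s, c, lam):
    """Depth from (sin, cos) phase components, assuming phase = 4*pi*d / lambda."""
    phase = np.arctan2(s, c) % (2.0 * np.pi)   # wrapped phase in [0, 2*pi)
    return lam * phase / (4.0 * np.pi)         # round trip: d = lambda * phi / (4*pi)
```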
With the rapid development of deep learning in recent years, more and more researchers have tried to address the multipath effect with learning-based methods, attacking the error problem in ToF imaging from the deep-learning side; such methods depend heavily on the data set used for training. Moreover, this approach requires a large amount of real depth data as supervision, and one model can only be used with a single camera model, so it lacks generality.
Disclosure of Invention
The invention aims to solve the problems of the prior art that ToF denoising requires a large amount of real depth-map data as supervision and applies only to a single type of camera. It provides a multi-view ToF depth measurement denoising method combined with RGB pictures, which removes the noise caused by multipath interference in the ToF imaging process by combining multi-view imaging results with RGB pictures, obtains more accurate depth measurement data, and overcomes the need for a large amount of real depth data as supervision.
In order to achieve the aim of the invention, the invention adopts the following technical scheme:
the invention relates to a multi-view ToF depth measurement denoising method combining RGB pictures, which is characterized by comprising the following steps:
Step 1, obtain N groups of RGB maps and ToF phase measurement maps $\{I_n, P_n \mid n = 1, 2, \ldots, N\}$ with a calibrated and aligned RGB-D imaging system, where $I_n$ denotes the nth RGB map and $P_n$ denotes the nth ToF phase measurement map.
Denote the pixel of the ith column and jth row of the nth RGB map $I_n$ as $I_n^{i,j} = (r_n^{i,j}, g_n^{i,j}, b_n^{i,j})$, where $r_n^{i,j}$, $g_n^{i,j}$ and $b_n^{i,j}$ denote the R, G and B values of that pixel.
Denote the pixel of the ith column and jth row of the nth ToF phase measurement map $P_n$ as $P_n^{i,j} = (s_n^{i,j}, c_n^{i,j})$, where $s_n^{i,j}$ denotes its sinusoidal measurement component and $c_n^{i,j}$ denotes its cosine measurement component.
Step 2, take the camera optical center of the nth group of pictures as the origin $o_n$, denote the direction from $o_n$ through the pixel $(i, j)$ of the ith column and jth row as $\vec d_n^{\,i,j}$, and obtain from formula (1) the ray $\vec r_n^{\,i,j}$ from $o_n$ through pixel $(i, j)$ as the camera ray:
$\vec r_n^{\,i,j}(x) = o_n + x\,\vec d_n^{\,i,j}$ (1)
In formula (1), $x$ denotes the distance between any point on the ray $\vec r_n^{\,i,j}$ and the origin $o_n$; and:
$o_n = -t_n$ (2)
$\vec d_n^{\,i,j} = R_n^{-1} K^{-1} (i, j, 1)^T$ (3)
In formulas (2) and (3), $K$ denotes the camera intrinsics; $R_n$ denotes the rotation matrix of the camera pose $E_n$ of the nth group of images; $t_n$ denotes the translation vector of the camera pose $E_n$ of the nth group of images, $n = 1, 2, \ldots, N$.
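A minimal sketch of the camera-ray construction of step 2, assuming `K`, `R_n`, `t_n` are the calibrated intrinsics and pose parameters of formulas (2) and (3); the helper name `camera_ray` is hypothetical:

```python
import numpy as np

def camera_ray(K, R_n, t_n, i, j):
    """Origin and direction of the camera ray through pixel (i, j), per formulas (1)-(3)."""
    o_n = -t_n                                        # formula (2): ray origin
    pix = np.array([i, j, 1.0])                       # homogeneous pixel coordinate
    d = np.linalg.inv(R_n) @ np.linalg.inv(K) @ pix   # formula (3): ray direction
    d /= np.linalg.norm(d)                            # normalize so x is metric distance
    return o_n, d

# A point at distance x along the ray, formula (1): p = o_n + x * d
```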
Step 3, use stratified sampling to sample A position points on the ray $\vec r_n^{\,i,j}$:
Step 3.1, set the sampling interval to $[x_{near}, x_{far}]$ and divide $[x_{near}, x_{far}]$ evenly into A interval blocks, where $x_{near}$ denotes the nearest and $x_{far}$ the farthest distance between a sampling point and the origin $o_n$;
Step 3.2, randomly sample one sample $x_a$ from the ath interval block, where $x_a$ denotes the distance between the current sampling position point and the origin $o_n$:
$x_a \sim \mathcal{U}\left[x_{near} + \frac{a-1}{A}(x_{far} - x_{near}),\; x_{near} + \frac{a}{A}(x_{far} - x_{near})\right]$ (4)
In formula (4), $\sim$ denotes "is distributed as" and $\mathcal{U}$ denotes the uniform distribution;
Step 3.3, substitute the sample $x_a$ into formula (1) to obtain the ath 3D coordinate point $p_a = \vec r_n^{\,i,j}(x_a)$;
Step 3.4, obtain the 3D coordinate points of all A intervals by the process of steps 3.2 to 3.3 and form the 3D coordinate point set $\{p_a \mid a = 1, 2, \ldots, A\}$.
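A sketch of the stratified sampling of step 3, assuming interval bounds `x_near`, `x_far` and block count `A` as in step 3.1 (the embodiment later uses [0, 10] and A = 128):

```python
import numpy as np

def stratified_samples(x_near, x_far, A, rng=np.random.default_rng()):
    """One uniform sample per interval block, formula (4)."""
    edges = np.linspace(x_near, x_far, A + 1)   # A equal blocks
    return rng.uniform(edges[:-1], edges[1:])   # x_a ~ U[block a], for a = 1..A

x = stratified_samples(0.0, 10.0, 128)          # distances x_1 .. x_128 along one ray
# 3D points then follow from formula (1): p_a = o_n + x_a * d
```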
Step 4, constructing a multi-layer perceptron network
Figure BDA00039806651000000216
And each layer adopts a ReLU as an activation function; and the a 3D coordinate point +.>
Figure BDA00039806651000000217
Input multi-layer perceptron network->
Figure BDA00039806651000000218
Thereby obtaining the a 3D coordinate points by using the formula (5) and the formula (6)
Figure BDA00039806651000000219
Corresponding density value sigma a Radiation valuec a Infrared intensity value b a Normal direction n a
Figure BDA00039806651000000220
Figure BDA0003980665100000031
In the formulas (5) and (6),
Figure BDA0003980665100000032
representing the gradient;
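A minimal PyTorch sketch of the multi-layer perceptron of step 4; the head layout is an assumption, and the width and depth follow the values the embodiment specifies later (8 fully connected layers of 256 nodes):

```python
import torch
import torch.nn as nn

class ToFMLP(nn.Module):
    """F_Theta: 3D point -> (sigma, c, b), formula (5); normals come from formula (6)."""
    def __init__(self, width=256, depth=8):
        super().__init__()
        layers, dim = [], 3
        for _ in range(depth):
            layers += [nn.Linear(dim, width), nn.ReLU()]
            dim = width
        self.trunk = nn.Sequential(*layers)
        self.sigma = nn.Linear(width, 1)   # density value
        self.c = nn.Linear(width, 3)       # RGB radiance value
        self.b = nn.Linear(width, 1)       # infrared intensity value

    def forward(self, p):
        h = self.trunk(p)
        return self.sigma(h), self.c(h), self.b(h)
```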
Step 5, use formula (7), formula (8), formula (9) and formula (10) to compute, respectively, the RGB value $\hat C_n^{i,j}$ of the camera ray $\vec r_n^{\,i,j}$, its ToF intensity value $\hat b_n^{i,j}$, the distance $\hat x_n^{i,j}$ between the origin $o_n$ and the intersection of the camera ray with the surface it passes through, and the surface normal vector $\hat n_n^{i,j}$ at that intersection:
$\hat C_n^{i,j} = \sum_{a=1}^{A} w_a c_a$ (7)
$\hat b_n^{i,j} = \sum_{a=1}^{A} w_a b_a$ (8)
$\hat x_n^{i,j} = \sum_{a=1}^{A} w_a x_a$ (9)
$\hat n_n^{i,j} = \sum_{a=1}^{A} w_a n_a$ (10)
In formula (7), formula (8), formula (9) and formula (10), $c_a$ denotes the radiance value of the ath 3D coordinate point $p_a$, $b_a$ its infrared intensity value, $x_a$ its distance from the origin $o_n$, $n_a$ its normal vector, and $w_a$ its weight, with:
$w_a = T_a\,(1 - \exp(-\sigma_a \delta_a))$ (11)
In formula (11), $T_a$ denotes the transparency between the 1st 3D coordinate point $p_1$ and the ath 3D coordinate point $p_a$, obtained from formula (12); $\delta_a$ denotes the distance between the (a+1)th 3D coordinate point $p_{a+1}$ and the ath 3D coordinate point $p_a$, obtained from formula (13):
$T_a = \exp\left(-\sum_{a'=1}^{a-1} \sigma_{a'}\,\delta_{a'}\right)$ (12)
$\delta_a = \lvert x_{a+1} - x_a \rvert$ (13)
In formula (13), $x_{a+1}$ denotes the distance of the (a+1)th 3D coordinate point $p_{a+1}$ from the origin $o_n$.
Step 6, use formula (14) to formula (16) to construct the reflected ray of the camera ray $\vec r_n^{\,i,j}$ at the surface intersection:
$o_n'^{\,i,j} = o_n + \hat x_n^{i,j}\,\vec d_n^{\,i,j}$ (14)
$\vec d_n'^{\,i,j} = \vec d_n^{\,i,j} - 2\,\langle \vec d_n^{\,i,j}, \hat n_n^{i,j} \rangle\, \hat n_n^{i,j}$ (15)
$\vec r_n'^{\,i,j}(x) = o_n'^{\,i,j} + x\,\vec d_n'^{\,i,j}$ (16)
In formulas (14) to (16), $\langle , \rangle$ denotes the operator taking the cosine of the angle between two vectors; $o_n'^{\,i,j}$ denotes the origin of the reflected ray $\vec r_n'^{\,i,j}$, and $\vec d_n'^{\,i,j}$ denotes its direction.
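A sketch of the reflected-ray construction of step 6, assuming unit-length `d` and a normalized rendered normal so that the dot product equals the angle cosine of formulas (14) to (16):

```python
import numpy as np

def reflected_ray(o_n, d, x_hat, n_hat):
    """Reflect the camera ray about the rendered surface normal, formulas (14)-(16)."""
    n_hat = n_hat / np.linalg.norm(n_hat)        # normalize the rendered normal
    o_ref = o_n + x_hat * d                      # formula (14): surface intersection
    d_ref = d - 2.0 * np.dot(d, n_hat) * n_hat   # formula (15): mirror reflection
    return o_ref, d_ref                          # formula (16): ray o_ref + x * d_ref
```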
Step 7, use formula (7) to formula (9) to obtain the RGB value $\hat C_n'^{\,i,j}$, the distance $\hat x_n'^{\,i,j}$ to the surface intersection, and the infrared intensity value $\hat b_n'^{\,i,j}$ corresponding to the reflected ray $\vec r_n'^{\,i,j}$, and thereby compute the multipath-reflection term MPI with formula (17):
$\mathrm{MPI}_n^{i,j} = \hat b_n'^{\,i,j} \left( \sin\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda},\; \cos\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda} \right)$ (17)
Step 8, use formula (18) and formula (19) to obtain, respectively, the RGB measurement $\tilde I_n^{i,j}$ at the pixel of the ith column and jth row of the nth RGB map $I_n$ and the ToF phase measurement $\tilde P_n^{i,j}$ at the pixel of the ith column and jth row of the nth ToF phase measurement map under the multipath-interference setting:
$\tilde I_n^{i,j} = \hat C_n^{i,j}$ (18)
$\tilde P_n^{i,j} = \hat b_n^{i,j} \left( \sin\dfrac{4\pi \hat x_n^{i,j}}{\lambda},\; \cos\dfrac{4\pi \hat x_n^{i,j}}{\lambda} \right) + \mathrm{MPI}_n^{i,j}$ (19)
In formula (17) and formula (19), $\lambda$ is the modulation wavelength of the infrared light of the ToF camera.
Step 9, use formula (20) to construct the loss function $\mathcal{L}_n$ of the multi-layer perceptron network $F_\Theta$ on the nth group of maps:
$\mathcal{L}_n = \sum_{i,j} \left( \lVert \tilde I_n^{i,j} - I_n^{i,j} \rVert_2^2 + \lVert \tilde P_n^{i,j} - P_n^{i,j} \rVert_2^2 \right)$ (20)
Step 10, RGB map and ToF phase measurement map { I > based on N groups n ,P n I n=1, 2, …, N }, the multi-layer perceptron network is gradient descent method
Figure BDA00039806651000000420
Training and calculating the loss function +.>
Figure BDA00039806651000000417
To update the network parameters until the loss function +.>
Figure BDA00039806651000000418
Converging to obtain trained multi-layer perceptron network->
Figure BDA00039806651000000419
The method is used for calculating the depth measurement result after denoising any one camera light.
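A high-level sketch of the self-supervised training of steps 9 and 10, assuming a `render_pixel` routine that runs steps 2 to 8 for a batch of pixels; `render_pixel`, `views.sample_pixels`, the learning rate and the iteration count are all illustrative assumptions:

```python
import torch

def train(model, views, iters=100_000, lr=5e-4):
    """Minimize the loss of formula (20) over sampled pixels until it converges."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for it in range(iters):
        I, P, rays = views.sample_pixels()            # measured RGB, phase, camera rays
        I_pred, P_pred = render_pixel(model, rays)    # steps 2-8 (hypothetical helper)
        loss = ((I_pred - I) ** 2).sum() + ((P_pred - P) ** 2).sum()  # formula (20)
        opt.zero_grad(); loss.backward(); opt.step()
    return model
```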
The electronic device of the invention comprises a memory and a processor, wherein the memory is used for storing a program that supports the processor in executing the above multi-view ToF depth measurement denoising method, and the processor is configured to execute the program stored in the memory.
The invention further relates to a computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the steps of the above multi-view ToF depth measurement denoising method.
Compared with the prior art, the invention has the beneficial effects that:
1. The method denoises multi-view ToF depth measurement results by combining RGB pictures: depth information obtained from multi-view geometry optimizes the depth measurement of the ToF camera and removes the noise caused by multipath interference in the imaging process, so a more accurate depth measurement result is obtained.
2. The invention introduces RGB pictures as an aid in the ToF denoising task. Compared with using only the phase maps of a ToF camera, RGB pictures contain rich texture information, from which reliable depth information can be obtained through multi-view geometry to assist the denoising task.
3. The invention is a self-supervised denoising method: it needs no real depth map as supervision, but instead lets the measurements from different viewpoints supervise one another. Its performance is not limited by a training data set, so it has a wider range of application scenarios.
Drawings
FIG. 1 is a denoising flow chart according to an embodiment of the present invention;
FIG. 2 is a depth map calculated from a ToF phase measurement map;
FIG. 3 is a depth map after denoising according to the present invention.
Detailed Description
In this embodiment, as shown in FIG. 1, a multi-view ToF depth measurement denoising method combined with RGB pictures is performed according to the following steps:
Step 1, obtain N groups of RGB maps and ToF phase measurement maps $\{I_n, P_n \mid n = 1, 2, \ldots, N\}$ with a calibrated and aligned RGB-D imaging system, where $I_n$ denotes the nth RGB map and $P_n$ denotes the nth ToF phase measurement map; FIG. 2 shows a depth map calculated from a phase measurement map containing noise.
Denote the pixel of the ith column and jth row of the nth RGB map $I_n$ as $I_n^{i,j} = (r_n^{i,j}, g_n^{i,j}, b_n^{i,j})$, where $r_n^{i,j}$, $g_n^{i,j}$ and $b_n^{i,j}$ denote the R, G and B values of that pixel.
Denote the pixel of the ith column and jth row of the nth ToF phase measurement map $P_n$ as $P_n^{i,j} = (s_n^{i,j}, c_n^{i,j})$, where $s_n^{i,j}$ denotes its sinusoidal measurement component and $c_n^{i,j}$ denotes its cosine measurement component.
Step 2, take the camera optical center of the nth group of pictures as the origin $o_n$, denote the direction from $o_n$ through the pixel $(i, j)$ of the ith column and jth row as $\vec d_n^{\,i,j}$, and obtain from formula (1) the ray $\vec r_n^{\,i,j}$ from $o_n$ through pixel $(i, j)$ as the camera ray:
$\vec r_n^{\,i,j}(x) = o_n + x\,\vec d_n^{\,i,j}$ (1)
In formula (1), $x$ denotes the distance between any point on the ray $\vec r_n^{\,i,j}$ and the origin $o_n$; and:
$o_n = -t_n$ (2)
$\vec d_n^{\,i,j} = R_n^{-1} K^{-1} (i, j, 1)^T$ (3)
In formulas (2) and (3), $K$ denotes the camera intrinsics; $R_n$ denotes the rotation matrix of the camera pose $E_n$ of the nth group of images; $t_n$ denotes the translation vector of the camera pose $E_n$ of the nth group of images, $n = 1, 2, \ldots, N$. The camera intrinsics can be calibrated with Matlab, and the camera poses can be obtained by inputting the N RGB maps into COLMAP.
Step 3, utilizing a hierarchical sampling method to extract rays from the rays
Figure BDA0003980665100000062
Up-sampling 128 location points, the more location points sampled, the more accurate the resulting depth value, but the more training time for the network:
step 3.1, setting the sampling interval to be [0,10 ]]And will [0,10]Uniformly dividing the two blocks into 128 interval blocks; wherein 0 represents the sampling point and the origin o n Is 0,10 represents the sampling point and the origin o n Is 10;
step 3.2, randomly sampling one sample x from the a-th block interval a Wherein x is a Representing the current miningSample position point and origin o n The distance between them is as follows:
Figure BDA0003980665100000063
in the formula (2), the amino acid sequence of the compound,
Figure BDA0003980665100000064
representing compliance; u represents even distribution;
step 3.3, sample x a Substituting the obtained value into the formula (1) to obtain an a-th 3D coordinate point
Figure BDA0003980665100000065
Step 3.4, obtaining each 3D coordinate point of 128 intervals according to the process of the steps 3.2-3.3 and forming a 3D coordinate point set
Figure BDA0003980665100000066
Step 4, constructing a multi-layer perceptron network containing 8 full connection layers
Figure BDA0003980665100000067
Each layer contains 256 nodes and adopts a ReLU as an activation function; and the a 3D coordinate point +.>
Figure BDA0003980665100000068
Input multi-layer perceptron network->
Figure BDA0003980665100000069
Thereby obtaining the a 3D coordinate point +.>
Figure BDA00039806651000000610
Corresponding density value sigma a Radiation value c a Infrared intensity value b a Normal direction n a
Figure BDA00039806651000000611
Figure BDA00039806651000000612
In the formulas (5) and (6),
Figure BDA00039806651000000613
representing the gradient; which in actual operation is for the output result sigma a At the input coordinates
Figure BDA00039806651000000614
Respectively obtaining partial derivatives in the x, y and z directions;
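The embodiment's partial-derivative computation for the normal of formula (6) can be sketched with autograd, assuming the `ToFMLP` network sketched earlier; the negative-gradient sign convention is an assumption:

```python
import torch

def density_normal(model, p):
    """Normal as the normalized (negative) density gradient, formula (6)."""
    p = p.detach().requires_grad_(True)
    sigma, _, _ = model(p)
    grad, = torch.autograd.grad(sigma.sum(), p)   # d(sigma) / d(x, y, z)
    return -grad / grad.norm(dim=-1, keepdim=True)
```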
Step 5, use formula (7), formula (8), formula (9) and formula (10) to compute, respectively, the RGB value $\hat C_n^{i,j}$ of the camera ray $\vec r_n^{\,i,j}$, its ToF intensity value $\hat b_n^{i,j}$, the distance $\hat x_n^{i,j}$ between the origin $o_n$ and the intersection of the camera ray with the surface it passes through, and the surface normal vector $\hat n_n^{i,j}$ at that intersection:
$\hat C_n^{i,j} = \sum_{a=1}^{128} w_a c_a$ (7)
$\hat b_n^{i,j} = \sum_{a=1}^{128} w_a b_a$ (8)
$\hat x_n^{i,j} = \sum_{a=1}^{128} w_a x_a$ (9)
$\hat n_n^{i,j} = \sum_{a=1}^{128} w_a n_a$ (10)
In formula (7), formula (8), formula (9) and formula (10), $c_a$ denotes the radiance value of the ath 3D coordinate point $p_a$, $b_a$ its infrared intensity value, $x_a$ its distance from the origin $o_n$, $n_a$ its normal vector, and $w_a$ its weight, with:
$w_a = T_a\,(1 - \exp(-\sigma_a \delta_a))$ (11)
In formula (11), $T_a$ denotes the transparency between the 1st 3D coordinate point $p_1$ and the ath 3D coordinate point $p_a$, obtained from formula (12); $\delta_a$ denotes the distance between the (a+1)th 3D coordinate point $p_{a+1}$ and the ath 3D coordinate point $p_a$, obtained from formula (13):
$T_a = \exp\left(-\sum_{a'=1}^{a-1} \sigma_{a'}\,\delta_{a'}\right)$ (12)
$\delta_a = \lvert x_{a+1} - x_a \rvert$ (13)
In formula (13), $x_{a+1}$ denotes the distance of the (a+1)th 3D coordinate point $p_{a+1}$ from the origin $o_n$; $\delta_{128}$, which has no successor point, is taken as the average spacing of the sampling points, computed as $\delta_{128} = \frac{1}{127}\sum_{a=1}^{127} \delta_a$.
Step 6, constructing camera light by using the formula (14) -formula (16)
Figure BDA00039806651000000717
Reflected light at the intersection of the planes:
Figure BDA00039806651000000718
Figure BDA00039806651000000719
Figure BDA00039806651000000720
in the formulae (14) to (16),<,>representing a vector included angle cosine value operator;
Figure BDA00039806651000000721
representing reflected rays +.>
Figure BDA00039806651000000722
Is provided with a reference point (a) to the origin of (c),
Figure BDA00039806651000000723
representing reflected rays +.>
Figure BDA00039806651000000724
Is a direction of (2);
Step 7, use formula (7) to formula (9) to obtain the RGB value $\hat C_n'^{\,i,j}$, the distance $\hat x_n'^{\,i,j}$ to the surface intersection, and the infrared intensity value $\hat b_n'^{\,i,j}$ corresponding to the reflected ray $\vec r_n'^{\,i,j}$, and thereby compute the multipath-reflection term MPI with formula (17):
$\mathrm{MPI}_n^{i,j} = \hat b_n'^{\,i,j} \left( \sin\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda},\; \cos\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda} \right)$ (17)
Step 8, use formula (18) and formula (19) to obtain, respectively, the RGB measurement $\tilde I_n^{i,j}$ at the pixel of the ith column and jth row of the nth RGB map $I_n$ and the ToF phase measurement $\tilde P_n^{i,j}$ at the pixel of the ith column and jth row of the nth ToF phase measurement map under the multipath-interference setting:
$\tilde I_n^{i,j} = \hat C_n^{i,j}$ (18)
$\tilde P_n^{i,j} = \hat b_n^{i,j} \left( \sin\dfrac{4\pi \hat x_n^{i,j}}{\lambda},\; \cos\dfrac{4\pi \hat x_n^{i,j}}{\lambda} \right) + \mathrm{MPI}_n^{i,j}$ (19)
In formula (17) and formula (19), $\lambda$ is the modulation wavelength of the infrared light of the ToF camera; in this embodiment, $\lambda$ of the acquisition equipment used is 16 m.
Step 9, use formula (20) to construct the loss function $\mathcal{L}_n$ of the multi-layer perceptron network $F_\Theta$ on the nth group of maps:
$\mathcal{L}_n = \sum_{i,j} \left( \lVert \tilde I_n^{i,j} - I_n^{i,j} \rVert_2^2 + \lVert \tilde P_n^{i,j} - P_n^{i,j} \rVert_2^2 \right)$ (20)
Step 10, RGB map and ToF phase measurement map { I > based on N groups n ,P n I n=1, 2, …, N }, the multi-layer perceptron network is gradient descent method
Figure BDA00039806651000000812
Training and calculating the loss function +.>
Figure BDA0003980665100000089
To update the network parameters until the loss function +.>
Figure BDA00039806651000000810
Converging to obtain trained multi-layer perceptron network->
Figure BDA00039806651000000811
The method is used for calculating the depth measurement result after denoising any one camera light. The denoising result is shown in fig. 3, so that noise data in the phase measurement diagram acquired by the ToF camera is removed, and smoother results are obtained.
In this embodiment, an electronic device includes a memory for storing a program for supporting the processor to execute the multi-view ToF depth measurement denoising method described above, and a processor configured to execute the program stored in the memory.
In this embodiment, a computer readable storage medium stores a computer program, which when executed by a processor, performs the steps of the multi-view ToF depth measurement denoising method described above.

Claims (3)

1. A multi-view ToF depth measurement denoising method combined with RGB pictures, characterized by comprising the following steps:
step 1, obtain N groups of RGB maps and ToF phase measurement maps $\{I_n, P_n \mid n = 1, 2, \ldots, N\}$, where $I_n$ denotes the nth RGB map and $P_n$ denotes the nth ToF phase measurement map;
denote the pixel of the ith column and jth row of the nth RGB map $I_n$ as $I_n^{i,j} = (r_n^{i,j}, g_n^{i,j}, b_n^{i,j})$, where $r_n^{i,j}$, $g_n^{i,j}$ and $b_n^{i,j}$ denote the R, G and B values of that pixel;
denote the pixel of the ith column and jth row of the nth ToF phase measurement map $P_n$ as $P_n^{i,j} = (s_n^{i,j}, c_n^{i,j})$, where $s_n^{i,j}$ denotes its sinusoidal measurement component and $c_n^{i,j}$ denotes its cosine measurement component;
step 2, take the camera optical center of the nth group of pictures as the origin $o_n$, denote the direction from $o_n$ through the pixel $(i, j)$ of the ith column and jth row as $\vec d_n^{\,i,j}$, and obtain from formula (1) the ray $\vec r_n^{\,i,j}$ from $o_n$ through pixel $(i, j)$ as the camera ray:
$\vec r_n^{\,i,j}(x) = o_n + x\,\vec d_n^{\,i,j}$ (1)
in formula (1), $x$ denotes the distance between any point on the ray $\vec r_n^{\,i,j}$ and the origin $o_n$; and:
$o_n = -t_n$ (2)
$\vec d_n^{\,i,j} = R_n^{-1} K^{-1} (i, j, 1)^T$ (3)
in formulas (2) and (3), $K$ denotes the camera intrinsics; $R_n$ denotes the rotation matrix of the camera pose $E_n$ of the nth group of images; $t_n$ denotes the translation vector of the camera pose $E_n$ of the nth group of images, $n = 1, 2, \ldots, N$;
step 3, use stratified sampling to sample A position points on the ray $\vec r_n^{\,i,j}$:
step 3.1, set the sampling interval to $[x_{near}, x_{far}]$ and divide $[x_{near}, x_{far}]$ evenly into A interval blocks, where $x_{near}$ denotes the nearest and $x_{far}$ the farthest distance between a sampling point and the origin $o_n$;
step 3.2, randomly sample one sample $x_a$ from the ath interval block, where $x_a$ denotes the distance between the current sampling position point and the origin $o_n$:
$x_a \sim \mathcal{U}\left[x_{near} + \frac{a-1}{A}(x_{far} - x_{near}),\; x_{near} + \frac{a}{A}(x_{far} - x_{near})\right]$ (4)
in formula (4), $\sim$ denotes "is distributed as" and $\mathcal{U}$ denotes the uniform distribution;
step 3.3, substitute the sample $x_a$ into formula (1) to obtain the ath 3D coordinate point $p_a = \vec r_n^{\,i,j}(x_a)$;
step 3.4, obtain the 3D coordinate points of all A intervals by the process of steps 3.2 to 3.3 and form the 3D coordinate point set $\{p_a \mid a = 1, 2, \ldots, A\}$;
step 4, construct a multi-layer perceptron network $F_\Theta$ with ReLU as the activation function of every layer; input the ath 3D coordinate point $p_a$ into $F_\Theta$, thereby obtaining from formula (5) and formula (6) the density value $\sigma_a$, radiance value $c_a$, infrared intensity value $b_a$ and normal direction $n_a$ corresponding to $p_a$:
$(\sigma_a, c_a, b_a) = F_\Theta(p_a)$ (5)
$n_a = -\nabla_{p_a}\sigma_a \,/\, \lVert \nabla_{p_a}\sigma_a \rVert$ (6)
in formulas (5) and (6), $\nabla$ denotes the gradient;
step 5, use formula (7), formula (8), formula (9) and formula (10) to compute, respectively, the RGB value $\hat C_n^{i,j}$ of the camera ray $\vec r_n^{\,i,j}$, its ToF intensity value $\hat b_n^{i,j}$, the distance $\hat x_n^{i,j}$ between the origin $o_n$ and the intersection of the camera ray with the surface it passes through, and the surface normal vector $\hat n_n^{i,j}$ at that intersection:
$\hat C_n^{i,j} = \sum_{a=1}^{A} w_a c_a$ (7)
$\hat b_n^{i,j} = \sum_{a=1}^{A} w_a b_a$ (8)
$\hat x_n^{i,j} = \sum_{a=1}^{A} w_a x_a$ (9)
$\hat n_n^{i,j} = \sum_{a=1}^{A} w_a n_a$ (10)
in formula (7), formula (8), formula (9) and formula (10), $c_a$ denotes the radiance value of the ath 3D coordinate point $p_a$, $b_a$ its infrared intensity value, $x_a$ its distance from the origin $o_n$, $n_a$ its normal vector, and $w_a$ its weight, with:
$w_a = T_a\,(1 - \exp(-\sigma_a \delta_a))$ (11)
in formula (11), $T_a$ denotes the transparency between the 1st 3D coordinate point $p_1$ and the ath 3D coordinate point $p_a$, obtained from formula (12); $\delta_a$ denotes the distance between the (a+1)th 3D coordinate point $p_{a+1}$ and the ath 3D coordinate point $p_a$, obtained from formula (13):
$T_a = \exp\left(-\sum_{a'=1}^{a-1} \sigma_{a'}\,\delta_{a'}\right)$ (12)
$\delta_a = \lvert x_{a+1} - x_a \rvert$ (13)
in formula (13), $x_{a+1}$ denotes the distance of the (a+1)th 3D coordinate point $p_{a+1}$ from the origin $o_n$;
step 6, use formula (14) to formula (16) to construct the reflected ray of the camera ray $\vec r_n^{\,i,j}$ at the surface intersection:
$o_n'^{\,i,j} = o_n + \hat x_n^{i,j}\,\vec d_n^{\,i,j}$ (14)
$\vec d_n'^{\,i,j} = \vec d_n^{\,i,j} - 2\,\langle \vec d_n^{\,i,j}, \hat n_n^{i,j} \rangle\, \hat n_n^{i,j}$ (15)
$\vec r_n'^{\,i,j}(x) = o_n'^{\,i,j} + x\,\vec d_n'^{\,i,j}$ (16)
in formulas (14) to (16), $\langle , \rangle$ denotes the operator taking the cosine of the angle between two vectors; $o_n'^{\,i,j}$ denotes the origin of the reflected ray $\vec r_n'^{\,i,j}$, and $\vec d_n'^{\,i,j}$ denotes its direction;
step 7, use formula (7) to formula (9) to obtain the RGB value $\hat C_n'^{\,i,j}$, the distance $\hat x_n'^{\,i,j}$ to the surface intersection, and the infrared intensity value $\hat b_n'^{\,i,j}$ corresponding to the reflected ray $\vec r_n'^{\,i,j}$, and thereby compute the multipath-reflection term MPI with formula (17):
$\mathrm{MPI}_n^{i,j} = \hat b_n'^{\,i,j} \left( \sin\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda},\; \cos\dfrac{4\pi(\hat x_n^{i,j} + \hat x_n'^{\,i,j})}{\lambda} \right)$ (17)
step 8, use formula (18) and formula (19) to obtain, respectively, the RGB measurement $\tilde I_n^{i,j}$ at the pixel of the ith column and jth row of the nth RGB map $I_n$ and the ToF phase measurement $\tilde P_n^{i,j}$ at the pixel of the ith column and jth row of the nth ToF phase measurement map under the multipath-interference setting:
$\tilde I_n^{i,j} = \hat C_n^{i,j}$ (18)
$\tilde P_n^{i,j} = \hat b_n^{i,j} \left( \sin\dfrac{4\pi \hat x_n^{i,j}}{\lambda},\; \cos\dfrac{4\pi \hat x_n^{i,j}}{\lambda} \right) + \mathrm{MPI}_n^{i,j}$ (19)
in formula (17) and formula (19), $\lambda$ is the modulation wavelength of the infrared light of the ToF camera;
step 9, use formula (20) to construct the loss function $\mathcal{L}_n$ of the multi-layer perceptron network $F_\Theta$ on the nth group of maps:
$\mathcal{L}_n = \sum_{i,j} \left( \lVert \tilde I_n^{i,j} - I_n^{i,j} \rVert_2^2 + \lVert \tilde P_n^{i,j} - P_n^{i,j} \rVert_2^2 \right)$ (20)
step 10, based on the N groups of RGB maps and ToF phase measurement maps $\{I_n, P_n \mid n = 1, 2, \ldots, N\}$, train the multi-layer perceptron network $F_\Theta$ by gradient descent, computing the loss function $\mathcal{L}_n$ to update the network parameters until the loss function $\mathcal{L}_n$ converges, thereby obtaining the trained multi-layer perceptron network $F_{\Theta^*}$, which is used to compute the denoised depth measurement for any camera ray.
2. An electronic device comprising a memory and a processor, wherein the memory is configured to store a program that supports the processor to perform the multi-view ToF depth measurement denoising method of claim 1, and the processor is configured to execute the program stored in the memory.
3. A computer readable storage medium having a computer program stored thereon, wherein the computer program when executed by a processor performs the steps of the multi-view ToF depth measurement denoising method according to claim 1.
CN202211547453.7A, filed 2022-12-05: Multi-view ToF depth measurement denoising method combined with RGB picture; pending, published as CN116309095A.

Priority Applications (1)

CN202211547453.7A (priority date 2022-12-05, filing date 2022-12-05): Multi-view ToF depth measurement denoising method combined with RGB picture

Applications Claiming Priority (1)

CN202211547453.7A (priority date 2022-12-05, filing date 2022-12-05): Multi-view ToF depth measurement denoising method combined with RGB picture

Publications (1)

CN116309095A, published 2023-06-23

Family

ID=86778469

Family Applications (1)

CN202211547453.7A (filed 2022-12-05): Multi-view ToF depth measurement denoising method combined with RGB picture

Country Status (1)

CN: CN116309095A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination