DE102021109386A1

DE102021109386A1 - Method for correcting depth images of a time-of-flight camera

Info

Publication number: DE102021109386A1
Application number: DE102021109386.4A
Authority: DE
Inventors: Christopher Dietz; Katrin Bolay
Original assignee: IFM Electronic GmbH; PMDtechnologies AG
Current assignee: IFM Electronic GmbH; PMDtechnologies AG
Priority date: 2020-04-22
Filing date: 2021-04-14
Publication date: 2021-10-28

Abstract

A method for correcting depth images of a time-of-flight camera that determines a phase shift between emitted and received modulated light,
wherein the light is emitted with different modulation frequencies in at least two phase measurements,
and wherein an unsupervised overall consistency cost function is optimized in an unsupervised training phase of a machine learning model. During operation of the time-of-flight camera, a depth image is determined using the trained machine learning model.

Description

The terms time-of-flight camera and time-of-flight camera system are intended here to cover, in particular, systems that derive distances from the phase shift between emitted and received radiation. PMD cameras with photonic mixer detectors (PMD), as described inter alia in DE 197 04 496 A1, are particularly suitable as time-of-flight (ToF) cameras.

US 2017/0262768 A1 discloses a ToF system that determines distances from the phase and amplitude of light that is emitted, reflected by a scene, and received. A trained machine learning system is used for distance determination; it derives distance values from the raw sensor data. The system is trained on simulated raw ToF data and simulated multipath interference data.

DE 11 2011 104 644 T5 concerns a mobile robot, in particular a driverless transport system, in which the environment is captured with a camera and streamed to a cloud host. To detect hazardous situations, a supervised learning algorithm is provided that is trained on real images of the surroundings, so that once training is complete a hazard image class model is available that can, if required, be transferred to an entire fleet of mobile robots. The classifier provided for this purpose is a machine learning algorithm that typically uses an iterative training algorithm, for example to optimize a cost function.

The object of the invention is to improve the distance measurement of a time-of-flight camera system.

The object is advantageously achieved by the method according to the invention for correcting depth images of a time-of-flight camera.

According to the invention, a method is provided for correcting depth images of a time-of-flight camera that determines a phase shift between emitted and received modulated light,
wherein the light is emitted with different modulation frequencies in at least two phase measurements,
and in which the following steps are carried out in an unsupervised training phase:

- Optimization of an unsupervised overall consistency cost function by:
a) determining an unsupervised trigonometric consistency cost function by establishing consistency of the raw data with regard to trigonometric properties (g₁ K_trig),
b) determining an unsupervised distance consistency cost function by establishing consistency of the raw data with regard to distances (g₂ K_d) for at least two different modulation frequencies,
c) forming a mask once, based on the unsupervised trigonometric and distance consistency cost functions determined in steps a) and b), in order to identify image regions that are already consistent,
d) determining an unsupervised deviation consistency cost function (g₃ K_abs) based on the mask from step c),
e) optimizing the weights and biases of a machine learning model on the basis of the unsupervised overall consistency cost function formed from steps a) to d),

wherein, during operation of the time-of-flight camera, a depth image is determined using the trained machine learning model (inference).

FIG. 1 schematically shows the procedure according to the invention. In the current state of the art, the raw images of a ToF camera are converted into depth maps using classical algorithms. These depth maps contain artifacts such as noise or multipath interference (MPI). The invention instead uses a machine learning model to correct the artifacts. The model is trained unsupervised on training data without true depth maps and thus produces error-free depth maps. Generating a large volume of true depth maps ("ground truth") entails enormous effort, for example through computationally expensive simulations of the data or the addition of a high-precision measuring device with a long acquisition time. Neither method yields perfect true depth maps; each introduces its own artifacts. These include, for example, simplified calculations in the simulation (domain gap between simulated and experimental data) or differing viewpoints of the high-precision measuring device and the time-of-flight camera (regions are occluded or missing).

The invention circumvents this considerable technical problem: it is no longer necessary to generate true depth maps in sufficient quantity to train machine learning models.

In contrast to supervised machine learning, unsupervised machine learning does not require true annotations (here: no true depth map). Data acquisition is therefore considerably simplified.

The raw images of a ToF camera are measurements of phase-shifted periodic functions. The phase is proportional to the depth d and can be determined for each image point (pixel). The raw data are usually sinusoidal signals, referred to as the real part (Re) and the imaginary part (Im). The phase is then given by:

$\phi = \operatorname{atan2}(-Re, Im)$

The depth is thus:

$d = \frac{ur}{2\pi} \phi$

The unambiguous range (ur) of a ToF camera is determined by the modulation frequency (f):

$ur = \frac{c}{2 f}$

Here c denotes the speed of light.
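The three relations above can be chained into a per-pixel depth calculation. The following is a minimal sketch, not the applicant's implementation: the function names are hypothetical, the raw values are synthetic with unit amplitude, and wrapping the phase to [0, 2π) is an assumption made here so that the depth is non-negative.

```python
import math

C = 299_792_458.0  # speed of light c in m/s


def unambiguous_range(f_mod):
    """ur = c / (2 f) for a modulation frequency f in Hz."""
    return C / (2.0 * f_mod)


def phase_from_raw(re, im):
    """phi = atan2(-Re, Im); the wrap to [0, 2*pi) is an assumption."""
    return math.atan2(-re, im) % (2.0 * math.pi)


def depth_from_phase(phi, f_mod):
    """d = ur / (2*pi) * phi."""
    return unambiguous_range(f_mod) / (2.0 * math.pi) * phi


# Synthetic pixel (hypothetical values): target at 3 m, 20 MHz modulation.
f_mod = 20e6
true_depth = 3.0
phi_true = 2.0 * math.pi * true_depth / unambiguous_range(f_mod)
# Unit-amplitude raw values chosen so that atan2(-Re, Im) returns phi_true.
re, im = -math.sin(phi_true), math.cos(phi_true)

depth = depth_from_phase(phase_from_raw(re, im), f_mod)
print(f"ur = {unambiguous_range(f_mod):.4f} m, depth = {depth:.4f} m")
```

A target beyond one unambiguous range would wrap around in this single-frequency calculation, which is why at least two modulation frequencies are used.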

The depth data are corrupted by several kinds of measurement error. The correction of noise errors, multipath interference errors, motion artifacts, and phase assignment errors deserves particular attention.

a) For noise correction, spatial filters (bilateral, anisotropic diffusion, ...) and/or temporal filters (Kalman, ...) are commonly applied.
b) Multipath interference errors are caused by multiple reflections in the scene. The light path is then longer than the direct path, which increases the distance measured at the pixel. This error is also highly scene-dependent. At present, no classical algorithm is known that corrects this error in real time.
c) Motion artifacts arise because the real and imaginary parts are acquired one after the other and are therefore offset in time. Fast movements of objects in the scene or of the camera therefore cause distance errors.
d) Phase assignment errors occur when the phases of several frequencies are assigned to one another. If a pixel is assigned to the wrong phase, large deviations of roughly one unambiguous range ur occur. Phases are usually assigned by means of the Chinese remainder theorem; with high noise, however, assignment errors become increasingly frequent. Furthermore, the maximum range of the camera is limited by the least common multiple of the unambiguous ranges of the individual frequencies.
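To make the range limit in item d) concrete, the sketch below computes the least common multiple of two unambiguous ranges. It is illustrative only: the frequencies 20 MHz and 30 MHz and the function names are assumptions, and the identity lcm(ur₁, ur₂) = c / (2 · gcd(f₁, f₂)) holds for integer frequencies in Hz.

```python
import math

C = 299_792_458.0  # speed of light in m/s


def unambiguous_range(f_mod_hz):
    """ur = c / (2 f)."""
    return C / (2.0 * f_mod_hz)


def combined_range(f1_hz, f2_hz):
    """Least common multiple of the two unambiguous ranges.

    For integer frequencies in Hz this equals c / (2 * gcd(f1, f2)).
    """
    return C / (2.0 * math.gcd(f1_hz, f2_hz))


# Hypothetical frequency pair, not taken from the patent.
f1, f2 = 20_000_000, 30_000_000
ur1, ur2 = unambiguous_range(f1), unambiguous_range(f2)
max_range = combined_range(f1, f2)

# A phase assignment error of one wrap shifts the depth by one ur.
true_depth = 9.0
wrong_depth = true_depth - ur1

print(f"ur1 = {ur1:.3f} m, ur2 = {ur2:.3f} m, max range = {max_range:.3f} m")
print(f"one-wrap error at 20 MHz: {true_depth:.3f} m -> {wrong_depth:.3f} m")
```

With these example frequencies the combined range spans two wraps of the 20 MHz signal and three wraps of the 30 MHz signal.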

Models based on machine learning can correct the errors mentioned above in real time. The greatest challenge, however, is obtaining precise training data for training the models. Unsupervised learning simplifies this considerably.

The invention is the unsupervised training procedure of a machine learning model with the following properties:

• During training, special unsupervised cost functions are optimized that ensure consistency in the depth data and raw data as well as correct depth values for at least two modulation frequencies.
• One consistency is the preservation of trigonometric properties (addition theorems) of the raw data.
• A further consistency is that distances must remain the same when different modulation frequencies are used.
• Depth values from the classical depth calculation that are already consistent (for example, because no artifacts occur there) are also used to prevent the model from deviating too far from the measured depth. For this purpose, a mask of already consistent pixels is determined once for each data set using a threshold on the consistency cost functions. This mask therefore contains pixels that have a good signal-to-noise ratio and no MPI artifacts, since otherwise the trigonometric consistency and the distance consistency would be violated. Because the consistencies alone do not yet lead to absolutely correct distance values, the absolute deviations of the distances on the masked pixels are minimized during training.
• The combination of all three cost functions, each defined without supervision, is called the overall consistency cost function.

During training, the quality of the error-corrected depth map t is calculated using the differentiable, unsupervised overall consistency cost function, and the weights and biases of the machine learning model are adjusted by backpropagation.

Here, an unsupervised overall consistency cost function is formed by combining an unsupervised trigonometric cost function, an unsupervised depth cost function, and an unsupervised deviation cost function.

The unsupervised trigonometric consistency cost function applies to at least one frequency pair f, f̃ and is calculated over all pixels j = 1, ..., N:

$\begin{array}{l} K_{trig} = \frac{1}{N} \sum_{f \neq \tilde{f}} \sum_{j} \left| \mathrm{AddRe}(Re_{f,j}, Im_{f,j})^{n} - \mathrm{AddRe}(Re_{\tilde{f},j}, Im_{\tilde{f},j})^{m} \right| \\ \quad + \left| \mathrm{AddIm}(Re_{f,j}, Im_{f,j})^{n} - \mathrm{AddIm}(Re_{\tilde{f},j}, Im_{\tilde{f},j})^{m} \right| \end{array}$

AddRe and AddIm denote the following addition theorems:

$\mathrm{AddRe}(Re, Im) = 2 \, Im \, Re$

$\mathrm{AddIm}(Re, Im) = Im^{2} - Re^{2}$

The additions are carried out n times (or m times, respectively), such that the following relationship must hold for the frequency pair f, f̃:

$n f - m \tilde{f} = 0$
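The sketch below illustrates the trigonometric consistency for the simplest case only. It rests on assumptions not stated in the text: the second frequency is exactly twice the first (f̃ = 2f), the addition theorems are applied once to the data at f and not at all to the data at f̃, and the raw values have unit amplitude so that no normalization is needed. The helper names are hypothetical. Under these assumptions, AddRe and AddIm applied to the raw data at f reproduce the raw data at 2f.

```python
import math


def add_re(re, im):
    """AddRe(Re, Im) = 2 * Im * Re."""
    return 2.0 * im * re


def add_im(re, im):
    """AddIm(Re, Im) = Im^2 - Re^2."""
    return im * im - re * re


def raw_from_phase(phi):
    """Unit-amplitude raw values consistent with phi = atan2(-Re, Im)."""
    return -math.sin(phi), math.cos(phi)


def trig_residual(raw_f, raw_2f):
    """Per-pixel trigonometric consistency term for the pair (f, 2f)."""
    re_f, im_f = raw_f
    re_2f, im_2f = raw_2f
    return abs(add_re(re_f, im_f) - re_2f) + abs(add_im(re_f, im_f) - im_2f)


phi = 0.7  # hypothetical phase at frequency f, in radians

# Consistent pixel: the phase at 2f is exactly twice the phase at f.
consistent = trig_residual(raw_from_phase(phi), raw_from_phase(2.0 * phi))

# Corrupted pixel: the phase at 2f carries an extra 0.3 rad error.
corrupted = trig_residual(raw_from_phase(phi), raw_from_phase(2.0 * phi + 0.3))

print(f"consistent residual: {consistent:.2e}, corrupted residual: {corrupted:.3f}")
```

The residual vanishes for the consistent pixel and grows with the phase error, which is the property the cost function penalizes.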

The unsupervised depth consistency cost function for at least two frequencies is:

$K_{d} = \frac{1}{N} \sum_{f \neq \tilde{f}} \sum_{j} \left| t_{f,j} - t_{\tilde{f},j} \right|$
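As a small numerical illustration of K_d, the following sketch sums the absolute depth differences over all ordered frequency pairs f ≠ f̃ and divides by the pixel count N. The dictionary layout, the function name, and the sample values are assumptions made for this example.

```python
from itertools import permutations


def depth_consistency(depth_maps):
    """K_d = 1/N * sum over ordered pairs f != f~ and pixels j of |t_f,j - t_f~,j|.

    depth_maps maps a modulation frequency (Hz) to a flat list of depths (m);
    all lists are assumed to have the same length N.
    """
    n_pixels = len(next(iter(depth_maps.values())))
    total = 0.0
    for f, f_tilde in permutations(depth_maps, 2):
        for t_a, t_b in zip(depth_maps[f], depth_maps[f_tilde]):
            total += abs(t_a - t_b)
    return total / n_pixels


# Hypothetical 3-pixel depth maps; only the middle pixel disagrees.
maps = {20e6: [1.0, 2.0, 3.0], 30e6: [1.0, 2.5, 3.0]}
k_d = depth_consistency(maps)
print(f"K_d = {k_d:.4f}")
```

Because the sum runs over ordered pairs, each unordered frequency pair contributes twice; this only rescales the cost and does not change where its minimum lies.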

Starting from the unsupervised trigonometric and depth consistency cost functions, a mask M can be determined, and with it a deviation consistency cost function. For each pixel j, the trigonometric consistency and the depth consistency are therefore calculated once for each measurement of the time-of-flight camera. A threshold value S is then used to identify those pixels that are already free of errors in the unprocessed image:

$M_{j} = (K_{trig,j} + K_{d,j}) \leq S$

The set of error-free pixels M_j forms the consistent mask M.
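The thresholding step can be sketched as follows. The per-pixel cost values and the threshold S = 0.05 are made-up numbers for illustration, and the function name is hypothetical.

```python
def consistency_mask(k_trig_px, k_d_px, threshold):
    """M_j = (K_trig,j + K_d,j) <= S, evaluated per pixel."""
    return [(k_t + k_d) <= threshold for k_t, k_d in zip(k_trig_px, k_d_px)]


# Hypothetical per-pixel consistency costs for four pixels.
k_trig_px = [0.01, 0.40, 0.02, 0.00]
k_d_px = [0.02, 0.05, 0.30, 0.01]

mask = consistency_mask(k_trig_px, k_d_px, threshold=0.05)
print(mask)
```

Only the first and last pixel fall below the threshold here; the second violates the trigonometric consistency and the third the depth consistency.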

The pixel-wise absolute deviation on a consistent mask M is referred to as the deviation cost function and ensures correct depth values:

$K_{abs} = \frac{1}{N_{M}} \sum_{i} \left| T_{i} - t_{i} \right|$

Here T_i denotes the original depth values (assumed to be error-free) on the mask M for each masked pixel i = 1, ..., N_M. This cost function is therefore also defined without supervision.
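A minimal sketch of the deviation cost on a mask is shown below. The sample depths are invented, the function name is hypothetical, and returning 0.0 for an empty mask is an assumption made here to avoid a division by zero; the text does not specify that case.

```python
def deviation_cost(classic_depth, model_depth, mask):
    """K_abs = 1/N_M * sum over masked pixels i of |T_i - t_i|."""
    deviations = [
        abs(T - t)
        for T, t, keep in zip(classic_depth, model_depth, mask)
        if keep
    ]
    if not deviations:
        return 0.0  # assumption: an empty mask contributes no cost
    return sum(deviations) / len(deviations)


# Hypothetical values: T from the classical calculation, t from the model.
classic_depth = [1.00, 2.00, 3.00, 4.00]
model_depth = [1.10, 2.70, 3.00, 3.80]
mask = [True, False, False, True]

k_abs = deviation_cost(classic_depth, model_depth, mask)
print(f"K_abs = {k_abs:.3f}")
```

The large deviation at the second pixel is ignored because that pixel is not in the mask, so the model remains free to correct it.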

Overall, an unsupervised overall consistency cost function with weights g₁ and g₂ is minimized:

$K_{ges} = K_{abs} + g_{1} K_{trig} + g_{2} K_{d}$

The training routine is thus defined entirely without supervision: none of the cost functions requires a true depth map.
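For completeness, the weighted sum can be written as a single function. This is only a sketch: the individual cost values and the weights g₁ = 1.0 and g₂ = 0.5 are arbitrary example numbers, not values from the patent.

```python
def total_cost(k_abs, k_trig, k_d, g1, g2):
    """K_ges = K_abs + g1 * K_trig + g2 * K_d."""
    return k_abs + g1 * k_trig + g2 * k_d


# Hypothetical cost values and weights.
k_ges = total_cost(k_abs=0.15, k_trig=0.02, k_d=0.30, g1=1.0, g2=0.5)
print(f"K_ges = {k_ges:.3f}")
```

In an actual training loop this scalar would be computed with a differentiable framework so that its gradient with respect to the model weights and biases is available.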

REFERENCES CITED IN THE DESCRIPTION

This list of documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Patent literature cited

DE 19704496 A1 [0001]
US 2017/0262768 A1 [0002]
DE 112011104644 T5 [0003]

Claims

Method for correcting depth images of a time-of-flight camera that determines a phase shift between emitted and received modulated light, wherein the light is emitted with different modulation frequencies in at least two phase measurements, and in which the following steps are carried out in an unsupervised training phase without a true depth map: optimization of an unsupervised overall consistency cost function by a) determining an unsupervised trigonometric consistency cost function by establishing consistency of the raw data with regard to trigonometric properties (g₁ K_trig), b) determining an unsupervised distance consistency cost function by establishing consistency of the raw data with regard to distances (g₂ K_d) for at least two different modulation frequencies, c) forming a mask once, based on the unsupervised trigonometric and distance consistency cost functions determined in steps a) and b), in order to identify image regions that are already consistent, d) determining an unsupervised deviation consistency cost function (g₃ K_abs) based on the mask from step c), and e) optimizing the weights and biases of a machine learning model on the basis of the unsupervised overall consistency cost function formed from steps a) to d), wherein a depth image is determined using the trained machine learning model during operation of the time-of-flight camera.

Method according to claim 1, in which the quality of the error-corrected depth map is calculated using a differentiable cost function and the weights and biases of the machine learning model are adjusted by backpropagation.

Method according to one of the preceding claims, in which the unsupervised trigonometric consistency cost function applies to at least one frequency pair f, f̃ and is calculated over all pixels j = 1, ..., N from:

$\begin{array}{l} K_{trig} = \frac{1}{N} \sum_{f \neq \tilde{f}} \sum_{j} \left| \mathrm{AddRe}(Re_{f,j}, Im_{f,j})^{n} - \mathrm{AddRe}(Re_{\tilde{f},j}, Im_{\tilde{f},j})^{m} \right| \\ \quad + \left| \mathrm{AddIm}(Re_{f,j}, Im_{f,j})^{n} - \mathrm{AddIm}(Re_{\tilde{f},j}, Im_{\tilde{f},j})^{m} \right| \end{array}$

Method according to one of the preceding claims, in which the depth consistency cost function for the at least two frequencies satisfies the function:

$K_{d} = \frac{1}{N} \sum_{f \neq \tilde{f}} \sum_{j} \left| t_{f,j} - t_{\tilde{f},j} \right|$

Method according to one of the preceding claims, in which, proceeding from the unsupervised trigonometric and depth consistency cost functions, a mask M is determined and, using the mask M, a deviation consistency cost function is determined according to:

$K_{abs} = \frac{1}{N_{M}} \sum_{i} \left| T_{i} - t_{i} \right|$

Method according to one of the preceding claims, in which a mask M of consistent image points is calculated using a threshold value S based on the cost functions in claims 3 and 4:

$M_{j} = (K_{trig,j} + K_{d,j}) \leq S$

Method according to one of the preceding claims, in which a mask M of consistent image points is calculated using a threshold value S based on the cost function in claim 3:

$M_{j} = K_{trig,j} \leq S$

Method according to one of the preceding claims, in which a mask M of consistent image points is calculated using a threshold value S based on the cost function in claim 4:

$M_{j} = K_{d,j} \leq S$