DE102016218851A1

DE102016218851A1 - Compaction of an optical flow field for the detection of objects from images of a camera

Info

Publication number: DE102016218851A1
Application number: DE102016218851.8A
Authority: DE
Inventors: Michael Walter
Original assignee: Conti Temic Microelectronic GmbH
Current assignee: Conti Temic Microelectronic GmbH
Priority date: 2016-09-29
Filing date: 2016-09-29
Publication date: 2018-03-29
Also published as: DE112017003463A5; WO2018059629A1

Abstract

The invention relates to a method and a device for detecting objects from images of a camera and can be used in particular in camera-based driver assistance systems. The method for detecting objects from a sequence of images of a vehicle camera comprises the steps of: a) capturing a sequence of images with the vehicle camera, b) determining corresponding features in two successive images, d) assigning determined neighboring corresponding features in an image region to a plane in space, and f) determining additional corresponding features in the image region, taking the assigned plane into account. The invention offers the advantage of densifying the optical flow field.

Description

The invention relates to a method for detecting objects from images of a camera and can be used in particular in camera-based driver assistance systems.

State-of-the-art vehicle detection systems are mostly classification-based. Classification-based systems can recognize vehicles or vehicle components that they have seen in their training data. New vehicle designs and changing body structures, however, can greatly reduce system performance and call for generic approaches to object detection.

EP 2 993 654 A1 shows a method for forward collision warning (FCW) from camera images. An image section is analyzed that corresponds to where the ego vehicle will arrive within a given time interval. If an object is detected there, a collision warning is issued.

US 2014/0161323 A1 shows a method for generating dense three-dimensional structures in a road environment from images captured with a mono camera.

It is an object of the present invention to provide an improved method for detecting objects.

The invention starts from the following considerations: if the camera positions of two frames (individual images) are known, point correspondences (corresponding feature points) can be triangulated, but no objects are generated, because triangulation has no model knowledge that could cluster a point cloud into meaningful objects.

Drawbacks of monocular systems are that objects near the epipole can only be triangulated inaccurately, and that even the smallest errors in the ego-motion (the camera's own motion) become noticeable there. The epipole is the image point in a first camera image at which the center of the camera at a second point in time is imaged. During straight-ahead driving, for example, the vanishing point corresponds to the epipole. This, however, is the relevant region for detecting collisions with stationary vehicles or vehicles driving ahead. Dynamic objects can be triangulated if they move in accordance with the epipolar geometry. However, because their relative speed is unknown, they are estimated as too near or too far away.
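The weak triangulation near the epipole can be illustrated with a small sketch. It assumes a static point, a purely forward-moving camera and normalized image coordinates with the epipole at the origin; the function name and the numbers are hypothetical and not taken from the application.

```python
def forward_motion_flow(x, y, depth, tz):
    """Flow of a static point at normalized image position (x, y) and depth
    `depth` when the camera moves forward by `tz` (same unit as depth)."""
    scale = depth / (depth - tz)  # the point moves radially away from the epipole
    return (x * scale - x, y * scale - y)


# At the epipole (0, 0) the flow vanishes regardless of depth, so depth
# cannot be recovered there; away from it the flow depends on depth.
flow_at_epipole = forward_motion_flow(0.0, 0.0, depth=10.0, tz=1.0)
flow_off_axis_near = forward_motion_flow(0.5, 0.0, depth=10.0, tz=1.0)
flow_off_axis_far = forward_motion_flow(0.5, 0.0, depth=40.0, tz=1.0)
```

The closer a feature lies to the epipole, the smaller its flow becomes, so a fixed measurement error translates into a larger depth error.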

If several (neighboring) correspondences (corresponding features) are considered instead of individual correspondences, objects can be segmented on the basis of different speeds, scalings and deformations.

A method according to the invention for detecting objects from a sequence of images of a vehicle camera comprises the steps:

a) capturing a sequence of images with the vehicle camera,
b) determining corresponding features in two successive images,
d) assigning (neighboring) determined corresponding features in an image region to a plane in space, and
f) determining additional corresponding features in the image region, taking into account the plane assigned (in step d)).

The vehicle camera is preferably designed to capture the surroundings of a vehicle, in particular the surroundings in front of the vehicle. Preferably, the vehicle camera can be integrated into or connected to a driver assistance device, the driver assistance device being designed in particular for object detection from the image data provided by the vehicle camera device. The vehicle camera device is preferably a camera to be arranged in the interior of the motor vehicle behind the windshield and pointing in the direction of travel. Particularly preferably, the vehicle camera is a monocular camera.

Preferably, individual images are captured with the vehicle camera at specific or known points in time, which yields a sequence of images.

A correspondence is the match of a feature in a first image to the same feature in a second image. Corresponding features in two images can also be described as a flow vector that indicates how the feature has shifted in the image. A feature can in particular be an image section (or patch), a pixel, an edge or a corner. Step d) also subsumes the case in which a plurality of planes in space is predefined and (neighboring) corresponding features are each assigned to one of the predefined planes (see steps d2) and d3) below).
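The relation between a correspondence and its flow vector can be sketched as follows. The tuple layout and the sample pixel coordinates are hypothetical choices made for this illustration only.

```python
def flow_vector(p0, p1):
    """Displacement (dx, dy) of a feature from the first image to the second."""
    return (p1[0] - p0[0], p1[1] - p0[1])


# Each correspondence pairs the position in image 0 with the position in image 1.
correspondences = [
    ((100.0, 50.0), (104.0, 52.0)),
    ((200.0, 80.0), (210.0, 85.0)),
]
flow = [flow_vector(p0, p1) for p0, p1 in correspondences]
```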

In the context of the present invention, the term "plane" covers the following: on the one hand, it is a criterion for accumulating neighboring corresponding features. That is, they are regarded as belonging together if they lie in a common plane in space and evolve over time in accordance with the motion of that plane. Corresponding features accumulated in this way are subsequently referred to as, for example, the "ground plane", since they all lie in the plane that corresponds to the road plane. Such a ground plane does not extend to infinity, however, but denotes a sub-region of the plane, namely the one in which corresponding features are actually located.

In step f), the wording "taking into account ..." means that the plane assigned in step d) to corresponding features in an image region is taken into account when determining additional corresponding features. This can be done, for example, by determining or computing a motion of the assigned plane from a few corresponding features in an image region and then predicting where features of the same image region in a first image will be found in a (subsequent) second image as a result of the motion of the assigned plane. Additional corresponding features are determined with the aid of this prediction.
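One way to read this prediction step is sketched below. It assumes that the motion of the assigned plane is available as a 3x3 homography H mapping positions in the first image to positions in the second, and that candidate feature positions in the second image are already known; the nearest-neighbor search, the function names and the `radius` parameter are hypothetical simplifications.

```python
def apply_homography(H, p):
    """Map point p = (x, y) through the 3x3 homography H (nested lists)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def find_additional_correspondences(H, features0, candidates1, radius=2.0):
    """Predict each feature of image 0 into image 1 and accept the nearest
    candidate if it lies within `radius` pixels of the prediction."""
    matches = []
    for p0 in features0:
        px, py = apply_homography(H, p0)

        def sq_dist(q):
            return (q[0] - px) ** 2 + (q[1] - py) ** 2

        best = min(candidates1, key=sq_dist, default=None)
        if best is not None and sq_dist(best) <= radius ** 2:
            matches.append((p0, best))
    return matches
```

Because the search is restricted to a small neighborhood of the predicted position, features that would be ambiguous in an unconstrained search can still be matched.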

The wording "detection of objects" can thus mean, for example, the generation of object hypotheses or objects.

According to a preferred embodiment, the method comprises the step:

c) computing homographies for the determined corresponding features in an image region so that these can be assigned to a plane in space. A homography describes the correspondence of points on a plane between two camera positions, or the correspondence of two points in two successive images of the vehicle camera. By computing homographies for the determined corresponding features in an image region, each can thus be assigned to a plane in space (see step d)).
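For orientation, the standard textbook relation between a plane and its homography can be written as follows. The notation is an assumption made for this sketch: x_0 and x_1 are homogeneous image points in normalized camera coordinates, R and t are the rotation and translation of the camera between the two images, n is the unit normal of the plane and d its perpendicular distance from the camera. Signs and the direction of the mapping depend on the chosen convention.

```latex
\mathbf{x}_1 \simeq H\,\mathbf{x}_0,
\qquad
H = R - \frac{\mathbf{t}\,\mathbf{n}^{\top}}{d}
```

Only the ratio t/d enters H, so translation and plane distance cannot be separated from a single homography.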

The homographies are preferably determined generically from the image or from successive images. Specifying the distance of a plane from the camera position in advance is typically not required.

Advantageously, the method comprises the steps:

d2) assigning the determined corresponding features to one of a plurality of planes of predefined orientation in space, and
e) assigning them to the plane in space that yields the smallest backprojection error for the determined corresponding features, the backprojection error indicating the difference between the measured correspondence of a feature in two successive images and the correspondence of the feature predicted from the computed homography. In particular, the corresponding features can be segmented on the basis of the computed homographies, i.e. assigned to different image regions (or segments). In step f), additional corresponding features in an image region can then be determined taking the assigned plane into account.
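The selection rule of step e) can be written compactly. The following is a sketch under assumptions: the Euclidean norm, the summation over all features k of the image region and the symbols H_j (homography of the j-th candidate plane) and π (dehomogenization) are choices made for this illustration and are not prescribed by the text.

```latex
e_k(H_j) = \bigl\lVert \mathbf{x}_{1,k} - \pi\!\left(H_j\,\mathbf{x}_{0,k}\right) \bigr\rVert,
\qquad
\pi\!\left((u, v, w)^{\top}\right) = \left(\tfrac{u}{w}, \tfrac{v}{w}\right)^{\top},
\qquad
j^{*} = \arg\min_{j} \sum_{k} e_k(H_j)
```

Here x_{0,k} and x_{1,k} are the measured positions of feature k in the first and second image, and j* is the plane to which the features are assigned.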

An advantageous refinement of the method comprises step d3): assigning (neighboring) corresponding features each to a ground plane, a back plane or a side plane. In a coordinate system in which the x-direction runs horizontally (laterally), the y-direction vertically and the z-direction along the vehicle's longitudinal axis, a ground plane can be specified normal to the y-direction, a back plane normal to the z-direction and a side plane normal to the x-direction. By computing homographies of a ground plane, a back plane and a side plane, corresponding features in an image region, or each individual corresponding feature, can be assigned to one of these planes. The distance (from the vehicle camera), in particular to a back plane or side plane, is advantageously a result of the homography computation and not a predefined assumption.
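How the three predefined orientations lead to differently structured homographies can be sketched as follows. The sketch assumes a camera without rotation between the two images, normalized image coordinates and the common convention H = I - (t/d) n^T; the names `NORMALS` and `plane_homography` are hypothetical, and the result is not a reproduction of equations (9) to (11).

```python
# Unit normals of the predefined orientations
# (x lateral, y vertical, z along the vehicle's longitudinal axis).
NORMALS = {
    "ground": (0.0, 1.0, 0.0),
    "back": (0.0, 0.0, 1.0),
    "side": (1.0, 0.0, 0.0),
}


def plane_homography(orientation, t_over_d):
    """H = I - (t/d) n^T for a rotation-free camera (assumed convention)."""
    n = NORMALS[orientation]
    return [
        [(1.0 if i == j else 0.0) - t_over_d[i] * n[j] for j in range(3)]
        for i in range(3)
    ]


# Same forward motion, different orientation: the back plane yields a pure
# scaling, the ground plane a perspective term that depends on y.
H_back = plane_homography("back", (0.0, 0.0, 0.1))
H_ground = plane_homography("ground", (0.0, 0.0, 0.1))
```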

The homographies can preferably be computed according to equation (10) for the back plane, equation (9) for the ground plane and equation (11) for the side plane. Here a, b, c are constants, x₀, y₀, x₁, y₁ denote correspondences in the first image (index 0) and the second image (index 1), and t_x, t_y, t_z are the components of the vector t/d, where t describes the translation of the vehicle camera and d the distance to a plane (perpendicular to that plane). The components t_x, t_y and t_z are also referred to below as "inverse TTC". TTC stands for "time to collision" and, in a given spatial direction, is the distance divided by the translation speed.
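The relation between a component of t/d and the time to collision can be made concrete with a short calculation. It assumes that t is the translation between two successive frames and that the frame interval is known; the function names and the numbers are hypothetical.

```python
def inverse_ttc(translation_per_frame, distance):
    """Component of t/d: translation between two frames divided by distance."""
    return translation_per_frame / distance


def ttc_seconds(inv_ttc, frame_interval):
    """Time to collision in seconds for a given inverse TTC per frame."""
    return frame_interval / inv_ttc


# Hypothetical example: closing speed 10 m/s, 25 frames per second,
# back plane 20 m ahead.
speed = 10.0           # m/s
frame_interval = 0.04  # s
distance = 20.0        # m

t_z = inverse_ttc(speed * frame_interval, distance)  # about 0.02 per frame
ttc = ttc_seconds(t_z, frame_interval)               # about 2 s
```

The same value of t_z results for a plane at 40 m approached at 20 m/s, which is why the inverse TTC rather than distance or speed alone is the directly observable quantity.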

Advantageously, starting from an already determined division of a second image into different image regions, each with an assigned plane, first (or a few) mutually corresponding features can be determined in an image region in the second image and a subsequent third image. A homography for this image region is then recomputed from these first (or few) corresponding features, and the recomputed homography is used to predict the position and shape of additional corresponding features in the third image.

Preferably, if in a subsequent third image not enough (first) corresponding features can be determined in an image region to recompute a homography, the homography computed for that image region from the first and second images can be used to predict the position and shape of additional corresponding features of the image region in the third image. This makes the correspondence search more robust against changes in shape and scale.
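The choice between a recomputed homography and the one carried over from the previous image pair can be sketched as a simple rule. The threshold `min_count` and the `estimate` callback are hypothetical; the text does not state how many correspondences count as sufficient.

```python
def select_homography(new_correspondences, previous_H, estimate, min_count=4):
    """Recompute the homography of an image region if enough new
    correspondences exist, otherwise fall back to the previous one.

    `estimate` is any callable that turns correspondences into a homography;
    `min_count` is an assumed threshold, not a value from the application."""
    if len(new_correspondences) >= min_count:
        return estimate(new_correspondences)
    return previous_H
```

The returned homography is then used to predict where further features of the region will appear in the third image.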

According to a preferred embodiment, for each image region with an assigned plane, a current image can be warped (projected or transformed) onto a previous image in accordance with the computed homography (of the assigned plane) in order to determine additional features that correspond to one another in the previous and the current image. This embodiment, too, densifies the optical flow field.
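A minimal warping sketch is given below. It assumes that H maps pixel coordinates of the previous image to pixel coordinates of the current image and uses nearest-neighbor sampling on an image stored as a list of rows; the function name and the `fill` value are hypothetical, and a practical implementation would typically interpolate.

```python
def warp_to_previous(current, H, fill=0):
    """Resample `current` so that it is aligned with the previous image.

    For every pixel (x, y) of the previous image, the position H(x, y) in the
    current image is looked up (inverse mapping, nearest neighbor)."""
    height, width = len(current), len(current[0])
    warped = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u = H[0][0] * x + H[0][1] * y + H[0][2]
            v = H[1][0] * x + H[1][1] * y + H[1][2]
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            if w == 0:
                continue  # point maps to infinity, keep the fill value
            col, row = round(u / w), round(v / w)
            if 0 <= col < width and 0 <= row < height:
                warped[y][x] = current[row][col]
    return warped
```

After warping, content lying on the assigned plane is approximately aligned in both images, so the remaining search for correspondences only has to cover small residual displacements.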

According to an advantageous refinement, if several planes with identical orientation occur, they can be separated on the basis of their associated t_x, t_y, t_z values. For example, two back planes that lie at different distances from the vehicle camera in the z-direction can be distinguished from one another by their different t_z values.
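Separating equally oriented planes by their t_z values can be sketched as a one-dimensional grouping. The gap threshold and the function name are hypothetical; the text only states that different t_z values distinguish the planes.

```python
def split_by_inverse_ttc(tz_values, gap=0.01):
    """Group indices of segments whose sorted t_z values differ by at most `gap`.

    Returns a list of index groups, ordered by increasing t_z."""
    if not tz_values:
        return []
    order = sorted(range(len(tz_values)), key=lambda i: tz_values[i])
    groups = [[order[0]]]
    for prev, cur in zip(order, order[1:]):
        if tz_values[cur] - tz_values[prev] > gap:
            groups.append([cur])  # large jump in t_z: a different plane starts
        else:
            groups[-1].append(cur)
    return groups


# Four back-plane segments: two belong to a distant or slowly approaching
# plane, two to a near or rapidly approaching one.
groups = split_by_inverse_ttc([0.050, 0.011, 0.052, 0.010])
```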

Preferably, an image can be subdivided by a grid into cells of the same kind, and a homography can be computed for each cell from the corresponding features determined in it. Cells with matching homographies can subsequently be clustered.
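The grid subdivision and the clustering of cells can be sketched as follows. The sketch assumes square cells, homographies that are already normalized to a common scale, an entry-wise similarity test with tolerance `tol` and clustering over 4-connected neighbors; all of these, as well as the function names, are hypothetical choices. Estimating the per-cell homography itself is left out.

```python
from collections import defaultdict


def bin_into_cells(correspondences, cell_size):
    """Group correspondences (p0, p1) by the grid cell containing p0."""
    cells = defaultdict(list)
    for p0, p1 in correspondences:
        key = (int(p0[0] // cell_size), int(p0[1] // cell_size))
        cells[key].append((p0, p1))
    return dict(cells)


def similar(Ha, Hb, tol):
    """Entry-wise comparison of two equally normalized 3x3 homographies."""
    return all(abs(a - b) <= tol for ra, rb in zip(Ha, Hb) for a, b in zip(ra, rb))


def cluster_cells(cell_H, tol=1e-2):
    """Label 4-connected cells whose homographies match within `tol`."""
    labels = {}
    next_label = 0
    for start in sorted(cell_H):
        if start in labels:
            continue
        labels[start] = next_label
        stack = [start]
        while stack:
            cx, cy = stack.pop()
            for nb in ((cx + 1, cy), (cx - 1, cy), (cx, cy + 1), (cx, cy - 1)):
                if nb in cell_H and nb not in labels and similar(cell_H[(cx, cy)], cell_H[nb], tol):
                    labels[nb] = next_label
                    stack.append(nb)
        next_label += 1
    return labels
```

Cells sharing a label form one segment; cells whose homography differs from all neighbors remain segments of their own.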

Preferably, if the computed homography of a first cell does not agree sufficiently with the homography of a neighboring cell, a so-called backprojection error of individual corresponding features can advantageously be examined in order to determine a plane boundary. Corresponding features can be rated by their backprojection error, which indicates the difference between the measured flow and the flow predicted from the computed homography. If the backprojection error of a corresponding feature in a first cell is compared with the backprojection errors under the homographies of the neighboring cells, and the corresponding feature is assigned to the homography with the smallest error, the plane boundary (or segment boundary or cluster boundary) within the first cell can be refined. In this way, different corresponding features of one cell can be assigned to different planes.
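The per-feature refinement of a plane boundary can be sketched as follows. It assumes that each candidate homography maps positions in the first image to positions in the second and that the error is measured as Euclidean distance in pixels; the dictionary layout and the function names are hypothetical.

```python
import math


def apply_homography(H, p):
    """Map point p = (x, y) through the 3x3 homography H (nested lists)."""
    x, y = p
    u = H[0][0] * x + H[0][1] * y + H[0][2]
    v = H[1][0] * x + H[1][1] * y + H[1][2]
    w = H[2][0] * x + H[2][1] * y + H[2][2]
    return (u / w, v / w)


def backprojection_error(H, p0, p1):
    """Distance between the measured position p1 and the position predicted by H."""
    px, py = apply_homography(H, p0)
    return math.hypot(px - p1[0], py - p1[1])


def refine_cell(cell_correspondences, candidate_H):
    """Assign every correspondence of a cell to the candidate homography
    (own cell or neighboring cells) with the smallest backprojection error.

    `candidate_H` maps a segment label to its homography."""
    return [
        min(candidate_H, key=lambda label: backprojection_error(candidate_H[label], p0, p1))
        for p0, p1 in cell_correspondences
    ]
```

The list of labels returned for a cell shows where the boundary runs: features on either side of it receive different labels.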

Preferably, the assignment of planes to neighboring corresponding features can be determined over essentially the entire image of the vehicle camera (e.g. over at least 80% of the image area, preferably at least 90%). Since the method according to the invention can be made very fast, generic object detection or scene interpretation is possible in real time for almost the entire image.

The invention further relates to a device for detecting objects from a sequence of images of a vehicle camera, comprising a camera control unit and evaluation electronics. The camera control unit is designed to

a) capture a sequence of images with the vehicle camera. The evaluation electronics are designed to
b) determine corresponding features in an image region in two successive images,
d) assign the determined corresponding features in an image region to a plane in space, and
f) determine additional corresponding features in the image region, taking the assigned plane into account.

The camera control unit and the evaluation electronics can in particular comprise a microcontroller or microprocessor, a digital signal processor (DSP), an ASIC (application-specific integrated circuit), an FPGA (field-programmable gate array) and the like, as well as software for carrying out the corresponding control or evaluation steps. The present invention can thus be implemented in digital electronic circuits, computer hardware, firmware or software.

Further features, advantages and effects of the invention emerge from the following description of preferred exemplary embodiments of the invention. The figures show:

FIG. 1 schematically, a typical deformation of an approaching back plane;

FIG. 2 schematically, a typical deformation of an approaching ground plane;

FIG. 3 schematically, a typical deformation of a) a rapidly approaching back plane and b) a slowly approaching or more distant back plane;

FIG. 4 schematically, a subdivision of an image with two different segments into cells;

FIG. 5 segmentation results after a third iteration step;

FIG. 6 plane orientation for target validation (validation of potential collision objects);

FIG. 7 time-to-collision consideration; and

FIG. 8 projection (or warping) of the guardrail segment at time t-0 (right) onto t-1 (left).

Parts that correspond to one another are generally given the same reference signs in all figures.

FIG. 1 schematically shows a back plane which, at a first time t-1, occupies the hatched region (20, dotted line). At a subsequent time t, the distance between the vehicle camera and the back plane has decreased, which leads to the deformation of the region (21, solid line) of the back plane in the image indicated by the arrows (d1). The region (20; 21) scales, i.e. grows, as a result of the relative motion of the vehicle camera towards the back plane.

FIG. 2 schematically shows a ground plane which, at a first time t-1, occupies the hatched region (30, dotted line). This could be a section of the road surface on which the vehicle is driving. As a result of the vehicle camera's own motion, the region (in the image) changes at a subsequent time t, which leads to the deformation of the region (32) of the ground plane sketched by the arrows (d2). At time t, the lines labeled 32 bound the region of the ground plane. The "ground plane" is thus understood here as a delimited region on the road surface. The boundary results, for example, from signatures (or boundary points) on the road surface that can be tracked in the image sequence.

FIG. 3 illustrates the difference between a back plane that approaches quickly (FIG. 3a: 20, 21; deformation d1) and one that approaches slowly (FIG. 3b: 20, 23; deformation d3), provided that at time t-1 the back plane (20) in FIG. 3a is at the same distance from the vehicle camera as the back plane (20) in FIG. 3b. Alternatively, FIG. 3 could show the difference between a near back plane (FIG. 3a: 20, 21; deformation d1) and a more distant back plane (FIG. 3b: 20, 23; deformation d3) that move, for example, at the same (relative) speed; in that case the object shown in FIG. 3b (20, 23) would be larger in real space than the object shown in FIG. 3a (20, 21).

If several neighboring correspondences are considered instead of individual correspondences, objects can be segmented on the basis of their different speeds, scalings and deformations.

Assuming that the world consists of planes, these can be described by homographies and, as shown below, separated by their distance, speed and orientation. A homography describes the correspondence of points on a plane between two camera positions, or the correspondence of two points in two consecutive frames.

A homography can be computed from image data alone if four point correspondences are known (cf. Tutorial: Multiple View Geometry, Hartley, R. and Zisserman, A., CVPR June 1999: https://de.scribd.com/document/96810936/Hartley-Tut-4up, retrieved on 26.09.2016). The relationships given on page 6 of the tutorial, top left (slide 21), can be formulated in the notation of Equation 1 as follows:
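The equation that follows at this point in the original is not reproduced in this text. As a rough sketch of how four point correspondences determine a homography, the snippet below solves the resulting 8x8 linear system directly. It assumes the mapping direction x_t0 = H x_t1 used in Equation 4 and fixes the last matrix entry to 1, which is an illustrative normalization and not necessarily the one used in the tutorial; all function names are hypothetical.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate point configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def homography_from_four_points(pts_t1, pts_t0):
    """Return the 3x3 matrix H (nested lists) with x_t0 ~ H x_t1, H[2][2] = 1."""
    rows, rhs = [], []
    for (x1, y1), (x0, y0) in zip(pts_t1, pts_t0):
        # Two linear equations per correspondence, eight unknowns in total.
        rows.append([x1, y1, 1.0, 0.0, 0.0, 0.0, -x1 * x0, -y1 * x0])
        rhs.append(x0)
        rows.append([0.0, 0.0, 0.0, x1, y1, 1.0, -x1 * y0, -y1 * y0])
        rhs.append(y0)
    h = solve_linear(rows, rhs) + [1.0]
    return [h[0:3], h[3:6], h[6:9]]


def apply_homography(h, pt):
    """Map a point through H and normalize the homogeneous coordinate."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)
```

With more than four correspondences the same rows would be stacked and solved in a least-squares sense instead.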

Alternatively, if the camera translation t, the rotation R and the distance d along the normal vector n of the plane are known, the homography can be computed according to Equation 3. Equation 3 shows that, for an inverse TTC t/d not equal to zero, planes with different orientations n can be modeled, and that planes with identical orientation n can be separated by their inverse TTC.
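Equation 3 itself is not reproduced in this text. For orientation, the commonly used plane-induced homography is sketched below. The sign convention and the treatment of n' as a row vector are assumptions, chosen so that the expression agrees with Equations 7 and 8.

```latex
H = R - \frac{t \, n'}{d},
\qquad
H = R \quad \text{if } t/d = 0 .
```

Read this way, the orientation n' only influences H when t/d is non-zero, and two planes with the same n' but different t/d yield different homographies, which matches the two statements made about Equation 3.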

In theory, a homography can be decomposed into the normal vector n, the rotation matrix R and the inverse TTC t/d. Unfortunately, this decomposition is numerically extremely unstable and sensitive to measurement errors.

If a scene is described by planes, it can be segmented as set out below.

FIG. 4 schematically shows a subdivision into cells (grid, grid lines). The scene is divided into NxM initial cells, and each point correspondence is assigned a unique ID. This ID initially indicates membership of a cell. Later on, the ID can indicate membership of a cluster or an object. An object in the foreground (in particular a back plane) is shown hatched. The background is shown in white. If a cell contains only one object (cells B3, D3), a homography will describe this cell very well. If, however, a cell contains more than one object (cell C3), the homography will describe neither of the two objects well. If the point correspondences (black dot and black cross, or x) are assigned to the clusters (or segments) of the neighboring cells (B3 and D3) via their reprojection errors, the black dot is assigned to the segment of cell B3 and the black cross to the segment of cell D3, since the homography for cell C3 describes neither the foreground nor the background well.
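The initial ID assignment can be sketched as follows. This is a minimal illustration that assumes pixel coordinates with the origin in the top-left corner and row-major numbering of the cells; the function name and parameters are hypothetical.

```python
def cell_id(x, y, width, height, n_cols, n_rows):
    """Return the initial cell ID of a point correspondence located at (x, y).

    Cells are numbered row by row, starting with 0 in the top-left corner.
    """
    col = min(int(x * n_cols / width), n_cols - 1)
    row = min(int(y * n_rows / height), n_rows - 1)
    return row * n_cols + col
```

For a 640x480 image divided into 8x6 cells, a correspondence at (320, 0) falls into the fifth cell of the first row and receives ID 4.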

If prior knowledge about a scene is available, the segment sizes can be adapted to the scene, for example by generating larger regions in the near range of the vehicle or in regions with a positive classification response. For each segment, a dedicated back-plane, ground-plane and side-plane homography is computed, as shown in Equations 5 to 10.

Computing the back-plane, ground-plane and side-plane homographies increases the selectivity, since a homography with fewer degrees of freedom can model regions containing different planes only poorly, so that corresponding points will exhibit a higher reprojection error (Equation 4), see FIG. 4.
e = x_t0 - H_i x_t1 (4)
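Equation 4 can be evaluated per correspondence as sketched below. The example assumes 2D image points, a 3x3 homography given as nested lists, and the Euclidean length of e as the scalar error; the function name is hypothetical.

```python
import math


def reprojection_error(h, pt_t1, pt_t0):
    """Length of e = x_t0 - H x_t1 after normalizing the homogeneous point."""
    x1, y1 = pt_t1
    w = h[2][0] * x1 + h[2][1] * y1 + h[2][2]
    px = (h[0][0] * x1 + h[0][1] * y1 + h[0][2]) / w
    py = (h[1][0] * x1 + h[1][1] * y1 + h[1][2]) / w
    return math.hypot(pt_t0[0] - px, pt_t0[1] - py)
```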

If the static mounting position of the camera and the camera rotation between two different views are taken as given (e.g. from knowledge of the camera calibration and computation of the fundamental matrix in a monocular system, or from rotation values of a yaw-rate sensor cluster), the inverse TTC t/d can be computed from the flow vectors compensated for the static camera rotation, as shown below by way of example for a ground plane n' = [0 1 0]. If the rotation is not known, it can be approximated by an identity matrix. If the quotient t/d in Equation 3 is replaced by the inverse time to collision (t_x, t_y, t_z), it follows:
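The equations that belong at this point are not reproduced in this text. The following is a reconstruction, not the original wording. It assumes that Equation 3 has the plane-induced form H = R - t n'/d, that (t_x, t_y, t_z) denotes the inverse TTC, and that the constants a, b, c stand for the rotated point R (x_1, y_1, 1)^T, which is the reading under which Equations 7 and 8 follow by normalization.

```latex
w_0 \begin{bmatrix} x_0 \\ y_0 \\ 1 \end{bmatrix}
= \left( R - \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix}
         \begin{bmatrix} 0 & 1 & 0 \end{bmatrix} \right)
  \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}
= \begin{bmatrix} a \\ b \\ c \end{bmatrix}
  - \begin{bmatrix} t_x \\ t_y \\ t_z \end{bmatrix} y_1,
\qquad
\begin{bmatrix} a \\ b \\ c \end{bmatrix}
:= R \begin{bmatrix} x_1 \\ y_1 \\ 1 \end{bmatrix}
```

The third row gives w_0 = c - t_z y_1, which is the factor that appears on the left-hand side once the homogeneous coordinates are normalized.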

Normalizing the homogeneous coordinates yields:
x₀(c - t_z y₁) = a - t_x y₁ (7)
y₀(c - t_z y₁) = b - t_y y₁ (8)

For more than one measurement, this results in a system of equations of the form Mx = v, with a matrix M and a vector v, which can be solved for at least three image correspondences by, for example, a singular value decomposition or a least-squares method.
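A least-squares solution for the ground-plane case can be sketched as follows. The rows of M and the entries of v are obtained by rearranging Equations 7 and 8 for (t_x, t_y, t_z). The sketch assumes normalized image coordinates and R approximated by the identity matrix, so that (a, b, c) = (x_1, y_1, 1); it solves the normal equations with Cramer's rule instead of an SVD, and all function names are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(a, b):
    """Solve the 3x3 system a x = b with Cramer's rule."""
    d = det3(a)
    if abs(d) < 1e-18:
        raise ValueError("correspondences do not constrain all components")
    solution = []
    for k in range(3):
        ak = [row[:] for row in a]
        for r in range(3):
            ak[r][k] = b[r]
        solution.append(det3(ak) / d)
    return solution


def inverse_ttc_ground_plane(pts_t1, pts_t0):
    """Least-squares estimate of (t_x, t_y, t_z) for n' = [0 1 0].

    Assumption: R = identity, hence (a, b, c) = (x1, y1, 1).
    """
    mtm = [[0.0] * 3 for _ in range(3)]  # accumulates M^T M
    mtv = [0.0] * 3                      # accumulates M^T v
    for (x1, y1), (x0, y0) in zip(pts_t1, pts_t0):
        a, b, c = x1, y1, 1.0
        rows = [([-y1, 0.0, y1 * x0], x0 * c - a),  # from Equation 7
                ([0.0, -y1, y1 * y0], y0 * c - b)]  # from Equation 8
        for row, rhs in rows:
            for i in range(3):
                mtv[i] += row[i] * rhs
                for j in range(3):
                    mtm[i][j] += row[i] * row[j]
    return solve3(mtm, mtv)
```

Forming the normal equations keeps the example short; for noisy measurements an SVD-based solver, as named in the text, is the numerically more robust choice.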

The derivation of the back-plane and side-plane homographies proceeds analogously and yields:

In order to segment larger objects consisting of several cells, neighboring cells can be merged in a further step by computing the reprojection errors Σ x_t0^i - H_j x_t1^i and Σ x_t0^j - H_i x_t1^j over support points (see item 1 below: RANSAC) of the neighboring segments j and i and their homographies. Two neighboring clusters are merged if Σ x_t0^i - H_j x_t1^i is smaller than Σ x_t0^i - H_i x_t1^i or if, for example, the reprojection error normalized to the predicted flow length lies below an adjustable threshold. Alternatively, reprojection errors can be used as potentials in a graph and a global solution can be computed. In this case, the compactness of the clusters can be controlled via the edge potentials in the graph.
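The pairwise merge test can be sketched as below. The example assumes that the summed Euclidean length of the error vectors is used as the error measure and that the predicted flow is H_j x_t1 - x_t1; the function names and the default threshold of 0.1 are hypothetical.

```python
import math


def project(h, pt):
    """Map a point through the 3x3 homography h (nested lists)."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def summed_error(h, support):
    """Sum of reprojection error lengths over pairs (x_t1, x_t0)."""
    total = 0.0
    for p1, p0 in support:
        px, py = project(h, p1)
        total += math.hypot(p0[0] - px, p0[1] - py)
    return total


def should_merge(h_i, h_j, support_i, max_normalized_error=0.1):
    """Decide whether segment i can be described by the homography of j."""
    own = summed_error(h_i, support_i)
    cross = summed_error(h_j, support_i)
    if cross < own:
        return True
    # Otherwise compare the error with the flow length predicted by H_j.
    flow = 0.0
    for p1, _ in support_i:
        px, py = project(h_j, p1)
        flow += math.hypot(px - p1[0], py - p1[1])
    return flow > 0.0 and cross / flow < max_normalized_error
```

A symmetric variant would run the same test with the roles of i and j exchanged and merge only if both directions agree.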

Once the segments have been merged, the homographies are recomputed and the point correspondences are assigned to the clusters with the lowest reprojection error. If only directly adjacent clusters are considered, very compact objects can be generated. If the minimum error exceeds an adjustable threshold, the correspondences are assigned new (cluster/object) IDs so that partially occluded objects or objects with a slightly different TTC can be detected. The threshold setting determines how finely (slightly) different objects are resolved.

The reprojection errors can be given a bias that reduces the cost for contiguous regions, or a bias that increases the cost of an ID change if point correspondences have belonged to the same ID over a longer period of time.
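The assignment rule of the two preceding paragraphs can be sketched as follows. The example makes several assumptions that are not stated in the text: the bias is a constant added to the cost of every candidate ID other than the previous one, the threshold is applied to the unbiased error of the chosen cluster, and all unexplained correspondences of one call share a single new ID. All names are hypothetical.

```python
import math


def project(h, pt):
    """Map a point through the 3x3 homography h (nested lists)."""
    x, y = pt
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def assign_ids(pairs, old_ids, homographies, max_error, switch_bias=0.0):
    """Assign each pair (x_t1, x_t0) to the cluster with the lowest cost.

    homographies maps cluster ID -> H. Pairs whose best error exceeds
    max_error receive a new ID (one shared new ID, by assumption).
    """
    new_id = max(homographies) + 1
    result = []
    for (p1, p0), old in zip(pairs, old_ids):
        best_id, best_cost, best_err = None, math.inf, math.inf
        for cid, h in homographies.items():
            px, py = project(h, p1)
            err = math.hypot(p0[0] - px, p0[1] - py)
            cost = err if cid == old else err + switch_bias
            if cost < best_cost:
                best_id, best_cost, best_err = cid, cost, err
        result.append(best_id if best_err <= max_error else new_id)
    return result
```

Raising `switch_bias` makes established assignments more stable over time, while lowering `max_error` splits off slightly different objects sooner.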

FIG. 5 shows an example of a scene segmentation. FIG. 5a shows an image captured by a vehicle camera that is arranged inside the vehicle and observes the environment ahead through the windshield. A three-lane roadway (51), e.g. a motorway, can be seen. The lanes are separated by lane markings. Vehicles are driving in all three lanes. The vehicle (53) driving ahead in the ego lane may occlude further vehicles ahead in the same lane. To the left of the three-lane roadway there is a raised structural boundary (52) to the opposite carriageway. To the right of the three-lane roadway (51) there is a hard shoulder, bounded on its right by a guardrail, behind which there is a wooded area. Some distance ahead of the ego vehicle, sign gantries (54) can be seen, one of which spans the three-lane roadway (51).

This scene can be segmented in the same way as described with reference to FIG. 4. Cells (56) can be seen in FIGS. 5b to 5d. Point correspondences (55) are shown within the cells. The assignment of a cell (56) to a segment is indicated by the color of the cell frame and of the point correspondences (55). FIG. 5b shows the red channel of the segmented image, FIG. 5c the green channel and FIG. 5d the blue channel. Different segments were given different colors. One segment, which is green in the original, extends over the bottom five to six rows (accordingly shown in white and without cell frames in FIGS. 5b and 5d). This segment corresponds to the ground plane, i.e. the surface of the roadway (51) on which the ego vehicle is driving. A further segment can be seen in the middle of the image; in the original it is pink. It therefore has high red values in FIG. 5b, weaker blue values in FIG. 5d and no green values in FIG. 5c. This segment corresponds to the back plane of the (van-type) vehicle (53) driving ahead in the ego lane. The segmentation result shown was obtained without prior knowledge of the scene in only three iteration steps. This demonstrates the speed and performance that an embodiment of the invention achieves through temporal integration.

FIG. 6 shows a determination of the orientation of planes in the scene already described for FIG. 5. For orientation, FIG. 6a again shows the surroundings as in FIG. 5a. FIG. 6b shows all correspondences that are assigned to a side plane. The correspondences at the left edge were assigned to a right side plane, which is correct, since the right side of the structural boundary (52) to the opposite carriageway is located there in the image. The correspondences in the right half of the image were assigned to left side planes, which is also correct, since the "left side" of the roadside structures or vegetation is located there in the image. FIG. 6c shows which correspondences are assigned to a ground plane, which is correct, since the surface of the roadway (51) can be seen there in the image.

FIG. 6d shows which correspondences are assigned to a back plane. This is largely correct. From this determination alone, different back planes cannot yet be distinguished sufficiently, e.g. that of the delivery van (53) driving ahead in the same lane from the signs of the sign gantry (54) located above it in the image. However, this representation already provides important indications of where raised objects occur in the surroundings of the vehicle.

As illustrated in FIG. 7, the inverse TTC (t_x, t_y, t_z) can be used to detect dynamic objects. FIG. 7a again shows the image of the driving situation (identical to FIG. 6a). The vehicle (73) driving ahead in the ego lane is a delivery van. Two vehicles (71 and 72) are driving in the left lane and two further vehicles (74 and 75) in the right lane. FIG. 7b shows correspondences that correspond to the ground plane (violet in the original) and are the only ones with a red component. FIG. 7c shows correspondences that are assigned to moving objects. In the original, these are green if they are moving away from the ego vehicle (i.e. driving faster) and turquoise if they are driving slower. FIG. 7d shows correspondences with a blue component, i.e. those that correspond to the ground plane (cf. FIG. 7b), moving objects that are approaching the ego vehicle (cf. FIG. 7c), and those that correspond to static raised objects; the latter are shown only in FIG. 7d, e.g. the wooded areas to the left and right of the motorway and the sign gantries. From FIGS. 7c and 7d together it can be seen that the vehicle in the ego lane (73) is approaching. The same applies to the front vehicle in the right lane (75). The remaining vehicles (71, 72 and 74), by contrast, are moving away.

The region of the image that corresponds to the sky yields no correspondences because it lacks structure (white in FIGS. 7b to 7d). If the ego-rotation is taken into account in the correspondences before the homography is computed, or in the rotation matrix R, overtaking vehicles can be recognized by their negative t_z component, and vehicles pulling out or driving through a curve by a non-zero lateral t_x component. If the dynamic segments are predicted via their homographies (see "Densification of the optical flow based on homographies" below), a dynamic map can be built up over time.

Equation 3 shows that segments with an inverse TTC equal to zero describe the rotation matrix, which can therefore be determined by computing a homography with full degrees of freedom (Equation 2) from segments with t/d equal to zero. Assuming that the translational components are not noticeable in the vicinity of the epipole, the pitch and yaw rate can also be determined by predicting the coordinates of the epipole through the homography of static segments and computing atan((x_e0 - x_e1)/f) and atan((y_e0 - y_e1)/f), where f is the focal length expressed in pixels.
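The epipole-based estimate can be sketched as follows. The snippet assumes that (x_e1, y_e1) is the epipole, that (x_e0, y_e0) is its prediction through the homography of a static segment, that coordinates are given in pixels relative to the principal point, and that a horizontal shift is attributed to yaw and a vertical shift to pitch. These conventions and the function name are assumptions, not taken from the text.

```python
import math


def pitch_yaw_from_epipole(h_static, epipole, focal_px):
    """Return (pitch, yaw) in radians per frame from the epipole shift."""
    x_e1, y_e1 = epipole
    w = h_static[2][0] * x_e1 + h_static[2][1] * y_e1 + h_static[2][2]
    x_e0 = (h_static[0][0] * x_e1 + h_static[0][1] * y_e1 + h_static[0][2]) / w
    y_e0 = (h_static[1][0] * x_e1 + h_static[1][1] * y_e1 + h_static[1][2]) / w
    yaw = math.atan((x_e0 - x_e1) / focal_px)
    pitch = math.atan((y_e0 - y_e1) / focal_px)
    return pitch, yaw
```

Dividing both angles by the frame interval would convert them into rates.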

If a homography with all degrees of freedom is computed for each cluster, these homographies can also be used to reconstruct the 3D environment by using the predicted position H*x_t1 for triangulation instead of the measured position x_t0. This not only reduces the influence of measurement errors but also makes it possible to reconstruct objects close to the epipole.

An exemplary embodiment for densifying the optical flow based on homographies is described below.

If the segmentation at time t-1 is known, it can be used both to predict the objects and to generate a dense flow field. Signature-based flow methods generate signatures and try to match them unambiguously in consecutive frames. The signatures are usually computed from a patch of defined size. If, however, the size and shape of a patch change, correspondences can no longer be found with a fixed template. For example, when approaching a back plane, the size of a patch changes; when moving over a ground plane or parallel to a side plane, both the size and the shape of a patch change, see FIGS. 1 and 2. If the segmentation at time t-1 is available, the homographies can be recomputed from flow vectors that have already been found and used to predict the position and shape of already established correspondences from t-1 to t-0.

Alternatively, the current frame at time t-0 can be transformed to time t-1 in order to compensate for changes in scale and shape. FIG. 8 illustrates such a procedure.

FIG. 8a shows an image of a different driving situation, captured by the vehicle camera at a time t-1. A motorway with three lanes in each direction can be seen. To the left of the ego vehicle's three-lane roadway there is a guardrail (81) as a raised boundary to the opposite carriageway. To the right of the roadway there is a noise barrier (82).

FIG. 8b shows an image that was captured at the subsequent time t and transformed ("warped") via the homography of the guardrail in such a way that the changes in the image in the region of the guardrail, which result from the motion of the vehicle and thus of the vehicle camera between the two capture times, are compensated. The forward motion of the ego vehicle means that in FIG. 8b the nearest dash of the lane marking is closer to the ego vehicle than in FIG. 8a. The transformation leads to the trapezoidal displacement of the image, which is illustrated in FIG. 8f by a dashed line.

FIG. 8c shows, as white dots, corresponding features (85) that were determined in the region of the guardrail (81, cf. FIG. 8a). FIG. 8d shows where these corresponding features are to be expected in the next image (86) after it has been transformed as described for FIG. 8b.

FIGS. 8e and 8f show the same situation again in a black-and-white representation, in which the corresponding features (85) now correspond to the black dots on the guardrail (81) in the left half of the image.

To generate a dense flow field, the current image can thus be warped onto the previous image for each segment, in order to find again existing correspondences whose scale or shape has changed, or to establish new correspondences by means of congruent templates.
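The per-segment warp can be sketched as below. The example assumes the mapping x_t0 = H x_t1 from Equation 4, a grayscale image stored as nested lists indexed [row][column], and nearest-neighbor sampling; a production system would typically use an interpolating image-warping routine instead. The function name and the fill value are hypothetical.

```python
def warp_to_previous(current, h, width, height, fill=0):
    """Warp the current frame (t-0) into the geometry of the frame at t-1.

    For every pixel position x_t1 of the output, the current image is
    sampled at H x_t1, so that a segment described by H appears with the
    scale and shape it had at t-1.
    """
    out = [[fill] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            w = h[2][0] * x + h[2][1] * y + h[2][2]
            if abs(w) < 1e-12:
                continue  # point maps to infinity, keep fill value
            u = int(round((h[0][0] * x + h[0][1] * y + h[0][2]) / w))
            v = int(round((h[1][0] * x + h[1][1] * y + h[1][2]) / w))
            if 0 <= v < len(current) and 0 <= u < len(current[0]):
                out[y][x] = current[v][u]
    return out
```

After the warp, a fixed-size template taken from the previous frame can be compared directly with the same location in the warped frame, because scale and shape changes of the segment have been compensated.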

If a current frame does not contain enough flow vectors to recompute a homography, the homography from the last frame can be used as an approximation in order to make the correspondence search more robust against changes in shape and scale.

The following embodiments or aspects are advantageous and can be provided individually or in combination:

1. The image is subdivided into N×M cells, and the point correspondences of a cell are assigned a unique cell ID. From the correspondences with the same ID, the back-plane, ground-plane and side-plane homographies (Equations 9, 10 and 10) are calculated by means of RANSAC, and both the homography with the lowest backprojection error and the support points used to calculate that homography are stored. In RANSAC (RAndom SAmple Consensus) methods, a minimal number of randomly selected correspondences is usually used in each iteration to form a hypothesis. For each feature, a value is then calculated that describes whether the feature supports the hypothesis. If the hypothesis achieves sufficient support from the features, the non-supporting features can be discarded as outliers. Otherwise, a minimal number of correspondences is again selected at random.
2. For adjacent cells i, j, the backprojection errors $\sum x_{t0}^{i} - H_j x_{t1}^{i}$ and $\sum x_{t0}^{j} - H_i x_{t1}^{j}$ are calculated over the support points of the neighboring homography. If the backprojection error $\sum x_{t0}^{i} - H_j x_{t1}^{i}$ is smaller than $\sum x_{t0}^{i} - H_i x_{t1}^{i}$, or if the errors fall below a threshold normalized to the flow length, the IDs are merged and the homographies are recalculated.
3. The backprojection errors $x_{t0} - H_i x_{t1}$ of all point correspondences are calculated for the adjacent segments, and each point correspondence is assigned to the segment with the lowest backprojection error. If the minimum error exceeds a threshold, the correspondences are given a new object ID so that smaller or partially occluded objects can also be detected.
4. At the beginning of a new frame, the homographies of the segments extracted at time t-1 are recalculated from the image correspondences already found, and the existing segment IDs are predicted into the current frame. If the current frame does not contain enough flow vectors to recalculate a homography, the homographies from the last frame can be used as an approximation.
5. To generate a dense flow field, the current frame is warped onto the last frame for each segment, in order to find again existing correspondences that have changed in scale or shape, or to establish new correspondences.
6. The backprojection errors of the back plane, ground plane and side plane can be used to validate raised targets, see Fig. 6.
7. If a disparity map is available, e.g. from a vehicle stereo camera, the absolute speeds can be calculated from the inverse TTC $t/d$, since the absolute distances $d$ for individual pixels are then available in the disparity map.
8. If a complete homography with all degrees of freedom is calculated for each segment, the rotation matrix $R$ can be determined from segments with a TTC close to infinity (i.e. an inverse TTC of approximately zero).
9. The 3D environment can be reconstructed from the predicted position $(H x_{t1}, x_{t1})$ instead of the measured position $(x_{t0}, x_{t1})$, which also makes it possible to reconstruct objects at the epipole.
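Aspect 3 of the list above can be illustrated with a short sketch. It rests on several assumptions that are not taken from the source: the backprojection error is measured as the Euclidean norm of $x_{t0} - H_i x_{t1}$, all segments passed in are treated as adjacent, and all correspondences whose minimum error exceeds the threshold share a single new ID. The function names are hypothetical.

```python
import math


def project_point(h, point):
    """Apply the 3x3 homography h (nested lists) to a 2D point."""
    x, y = point
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    return ((h[0][0] * x + h[0][1] * y + h[0][2]) / w,
            (h[1][0] * x + h[1][1] * y + h[1][2]) / w)


def backprojection_error(h, x_t0, x_t1):
    """Distance between measured x_t0 and predicted H * x_t1 (norm assumed)."""
    px, py = project_point(h, x_t1)
    return math.hypot(x_t0[0] - px, x_t0[1] - py)


def assign_to_segments(correspondences, homographies, threshold):
    """Label each (x_t0, x_t1) pair with the segment of least error.

    homographies: non-empty dict mapping integer segment ID -> homography.
    Pairs whose smallest error exceeds `threshold` receive a new ID
    (one shared new ID is an assumption of this sketch).
    """
    new_id = max(homographies) + 1
    labels = []
    for x_t0, x_t1 in correspondences:
        best_id, best_err = min(
            ((sid, backprojection_error(h, x_t0, x_t1))
             for sid, h in homographies.items()),
            key=lambda pair: pair[1],
        )
        labels.append(new_id if best_err > threshold else best_id)
    return labels


segments = {
    0: [[1, 0, 0], [0, 1, 0], [0, 0, 1]],  # identity: no apparent motion
    1: [[1, 0, 5], [0, 1, 0], [0, 0, 1]],  # shift of five pixels in x
}
pairs = [((1, 1), (1, 1)), ((6, 1), (1, 1)), ((100, 100), (1, 1))]
labels = assign_to_segments(pairs, segments, threshold=1.0)
```

In this toy input the first pair fits segment 0, the second fits segment 1, and the third fits neither, so it is labelled with the new ID.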

REFERENCES CITED IN THE DESCRIPTION

This list of the documents cited by the applicant was generated automatically and is included solely for the better information of the reader. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

EP 2993654 A1 [0003]
US 2014/0161323 A1 [0004]

Cited non-patent literature

Tutorial: Multiple View Geometry, Hartley, R. and Zisserman, A., CVPR June 1999: https://en.scribd.com/document/96810936/Hartley-Tut-4up retrieved on 26.09.2016 [0045]

Claims

A method of detecting objects from a sequence of images of a vehicle camera, comprising the steps of: a) recording a sequence of images with the vehicle camera, b) determining corresponding features in two consecutive images, d) assigning determined corresponding features in an image area to a plane in space, and f) determining additional corresponding features in the image area taking into account the assigned plane.

Method according to claim 1, comprising the step: c) calculating homographies for the determined corresponding features in an image area in order to assign them to a plane in space.

Method according to claim 2, comprising the steps: d2) assigning the determined corresponding features to one of a plurality of planes of predetermined orientation in space, and e) assigning them to the plane in space that yields the smallest backprojection error for the determined corresponding features, the backprojection error indicating the difference between the measured correspondence of a feature in two consecutive images and the correspondence of the feature predicted from the calculated homography.

Method according to one of the preceding claims, comprising the step: d3) assigning the determined corresponding features in an image area to a ground plane, a backplane or a sidewall plane in each case.

The method of claim 4, wherein the at least one backplane is calculated according to

where $a$, $b$, $c$ are constants, $x_0$, $y_0$, $x_1$, $y_1$ denote correspondences in the first image (index 0) and the second image (index 1), and $t_x$, $t_y$, $t_z$ are the components of the vector $t/d$; $t$ describes the translation of the vehicle camera and $d$ the distance to a plane.

The method of claim 4 or 5, wherein the at least one ground plane is calculated according to

The method of claim 4, 5 or 6, wherein the at least one sidewall plane is calculated according to

Method according to one of the preceding claims, wherein, on the basis of a division of an image into different image areas with associated planes that has been determined for a second image, first mutually corresponding features are determined in an image area in the second image and a subsequent third image, a homography for this image area is recalculated from these first corresponding features, and the recalculated homography is used to predict the position and shape of additional corresponding features in the third image.

Method according to one of the preceding claims, wherein, if not enough first corresponding features for recalculating a homography can be determined in an image area in a subsequent third image, the homography calculated for that image area from the second image is used to predict the position and shape of additional corresponding features of the image area in the third image.

Method according to one of the preceding claims, wherein, for each image area with an associated plane, a current image is warped onto a previous image in accordance with the calculated homography in order to determine additional mutually corresponding features in the previous image and in the current image.

Device for detecting objects from a sequence of images of a vehicle camera, comprising: a camera control device which is designed a) to record a sequence of images with the vehicle camera; and evaluation electronics which are designed b) to determine corresponding features in an image area in two successive images, d) to assign corresponding features determined in an image area to a plane in space, and f) to determine additional corresponding features in the image area taking into account the assigned plane.