DE102018100315A1

DE102018100315A1 - Generating input data for a convolutional neural network

Info

Publication number: DE102018100315A1
Application number: DE102018100315.3A
Authority: DE
Inventors: Stephen Foy; Rosalia Barros; Ian Clancy
Original assignee: Connaught Electronics Ltd
Current assignee: Connaught Electronics Ltd
Priority date: 2018-01-09
Filing date: 2018-01-09
Publication date: 2019-07-11
Also published as: WO2019137915A1

Abstract

The present invention relates to a method for generating input data for a convolutional neural network using at least one camera (3) and at least one range sensor (5, 6), wherein the camera (3) and the range sensor (5, 6) are arranged on the motor vehicle (1) such that the field of view of the camera (3) and the field of view of the range sensor (5, 6) overlap at least partially, the method comprising the following steps:
- capturing an image frame with the camera (3), the image frame consisting of image data for directions relative to the position of the camera (3) and within the solid angle covered by the camera (3), the directions being represented by coordinates in a camera coordinate system,
- simultaneously acquiring depth information with the range sensor (5, 6), the depth information consisting of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle covered by the range sensor (5, 6), the directions being represented by coordinates in a range sensor coordinate system,
- providing a motor vehicle coordinate system that is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the motor vehicle coordinate system, and
- transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the motor vehicle coordinate system on the basis of the sets of translations and rotations, thereby obtaining the input data for the convolutional neural network. In this way, the semantic segmentation of objects in an image in automotive computer vision can be improved.

Description

The present invention relates to a method for generating input data for a convolutional neural network using at least one camera and at least one range sensor.

One of the most fundamental problems in automotive computer vision is the semantic segmentation of objects in an image. Segmentation is the problem of assigning each pixel to its corresponding object class. Recently, research into and design of convolutional neural networks (CNNs) has surged, supported by increased computing power in computer architectures and the availability of large annotated datasets.

CNNs are very successful at classification and categorization tasks, but much of the research deals with standard photometric RGB images and does not focus on embedded automotive devices. Automotive hardware must have low power consumption and therefore low computing power.

In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has been used successfully to analyze visual images. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. Convolutional networks were inspired by biological processes: the connectivity pattern between their neurons is modeled on the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap so that together they cover the entire visual field.

CNNs use relatively little preprocessing compared to other image classification algorithms. This means that the network learns the filters that were hand-engineered in conventional algorithms. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs are used in image and video recognition, recommender systems and natural language processing.

The article "Multimodal Deep Learning for Robust RGB-D Object Recognition, Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller, Wolfram Burgard, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015" proposes an RGB-D architecture for object recognition. This architecture consists of two separate CNN processing streams, one for each modality, which are subsequently combined with a late-fusion network. The focus is on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, a multi-stage training methodology and two key factors for handling depth data with CNNs are introduced. The first is an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second is a data augmentation scheme for robust learning with depth images, which corrupts them with realistic noise patterns.

From US 2017/0099200 A1 it is known that data are received which characterize a request for an agent-based computation on sensor data. The request includes a required confidence and a required latency for completing the agent-based computation. The agents to be queried are determined based on the required confidence. Data are then transmitted to request the determined agents to perform an analysis of the sensor data.

The object of the invention is to provide a way of improving the semantic segmentation of objects in an image in automotive computer vision.

This object is achieved by the subject matter of the independent claims. Preferred embodiments are specified in the dependent claims.

The invention therefore provides a method for generating input data for a convolutional neural network using at least one camera and at least one range sensor, wherein the camera and the range sensor are arranged on the motor vehicle such that the field of view of the camera and the field of view of the range sensor overlap at least partially, the method comprising the following steps:

- capturing an image frame with a camera, the image frame consisting of image data for directions relative to the position of the camera and within the solid angle covered by the camera, the directions being represented by coordinates in a camera coordinate system,
- simultaneously acquiring depth information with the range sensor, the depth information consisting of depth data for directions relative to the position of the range sensor and within the solid angle covered by the range sensor, the directions being represented by coordinates in a range sensor coordinate system,
- providing a motor vehicle coordinate system that is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera and the position of the range sensor relative to the origin of the motor vehicle coordinate system, and
- transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the motor vehicle coordinate system on the basis of the sets of translations and rotations, thereby obtaining the input data for the convolutional neural network.

It is thus an essential idea of the invention that the input data for the convolutional neural network comprise both image data and depth data for common viewing directions relative to the origin of the motor vehicle coordinate system, the directions being represented by coordinates of the common motor vehicle coordinate system, which serves as a common frame of reference. In other words, the input data for the convolutional neural network consist of image data and depth data for directions expressed in the motor vehicle coordinate system, although these data were originally expressed in the coordinate system of the camera and of the range sensor, respectively. Transforming these data into the common motor vehicle coordinate system makes it possible to use data from different sensors and cameras in one common dataset that is input to the convolutional neural network. Preferably, the camera continuously captures image frames and the range sensor continuously acquires depth information. Preferably, as a last step of the method described above, the generated dataset consisting of the depth data and the image data is input to the CNN.

According to a preferred embodiment of the invention, the method further comprises the following steps:

- representing the coordinates in the camera coordinate system by a direction cosine matrix, and
- representing the coordinates in the range sensor coordinate system by a direction cosine matrix.

As is known to those skilled in the art, the direction cosines of a vector are the cosines of the angles between the vector and the three coordinate axes. Equivalently, they are the contributions of each component of the basis to a unit vector in that direction. Direction cosines are an analogous extension of the usual notion of slope to higher dimensions. Accordingly, the term direction cosine refers to the cosine of the angle between any two vectors. Direction cosines are used, among other things, to form direction cosine matrices that express one set of orthonormal basis vectors in terms of another set, or to express a known vector in a different basis.
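The definition above can be illustrated with a minimal sketch. The function names `direction_cosines` and `direction_cosine_matrix` and the example vectors are assumptions chosen for illustration; they do not appear in the patent.

```python
import math

def direction_cosines(v):
    """Return the cosines of the angles between v and the x, y and z axes.

    Dividing each component by the vector length gives the unit vector,
    whose components are exactly these cosines.
    """
    norm = math.sqrt(sum(c * c for c in v))
    if norm == 0.0:
        raise ValueError("the zero vector has no direction")
    return tuple(c / norm for c in v)

def direction_cosine_matrix(basis_a, basis_b):
    """Entry [i][j] is the cosine of the angle between basis_b[i] and basis_a[j].

    Both arguments are assumed to be orthonormal triples of 3-vectors.
    """
    return [[sum(p * q for p, q in zip(b, a)) for a in basis_a] for b in basis_b]

# Hypothetical example vector (1, 2, 2) with length 3.
l, m, n = direction_cosines((1.0, 2.0, 2.0))

# Hypothetical second basis: the standard basis rotated by 90 degrees about z.
standard = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
rotated = [(0.0, 1.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 0.0, 1.0)]
dcm = direction_cosine_matrix(standard, rotated)
```

The squares of the three direction cosines always sum to one, which is why a direction can be stored as a coordinate triple without a separate length.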

Preferably, the method further comprises the following steps:

- representing the image data by a color value, preferably an RGB value, for each coordinate triple of the cosine matrix, and
- representing the depth data by a distance value for each coordinate triple of the cosine matrix.

In this way, a dataset that contains a color value (as part of the image frame) and a corresponding distance value (as part of a depth map) for multiple directions relative to the origin of the motor vehicle coordinate system can be input to the CNN and processed jointly there.

In general, different types of cameras can be used. According to a preferred embodiment of the invention, however, the camera is a fisheye camera with a field of view of at least 180°. Furthermore, a single camera may in general be sufficient for the method according to the invention. According to a preferred embodiment of the invention, however, multiple cameras are used to generate the input data for the convolutional neural network. Preferably, these cameras have different fields of view. More preferably, these cameras cover the entire surroundings of the motor vehicle.

Furthermore, multiple range sensors are preferably used to generate the input data for the convolutional neural network. In general, these range sensors can be of the same type. According to a preferred embodiment of the invention, however, the range sensors comprise at least two different types of range sensor, preferably at least one LIDAR sensor and at least one ultrasonic sensor. Preferably, these range sensors have different fields of view. More preferably, these range sensors cover the entire surroundings of the motor vehicle.

The invention also relates to the use of a method as described above in a motor vehicle, to a sensor arrangement for a motor vehicle that is configured to carry out such a method, and to a non-transitory computer-readable medium having instructions stored thereon which, when executed by a processor, cause a sensor arrangement of a motor vehicle to carry out such a method.

The figures show:

Fig. 1 schematically a motor vehicle with a sensor arrangement for detecting an object according to a preferred embodiment of the invention;
Fig. 2 schematically the camera coordinate system and the range sensor coordinate systems according to the preferred embodiment of the invention; and
Fig. 3 schematically the motor vehicle coordinate system according to the preferred embodiment of the invention.

As shown schematically in Fig. 1, a motor vehicle 1 according to a preferred embodiment of the invention is provided with a sensor arrangement 2 comprising a camera 3, an evaluation unit 4, an ultrasonic sensor 5 and a LIDAR sensor 6. As indicated by dashed lines, the camera 3, the ultrasonic sensor 5 and the LIDAR sensor 6 have respective fields of view that overlap. This makes it possible to capture scenes as image data and depth data, respectively, which can be input to a convolutional neural network provided in the evaluation unit 4 for classifying objects, such as the person 7, in front of the motor vehicle 1.

By using different types of range sensors 5, 6, i.e. an ultrasonic sensor 5 and a LIDAR sensor 6, it is possible to generate multiple input depth maps together with RGB image data for use with a CNN that can detect and classify objects. The application here is to use vehicle sensors, such as the camera 3, the ultrasonic sensor 5 and the LIDAR sensor 6, to generate depth information around a vehicle and to combine these data with surround-view image data. Such vehicle sensors are therefore preferably arranged on all sides of the motor vehicle 1 so that the entire surroundings of the motor vehicle can be monitored. For the sake of clarity, the present preferred embodiment of the invention concentrates only on the three vehicle sensors mentioned above as an example.

An important aspect of the present preferred embodiment of the invention is to encode the data of the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, in the same coordinate system as the camera data in order to generate CNN input data that use RGB and multiple depth maps together. These input data can then be input to a convolutional neural network for classification.

As shown schematically in Fig. 2, each sensor has its own mechanical coordinate system. Because the figure is two-dimensional, only the x-axes and the z-axes are shown, i.e. x_C and z_C for the camera 3, x_U and z_U for the ultrasonic sensor 5, and x_L and z_L for the LIDAR sensor 6. Furthermore, as shown schematically in Fig. 3, a motor vehicle coordinate system is defined as a common reference coordinate system for all vehicle sensors 3, 5, 6. The motor vehicle coordinate system has its origin (0, 0, 0) at the center of the front section of the motor vehicle 1 at road level. For the respective positions of the vehicle sensors 3, 5, 6 (and of any other vehicle sensors that may be arranged on the motor vehicle) there is a set of rotations and translations that defines the relationship between each sensor and the motor vehicle coordinate system. All sensor data can then be transferred into the motor vehicle coordinate system as a common frame of reference and passed to the CNN.

In detail, according to the present preferred embodiment of the invention, this method is implemented as follows:

The camera 3 continuously captures image frames, the image frames consisting of image data for directions relative to the position of the camera 3 and within the solid angle covered by the camera 3, the directions being represented by coordinates in the camera coordinate system described above. At the same time, depth information is acquired by the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, the depth information consisting of depth data for directions relative to the positions of the range sensors 5, 6 and within the solid angles covered by the range sensors 5, 6, the directions being represented by coordinates in the coordinate systems of the range sensors.

As described above, a motor vehicle coordinate system is provided which is related to the camera coordinate system and to the coordinate systems of the range sensors by respective sets of translations and rotations, which are given by the position of the camera 3 and the positions of the range sensors 5, 6, respectively, relative to the origin of the motor vehicle coordinate system. The coordinates in the camera coordinate system and the coordinates in the range sensor coordinate systems are then transformed into coordinates in the motor vehicle coordinate system on the basis of the sets of translations and rotations. In this way, the input data for the convolutional neural network are obtained and fed into the CNN for object classification.
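
The transformation step described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the helper names `rotation_about_z` and `sensor_to_vehicle` and the mounting poses are hypothetical, and each set of rotations is reduced to a single rotation about the z-axis for brevity.

```python
import math


def rotation_about_z(angle_rad):
    """Return a 3x3 rotation matrix for a rotation about the z-axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


def sensor_to_vehicle(point, rotation, translation):
    """Map a point from a sensor frame into the vehicle frame: R * p + t."""
    return tuple(
        sum(rotation[i][j] * point[j] for j in range(3)) + translation[i]
        for i in range(3)
    )


# Hypothetical mounting poses (rotation, translation); the values are
# illustrative assumptions and are not taken from the source.
CAMERA_POSE = (rotation_about_z(0.0), (0.0, 0.0, 0.8))
RANGE_SENSOR_POSE = (rotation_about_z(math.pi / 2), (0.5, 0.0, 0.4))

# The same local coordinates map to different vehicle coordinates,
# because each sensor has its own position and orientation.
camera_point = sensor_to_vehicle((2.0, 0.0, 0.0), *CAMERA_POSE)
range_point = sensor_to_vehicle((2.0, 0.0, 0.0), *RANGE_SENSOR_POSE)
```

Once both sensors are expressed in the same coordinate system, their measurements can be combined into one input for the network without the network having to learn the mounting geometry.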

According to the preferred embodiment of the invention described herein, the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system are each represented by a respective direction cosine matrix. Furthermore, the image data are represented by a color value, i.e. by an RGB value, for each coordinate triplet of the cosine matrix, and the depth data are represented by a distance value for each coordinate triplet of the cosine matrix.
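
One possible reading of this representation is sketched below in Python: each direction is stored as a triplet of direction cosines (the components of a unit vector), and each triplet carries either an RGB value or a distance value. The type names `ImageSample` and `DepthSample`, the function `direction_cosines`, the sample values and the choice of metres as the distance unit are assumptions made for illustration; the source does not specify the data layout.

```python
import math
from typing import NamedTuple, Tuple

Triplet = Tuple[float, float, float]


def direction_cosines(vector):
    """Return the cosines of the angles between `vector` and the x, y, z axes."""
    norm = math.sqrt(sum(c * c for c in vector))
    if norm == 0.0:
        raise ValueError("direction is undefined for the zero vector")
    return tuple(c / norm for c in vector)


class ImageSample(NamedTuple):
    direction: Triplet          # direction cosines (l, m, n)
    rgb: Tuple[int, int, int]   # color value for this direction


class DepthSample(NamedTuple):
    direction: Triplet          # direction cosines (l, m, n)
    distance_m: float           # distance value; metres are an assumption


# Illustrative samples, not taken from the source.
image_sample = ImageSample(direction_cosines((1.0, 2.0, 2.0)), (255, 128, 0))
depth_sample = DepthSample(direction_cosines((0.0, 3.0, 4.0)), 5.0)
```

Because the squares of the three direction cosines always sum to one, a triplet encodes a direction only; the distance value is what adds range to that direction.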

In this way, by using image information from the camera 3 together with depth information from the different range sensors 5, 6, the semantic segmentation of objects in an image in automotive computer vision can be improved substantially.

List of reference numerals

1: motor vehicle
2: sensor arrangement
3: camera
4: evaluation unit
5: ultrasonic sensor
6: LIDAR sensor
7: person

Citations included in the description

This list of the documents cited by the applicant was generated automatically and is included solely for the reader's information. The list is not part of the German patent or utility model application. The DPMA assumes no liability for any errors or omissions.

Cited patent literature

US 2017/0099200 A1 [0007]

Claims

Method for generating input data for a convolutional neural network using at least one camera (3) and at least one range sensor (5, 6), wherein the camera (3) and the range sensor (5, 6) are arranged on the motor vehicle (1) such that the field of view of the camera (3) and the field of view of the range sensor (5, 6) overlap at least partially, the method comprising the following steps:
- capturing an image frame by the camera (3), the image frame consisting of image data for directions relative to the position of the camera (3) and within the solid angle covered by the camera (3), the directions being represented by coordinates in a camera coordinate system;
- simultaneously acquiring depth information by the range sensor (5, 6), the depth information consisting of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle covered by the range sensor (5, 6), the directions being represented by coordinates in a range sensor coordinate system;
- providing a motor vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the motor vehicle coordinate system; and
- transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the motor vehicle coordinate system on the basis of the sets of translations and rotations, thereby obtaining the input data for the convolutional neural network.

Method according to claim 1, the method further comprising the steps of: - representing the coordinates in the camera coordinate system by a direction cosine matrix; and - representing the coordinates in the range sensor coordinate system by a direction cosine matrix.

Method according to claim 1 or 2, the method further comprising the steps of: - representing the image data by a color value, preferably by an RGB value, for each coordinate triplet of the cosine matrix; and - representing the depth data by a distance value for each coordinate triplet of the cosine matrix.

Method according to one of the preceding claims, wherein the camera (3) is a fisheye camera with a field of view of at least 180°.

Method according to one of the preceding claims, wherein a plurality of cameras (3) are used to generate the input data for the convolutional neural network.

Method according to one of the preceding claims, wherein a plurality of range sensors (5, 6) are used to generate the input data for the convolutional neural network.

Method according to claim 6, wherein the range sensors (5, 6) comprise at least two different types of range sensors, preferably at least one LIDAR sensor (6) and at least one ultrasonic sensor (5).

Use of the method according to one of the preceding claims in a motor vehicle (1).

Sensor arrangement (2) for a motor vehicle (1), which is configured to perform the method according to one of claims 1 to 8.

A non-transitory computer-readable medium having instructions stored thereon that, when executed by a processor, cause a sensor arrangement (2) of a motor vehicle (1) to perform the method according to one of claims 1 to 8.