WO2019137915A1 - Generating input data for a convolutional neuronal network - Google Patents

Generating input data for a convolutional neuronal network

Info

Publication number
WO2019137915A1
Authority
WO
WIPO (PCT)
Prior art keywords
camera
coordinate system
range sensor
coordinates
automotive vehicle
Prior art date
Application number
PCT/EP2019/050343
Other languages
French (fr)
Inventor
Stephen FOY
Rosalia BARROS-QUINTANA
Ian Clancy
Original Assignee
Connaught Electronics Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Connaught Electronics Ltd. filed Critical Connaught Electronics Ltd.
Publication of WO2019137915A1 publication Critical patent/WO2019137915A1/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/80 Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/50 Context or environment of the image
    • G06V 20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V 20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00 Scenes; Scene-specific elements
    • G06V 20/60 Type of objects
    • G06V 20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10016 Video; Image sequence
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G06V 10/443 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components by matching or filtering
    • G06V 10/449 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters
    • G06V 10/451 Biologically inspired filters, e.g. difference of Gaussians [DoG] or Gabor filters with interaction between the filter responses, e.g. cortical complex cells
    • G06V 10/454 Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Biomedical Technology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention relates to a method for generating input data for a convolutional neuronal network, using at least one camera (3) and at least one range sensor (5, 6), the camera (3) and the range sensor (5, 6) being arranged on an automotive vehicle (1) in such a way that the field of view of the camera (3) at least partially overlaps with the field of view of the range sensor (5, 6), the method comprising the following method steps: - acquiring an image frame with the camera (3), the image frame being comprised of image data for directions relative to the position of the camera (3) and within the solid angle seen by the camera (3), the directions being expressed by coordinates in a camera coordinate system, - simultaneously acquiring depth information with the range sensor (5, 6), the depth information being comprised of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle seen by the range sensor (5, 6), the directions being expressed by coordinates in a range sensor coordinate system, - providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the automotive vehicle coordinate system, respectively, - transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network. In this way, semantic segmentation of objects in an image in automotive computer vision can be enhanced.

Description

Generating input data for a convolutional neuronal network
The invention relates to a method for generating input data for a convolutional neuronal network using at least one camera and at least one range sensor.
One of the most fundamental problems in automotive computer vision is the semantic segmentation of objects in an image. The segmentation approach refers to the problem of associating every pixel with its corresponding object class. In recent times, there has been a surge of convolutional neural network (CNN) research and design, aided by the increase in computational power of computer architectures and the availability of large annotated datasets.
CNNs are highly successful at classification and categorization tasks, but much of the research addresses standard photometric RGB images and is not focused on embedded automotive devices. Automotive hardware devices need to have low power consumption and thus offer only limited computational power.
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. Convolutional networks were inspired by biological processes in that the connectivity pattern between neurons resembles the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems and natural language processing.
The article "Multimodal Deep Learning for Robust RGB-D Object Recognition" by Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller and Wolfram Burgard, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015, proposes an RGB-D architecture for object recognition. This architecture is composed of two separate CNN processing streams, one for each modality, which are consecutively combined with a late fusion network. The focus is on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs are introduced. The first is an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets. The second is a data augmentation scheme for robust learning with depth images by corrupting them with realistic noise patterns.
From US 2017/0099200 A1 it is known that data is received characterizing a request for agent computation of sensor data. The request includes a required confidence and required latency for completion of the agent computation. Agents to query are determined based on the required confidence. Data is transmitted to query the determined agents to provide analysis of the sensor data.
It is an objective of the present invention to provide a possibility for enhancing semantic segmentation of objects in an image in automotive computer vision.
This object is addressed by the subject matter of the independent claims. Preferred embodiments are described in the sub claims.
Therefore, the invention provides a method for generating input data for a convolutional neuronal network, using at least one camera and at least one range sensor, the camera and the range sensor being arranged on an automotive vehicle in such a way that the field of view of the camera at least partially overlaps with the field of view of the range sensor, the method comprising the following method steps:
acquiring an image frame with the camera, the image frame being comprised of image data for directions relative to the position of the camera and within the solid angle seen by the camera, the directions being expressed by coordinates in a camera coordinate system,
simultaneously acquiring depth information with the range sensor, the depth information being comprised of depth data for directions relative to the position of the range sensor and within the solid angle seen by the range sensor, the directions being expressed by coordinates in a range sensor coordinate system,
providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera and the position of the range sensor relative to the origin of the automotive vehicle coordinate system, respectively,
transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network.
Hence, it is an essential idea of the invention that the input data for the convolutional neuronal network comprises both image data and depth data for common viewing directions relative to the origin of the automotive vehicle coordinate system, the directions being expressed with coordinates of the common automotive vehicle coordinate system which serves as a common frame. In other words: the input data for the convolutional neural network is comprised of image data and depth data for directions expressed in the automotive vehicle coordinate system, even though such data was originally captured and expressed in the coordinate system of the camera or the range sensor, respectively. The transformation of this data into the common automotive vehicle coordinate system provides the possibility of using data from different sensors/cameras in a common data set which is input into the convolutional neural network. Preferably, the camera consecutively acquires image frames and the range sensor consecutively acquires depth information. Preferably, as a last step of the method described before, the generated data set comprising the depth data and the image data is input into the CNN.
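The following sketch (not part of the original disclosure; all mounting angles and offsets are hypothetical placeholders) illustrates this common-frame idea in Python: a point measured in a sensor's own coordinate system is mapped into the automotive vehicle coordinate system by that sensor's rotation R and translation t, i.e. p_vehicle = R p_sensor + t.

```python
# Illustrative sketch only; the rotation angle and mounting offset are assumed,
# not taken from the patent.
import numpy as np

def rotation_z(yaw_rad: float) -> np.ndarray:
    """Rotation matrix about the z-axis (yaw only, for brevity)."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

def sensor_to_vehicle(points_sensor: np.ndarray,
                      R_vehicle_from_sensor: np.ndarray,
                      t_vehicle_from_sensor: np.ndarray) -> np.ndarray:
    """Apply p_vehicle = R @ p_sensor + t to an (N, 3) array of points."""
    return points_sensor @ R_vehicle_from_sensor.T + t_vehicle_from_sensor

# Hypothetical camera extrinsics: mounted 1.4 m above and 0.5 m behind the
# vehicle origin, rotated 10 degrees about the z-axis.
R_cam = rotation_z(np.deg2rad(10.0))
t_cam = np.array([-0.5, 0.0, 1.4])

points_in_camera_frame = np.array([[2.0, 0.1, 0.0],
                                   [5.0, -1.0, 0.3]])
points_in_vehicle_frame = sensor_to_vehicle(points_in_camera_frame, R_cam, t_cam)
```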
According to a preferred embodiment of the invention, the method further comprises the following steps:
expressing the coordinates in the camera coordinate system by a direction cosine matrix, and
expressing the coordinates in the range sensor coordinate system by a direction cosine matrix. As known to the person skilled in the art, the direction cosines of a vector are the cosines of the angles between the vector and the three coordinate axes. Equivalently, they are the contributions of each component of the basis to a unit vector in that direction. Direction cosines are an analogous extension of the usual notion of slope to higher dimensions. Hence, direction cosine refers to the cosine of the angle between any two vectors. They are inter alia used for forming direction cosine matrices that express one set of orthonormal basis vectors in terms of another set, or for expressing a known vector in a different basis.
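As a minimal illustration (assumed, not from the patent; a simple yaw rotation serves as the example), the entries of a direction cosine matrix are the pairwise dot products of the unit axes of two orthonormal bases:

```python
# Illustrative sketch; the 30 degree rotation between the bases is an assumption.
import numpy as np

def direction_cosine_matrix(axes_a: np.ndarray, axes_b: np.ndarray) -> np.ndarray:
    """Entry (i, j) is the cosine of the angle between axis i of basis A and axis j of basis B.

    Both arguments are 3x3 arrays whose rows are unit axis vectors expressed
    in a common frame.
    """
    return axes_a @ axes_b.T

vehicle_axes = np.eye(3)                       # reference basis
angle = np.deg2rad(30.0)
sensor_axes = np.array([[np.cos(angle),  np.sin(angle), 0.0],
                        [-np.sin(angle), np.cos(angle), 0.0],
                        [0.0,            0.0,           1.0]])

dcm = direction_cosine_matrix(sensor_axes, vehicle_axes)

# A unit direction given by its components in the sensor basis is re-expressed
# in the vehicle basis with the transpose of the DCM.
d_sensor = np.array([1.0, 0.0, 0.0])
d_vehicle = dcm.T @ d_sensor                   # approx. [cos(30 deg), sin(30 deg), 0]
```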
Preferably, the method further comprises the following steps:
expressing the image data by a color value, preferably by an RGB value, for each coordinate triple of the cosine matrix, and
expressing the depth data by a distance value for each coordinate triple of the cosine matrix.
In this way, a data set comprising a color value (as a part of the image frame) and a respective distance value (as a part of a depth map) for multiple directions relative to the origin of the automotive vehicle coordinate system may be input into the CNN and processed therein together.
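Purely as a sketch of how such a combined data set could look (the image resolution, the four-channel layout and the assumption that the depth data has already been resampled onto the same grid of viewing directions are illustrative choices, not taken from the patent):

```python
# Illustrative sketch; resolution, channel layout and placeholder values are assumed.
import numpy as np

height, width = 480, 640
rgb_image = np.zeros((height, width, 3), dtype=np.float32)       # colour values from the camera
depth_map = np.full((height, width, 1), 10.0, dtype=np.float32)  # distance values from the range sensor(s)

# Channels 0-2 hold the colour value and channel 3 the distance value for the
# same viewing direction in the common automotive vehicle coordinate system.
cnn_input = np.concatenate([rgb_image, depth_map], axis=-1)      # shape (480, 640, 4)
cnn_input = cnn_input[np.newaxis, ...]                           # add a batch dimension
```

Additional range sensors would simply contribute further depth channels keyed to the same viewing directions.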
In general, different types of cameras may be used. However, according to a preferred embodiment of the invention, the camera is a fish eye camera with a field of view of at least 180°. Further, in general, a single camera may be sufficient for the method according to the invention. However, according to a preferred embodiment of the invention, multiple cameras are used for generating the input data for the convolutional neuronal network. Preferably, these cameras have different fields of view. Even more preferably, these cameras cover the complete surroundings of the automotive vehicle.
Furthermore, preferably multiple range sensors are used for generating the input data for the convolutional neuronal network. In general, these range sensors may be of the same type. However, according to a preferred embodiment of the invention, the range sensors comprise at least two different types of range sensors, preferably at least a LIDAR sensor and at least an ultrasonic sensor. Preferably, these range sensors have different fields of view. Even more preferably, these range sensors cover the complete surroundings of the automotive vehicle.
The invention also relates to the use of a method as described above in an automotive vehicle, to a sensor arrangement for an automotive vehicle configured for performing such a method, and to a non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement of an automotive vehicle to perform such a method.
In the drawings:
Fig. 1 schematically depicts an automotive vehicle with a sensor arrangement capturing an object according to a preferred embodiment of the invention,
Fig. 2 schematically depicts the camera coordinate system and the range
sensor coordinate system according to the preferred embodiment of the invention, and
Fig. 3 schematically depicts the automotive vehicle coordinate system
according to the preferred embodiment of the invention.
As schematically depicted in Fig. 1, according to a preferred embodiment of the invention, in an automotive vehicle 1 a sensor arrangement 2 comprising a camera 3, an evaluation unit 4, an ultrasonic sensor 5, and a LIDAR sensor 6 is provided. As depicted by dashed lines, the camera 3, the ultrasonic sensor 5, and the LIDAR sensor 6 have respective fields of view which overlap with each other. This allows scenes to be captured with image data and depth data, respectively, which may be input into a convolutional neural network incorporated in the evaluation unit 4 for the classification of objects such as the person 7 in front of the automotive vehicle 1.
By using different types of range sensors 5, 6, i.e. an ultrasonic sensor 5 and a LIDAR sensor 6, it is possible to create multiple input depth maps together with RGB image data for a CNN that can detect and classify objects. The application here is to use automotive sensors like the camera 3, the ultrasonic sensor 5 and the LIDAR sensor 6 to create depth information around a vehicle and to combine this data with surround view image data. Therefore, such automotive sensors are preferably arranged on all sides of the automotive vehicle 1 in such a way that the complete surroundings of the automotive vehicle can be monitored. For the sake of clarity, the present preferred embodiment of the invention only refers to the three automotive sensors mentioned above as an example.
It is an important aspect of the present preferred embodiment of the invention to encode the data from the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, in the same coordinate system as the camera data in order to create CNN input data that uses RGB and multiple depth maps together. This input data can then be fed into a convolutional neural network for classification.
As schematically depicted in Fig. 2, each sensor has its own mechanical coordinate system. Here, due to the two-dimensionality of the figure, only the x-axes and the z-axes are depicted, i.e. xC and zC for the camera 3, xU and zU for the ultrasonic sensor 5, and xL and zL for the LIDAR sensor 6. Further, as schematically depicted in Fig. 3, an automotive vehicle coordinate system is defined as a common reference coordinate system for all automotive sensors 3, 5, 6. The automotive vehicle coordinate system has its origin (0, 0, 0) in the middle of the front portion of the automotive vehicle 1 at street level. With respect to the respective positions of the automotive sensors 3, 5, 6 (and of all other automotive sensors which may be arranged on the automotive vehicle) there exists a set of rotations and translations that defines the relationship between each sensor and the automotive vehicle coordinate system. All sensor data can then be translated into the automotive vehicle coordinate system as a common frame of reference and passed into the CNN.
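A minimal sketch of this setup, assuming purely hypothetical mounting positions and yaw angles for the camera 3, the ultrasonic sensor 5 and the LIDAR sensor 6: each sensor carries its own rotation and translation relative to the vehicle origin in the middle of the front at street level, and all measurements are mapped into that common frame before being combined.

```python
# Illustrative sketch; every numeric value below is an assumed placeholder.
import numpy as np

def rotation_z(yaw_rad: float) -> np.ndarray:
    """Rotation matrix about the vehicle z-axis."""
    c, s = np.cos(yaw_rad), np.sin(yaw_rad)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Per-sensor extrinsics: (rotation, translation in metres) relative to the
# automotive vehicle coordinate system with origin at the front middle, street level.
extrinsics = {
    "camera":     (rotation_z(0.0),               np.array([-0.5,  0.0, 1.4])),
    "ultrasonic": (rotation_z(np.deg2rad(15.0)),  np.array([ 0.0,  0.4, 0.5])),
    "lidar":      (rotation_z(np.deg2rad(-15.0)), np.array([ 0.0, -0.4, 0.8])),
}

def to_vehicle_frame(sensor_name: str, points_sensor: np.ndarray) -> np.ndarray:
    """Map an (N, 3) array of sensor-frame points into the common vehicle frame."""
    R, t = extrinsics[sensor_name]
    return points_sensor @ R.T + t

# Once transformed, points from all sensors share one frame of reference and
# can be combined into a single data set for the CNN.
lidar_points = np.array([[3.0, 0.0, 0.2], [6.5, 1.2, 0.1]])
lidar_in_vehicle = to_vehicle_frame("lidar", lidar_points)
```

In practice these rotations and translations come from the calibration of the sensor arrangement; the values above are illustrative only.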
In detail, this method according to the present preferred embodiment of the invention is as follows:
With the camera 3, image frames are consecutively acquired, the image frames being comprised of image data for directions relative to the position of the camera 3 and within the solid angle seen by the camera 3, the directions being expressed by coordinates in the camera coordinate system described above. Simultaneously, depth information is acquired with the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, the depth information being comprised of depth data for directions relative to the positions of the range sensors 5, 6 and within the solid angles seen by the range sensors 5, 6, the directions being expressed by coordinates in the range sensors’ coordinate systems.
As described before, an automotive vehicle coordinate system is provided which is related to the camera coordinate system and the range sensors’ coordinate systems by respective sets of translations and rotations given by the position of the camera 3 and the positions of the range sensors 5, 6 relative to the origin of the automotive vehicle coordinate system, respectively. Then, the coordinates in the camera coordinate system and the coordinates in the range sensors’ coordinate systems are transformed into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations. In this way the input data for the convolutional neural network is yielded and input into the CNN for object classification.
According to the preferred embodiment of the invention described here, the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system are both expressed by a respective direction cosine matrix. Further, the image data is expressed by a color value, i.e. by an RGB value, for each coordinate triple of the cosine matrix, and the depth data is expressed by a distance value for each coordinate triple of the cosine matrix.
In this way, by using image information from the camera 3 together with depth information from the different range sensors 5, 6, semantic segmentation of objects in an image in automotive computer vision can be greatly enhanced.

Reference signs list
1 automotive vehicle
2 sensor arrangement
3 camera
4 evaluation unit
5 ultrasonic sensor
6 LIDAR sensor
7 person

Claims

Claims
1. Method for generating input data for a convolutional neuronal network, using at least one camera (3) and at least one range sensor (5, 6), the camera (3) and the range sensor (5, 6) being arranged on an automotive vehicle (1) in such a way that the field of view of the camera (3) at least partially overlaps with the field of view of the range sensor (5, 6), the method comprising the following method steps:
acquiring an image frame with the camera (3), the image frame being comprised of image data for directions relative to the position of the camera (3) and within the solid angle seen by the camera (3), the directions being expressed by coordinates in a camera coordinate system,
simultaneously acquiring depth information with the range sensor (5, 6), the depth information being comprised of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle seen by the range sensor (5, 6), the directions being expressed by coordinates in a range sensor coordinate system,
providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the automotive vehicle coordinate system, respectively,
transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network.
2. Method according to claim 1, the method further comprising the following steps: expressing the coordinates in the camera coordinate system by a direction cosine matrix, and
expressing the coordinates in the range sensor coordinate system by a direction cosine matrix.
3. Method according to claim 1 or 2, the method further comprising the following steps: expressing the image data by a color value, preferably by an RGB value, for each coordinate triple of the cosine matrix, and
expressing the depth data by a distance value for each coordinate triple of the cosine matrix.
4. Method according to any of the preceding claims, wherein the camera (3) is a fish eye camera with a field of view of at least 180°.
5. Method according to any of the preceding claims, wherein multiple cameras (3) are used for generating the input data for the convolutional neuronal network.
6. Method according to any of the preceding claims, wherein multiple range sensors (5, 6) are used for generating the input data for the convolutional neuronal network.
7. Method according to claim 6, wherein the range sensors (5, 6) comprise at least two different types of range sensors, preferably at least a LIDAR sensor (6) and at least an ultrasonic sensor (5).
8. Use of the method according to any of the previous claims in an automotive vehicle (1).
9. Sensor arrangement (2) for an automotive vehicle (1) configured for performing the method according to any of claims 1 to 8.
10. Non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement (2) of an automotive vehicle (1) to perform the method of any of claims 1 to 8.
PCT/EP2019/050343 2018-01-09 2019-01-08 Generating input data for a convolutional neuronal network WO2019137915A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
DE102018100315.3A DE102018100315A1 (en) 2018-01-09 2018-01-09 Generating input data for a convolutional neural network
DE102018100315.3 2018-01-09

Publications (1)

Publication Number Publication Date
WO2019137915A1 true WO2019137915A1 (en) 2019-07-18

Family

ID=65013693

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/EP2019/050343 WO2019137915A1 (en) 2018-01-09 2019-01-08 Generating input data for a convolutional neuronal network

Country Status (2)

Country Link
DE (1) DE102018100315A1 (en)
WO (1) WO2019137915A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114419323A (en) * 2022-03-31 2022-04-29 华东交通大学 Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method
CN114882727A (en) * 2022-03-15 2022-08-09 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic device and storage medium
WO2023077432A1 (en) * 2021-11-05 2023-05-11 深圳市大疆创新科技有限公司 Movable platform control method and apparatus, and movable platform and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060254A1 (en) * 2015-03-03 2017-03-02 Nvidia Corporation Multi-sensor based user interface
US20170099200A1 (en) 2015-10-06 2017-04-06 Evolv Technologies, Inc. Platform for Gathering Real-Time Analysis
WO2018000039A1 (en) * 2016-06-29 2018-01-04 Seeing Machines Limited Camera registration in a multi-camera system
EP3438776A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle
EP3438777A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle
EP3438872A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5475768A (en) * 1993-04-29 1995-12-12 Canon Inc. High accuracy optical character recognition using neural networks with centroid dithering
US5642431A (en) * 1995-06-07 1997-06-24 Massachusetts Institute Of Technology Network-based system and method for detection of faces and the like
WO2016145379A1 (en) * 2015-03-12 2016-09-15 William Marsh Rice University Automated Compilation of Probabilistic Task Description into Executable Neural Network Specification
US9633282B2 (en) * 2015-07-30 2017-04-25 Xerox Corporation Cross-trained convolutional neural networks using multimodal images
AU2016374520C1 (en) * 2015-12-14 2020-10-15 Motion Metrics International Corp. Method and apparatus for identifying fragmented material portions within an image
WO2017156243A1 (en) * 2016-03-11 2017-09-14 Siemens Aktiengesellschaft Deep-learning based feature mining for 2.5d sensing image search

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170060254A1 (en) * 2015-03-03 2017-03-02 Nvidia Corporation Multi-sensor based user interface
US20170099200A1 (en) 2015-10-06 2017-04-06 Evolv Technologies, Inc. Platform for Gathering Real-Time Analysis
WO2018000039A1 (en) * 2016-06-29 2018-01-04 Seeing Machines Limited Camera registration in a multi-camera system
EP3438776A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle
EP3438777A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle
EP3438872A1 (en) * 2017-08-04 2019-02-06 Bayerische Motoren Werke Aktiengesellschaft Method, apparatus and computer program for a vehicle

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
ANDREAS EITEL; JOST TOBIAS SPRINGENBERG; LUCIANO SPINELLO; MARTIN RIEDMILLER; WOLFRAM BURGARD: "Multimodal Deep Learning for Robust RGB-D Object Recognition", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015
ANONYMOUS: "PlanetPhysics/Direction Cosines - Wikiversity", 1 July 2015 (2015-07-01), XP055558585, Retrieved from the Internet <URL:https://en.wikiversity.org/w/index.php?title=PlanetPhysics/Direction_Cosines&oldid=1402343> [retrieved on 20190219] *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023077432A1 (en) * 2021-11-05 2023-05-11 深圳市大疆创新科技有限公司 Movable platform control method and apparatus, and movable platform and storage medium
CN114882727A (en) * 2022-03-15 2022-08-09 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic device and storage medium
CN114882727B (en) * 2022-03-15 2023-09-05 深圳市德驰微视技术有限公司 Parking space detection method based on domain controller, electronic equipment and storage medium
CN114419323A (en) * 2022-03-31 2022-04-29 华东交通大学 Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method
CN114419323B (en) * 2022-03-31 2022-06-24 华东交通大学 Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method

Also Published As

Publication number Publication date
DE102018100315A1 (en) 2019-07-11

Similar Documents

Publication Publication Date Title
EP3755204B1 (en) Eye tracking method and system
CN112446270B (en) Training method of pedestrian re-recognition network, pedestrian re-recognition method and device
JP6877623B2 (en) Computer-based systems and computer-based methods
US11205086B2 (en) Determining associations between objects and persons using machine learning models
CN109993707B (en) Image denoising method and device
JP6862584B2 (en) Image processing system and image processing method
WO2022001372A1 (en) Neural network training method and apparatus, and image processing method and apparatus
CN111797882B (en) Image classification method and device
CN110222718B (en) Image processing method and device
WO2019137915A1 (en) Generating input data for a convolutional neuronal network
CN111832592A (en) RGBD significance detection method and related device
WO2022179606A1 (en) Image processing method and related apparatus
US20230172457A1 (en) Systems and methods for temperature measurement
WO2022165722A1 (en) Monocular depth estimation method, apparatus and device
CN113807183A (en) Model training method and related equipment
KR20190128933A (en) Emotion recognition apparatus and method based on spatiotemporal attention
WO2019076867A1 (en) Semantic segmentation of an object in an image
CN110705564B (en) Image recognition method and device
CN115239581A (en) Image processing method and related device
WO2022179599A1 (en) Perceptual network and data processing method
CN118279206A (en) Image processing method and device
Velte Semantic image segmentation combining visible and near-infrared channels with depth information
CN112766063B (en) Micro-expression fitting method and system based on displacement compensation
JP2024528205A (en) Image processing method and device, and vehicle
CN115841427A (en) Image processing method, electronic device, storage medium, and computer program product

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19700354

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19700354

Country of ref document: EP

Kind code of ref document: A1