WO2019137915A1 - Generating input data for a convolutional neuronal network - Google Patents
Generating input data for a convolutional neuronal network
- Publication number
- WO2019137915A1 (PCT/EP2019/050343)
- Authority
- WO
- WIPO (PCT)
Classifications
- G06T7/80 — Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
- G06N3/045 — Combinations of networks
- G06N3/08 — Learning methods
- G06T7/10 — Segmentation; Edge detection
- G06V20/58 — Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
- G06V20/64 — Three-dimensional objects
- G06T2207/10016 — Video; Image sequence
- G06T2207/10024 — Color image
- G06T2207/10028 — Range image; Depth image; 3D point clouds
- G06T2207/20081 — Training; Learning
- G06T2207/20084 — Artificial neural networks [ANN]
- G06T2207/30248 — Vehicle exterior or interior
- G06T2207/30252 — Vehicle exterior; Vicinity of vehicle
- G06V10/454 — Integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
Abstract
The present invention relates to a method for generating input data for a convolutional neuronal network, using at least one camera (3) and at least one range sensor (5, 6), the camera (3) and the range sensor (5, 6) being arranged on an automotive vehicle (1) in such a way that the field of view of the camera (3) at least partially overlaps with the field of view of the range sensor (5, 6), the method comprising the following method steps: - acquiring an image frame with the camera (3), the image frame being comprised of image data for directions relative to the position of the camera (3) and within the solid angle seen by the camera (3), the directions being expressed by coordinates in a camera coordinate system, - simultaneously acquiring depth information with the range sensor (5, 6), the depth information being comprised of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle seen by the range sensor (5, 6), the directions being expressed by coordinates in a range sensor coordinate system, - providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the automotive vehicle coordinate system, respectively, - transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network. In this way, semantic segmentation of objects in an image in automotive computer vision can be enhanced.
Description
Generating input data for a convolutional neuronal network
The invention relates to a method for generating input data for a convolutional neuronal network using at least one camera and at least one range sensor.
One of the most fundamental problems in automotive computer vision is the semantic segmentation of objects in an image. The segmentation approach refers to the problem of associating every pixel with its corresponding object class. In recent times, there has been a surge of convolutional neural network (CNN) research and design, aided by increases in computational power in computer architectures and the availability of large annotated datasets.
CNNs are highly successful at classification and categorization tasks, but much of the research concerns standard photometric RGB images and is not focused on embedded automotive devices. Automotive hardware devices must meet low power consumption requirements and thus offer limited computational power.
In machine learning, a convolutional neural network is a class of deep, feed-forward artificial neural networks that has successfully been applied to analyzing visual imagery. CNNs use a variation of multilayer perceptrons designed to require minimal preprocessing. Convolutional networks were inspired by biological processes in which the connectivity pattern between neurons is inspired by the organization of the animal visual cortex. Individual cortical neurons respond to stimuli only in a restricted region of the visual field known as the receptive field. The receptive fields of different neurons partially overlap such that they cover the entire visual field.
CNNs use relatively little pre-processing compared to other image classification algorithms. This means that the network learns the filters that in traditional algorithms were hand-engineered. This independence from prior knowledge and human effort in feature design is a major advantage. CNNs have applications in image and video recognition, recommender systems and natural language processing.
The article "Multimodal Deep Learning for Robust RGB-D Object Recognition" by Andreas Eitel, Jost Tobias Springenberg, Luciano Spinello, Martin Riedmiller, and Wolfram Burgard (IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), Hamburg, Germany, 2015) proposes an RGB-D architecture for object recognition. This architecture is composed of two separate CNN processing streams - one for each modality - which are consecutively combined with a late fusion network. The focus is on learning with imperfect sensor data, a typical problem in real-world robotics tasks. For accurate learning, a multi-stage training methodology and two crucial ingredients for handling depth data with CNNs are introduced: first, an effective encoding of depth information for CNNs that enables learning without the need for large depth datasets; second, a data augmentation scheme for robust learning with depth images by corrupting them with realistic noise patterns.
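The core idea of such a depth encoding can be sketched in a few lines. The sketch below renders a single-channel depth map as a three-channel image so that a CNN designed for RGB input can process it; the linear blue-to-red ramp and the normalisation range `d_max` are illustrative assumptions, not the cited paper's exact colormap.

```python
import numpy as np

def colorize_depth(depth, d_max=10.0):
    """Map a metric depth map (H, W) to a three-channel uint8 image (H, W, 3).

    The ramp is a simple illustrative choice: near points become blue,
    mid-range points green, far points red.
    """
    norm = np.clip(depth / d_max, 0.0, 1.0)
    r = norm
    g = 1.0 - np.abs(2.0 * norm - 1.0)
    b = 1.0 - norm
    return (np.stack([r, g, b], axis=-1) * 255).astype(np.uint8)

# A tiny 2x2 depth map with distances in metres.
img = colorize_depth(np.array([[0.0, 5.0],
                               [10.0, 2.5]]))
print(img.shape)   # (2, 2, 3)
```

The point of the encoding is only that the resulting three channels have the same shape and value range as a photometric image, so the same convolutional layers can consume either modality.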
From US 2017/0099200 A1 it is known that data is received characterizing a request for agent computation of sensor data. The request includes a required confidence and required latency for completion of the agent computation. Agents to query are determined based on the required confidence. Data is transmitted to query the determined agents to provide analysis of the sensor data.
It is an objective of the present invention to provide a possibility for enhancing semantic segmentation of objects in an image in automotive computer vision.
This object is addressed by the subject matter of the independent claims. Preferred embodiments are described in the sub claims.
Therefore, the invention provides a method for generating input data for a convolutional neuronal network, using at least one camera and at least one range sensor, the camera and the range sensor being arranged on an automotive vehicle in such a way that the field of view of the camera at least partially overlaps with the field of view of the range sensor, the method comprising the following method steps:
acquiring an image frame with the camera, the image frame being comprised of image data for directions relative to the position of the camera and within the solid angle seen by the camera, the directions being expressed by coordinates in a camera coordinate system,
simultaneously acquiring depth information with the range sensor, the depth information being comprised of depth data for directions relative to the position of the range sensor and within the solid angle seen by the range sensor, the directions being expressed by coordinates in a range sensor coordinate system,
providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera and the position of the range sensor relative to the origin of the automotive vehicle coordinate system, respectively,
transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network.
Hence, it is an essential idea of the invention that the input data for the convolutional neuronal network comprises both image data and depth data for common viewing directions relative to the origin of the automotive vehicle coordinate system, the directions being expressed with coordinates of the common automotive vehicle coordinate system which serves as a common frame. In other words: The input data for the convolutional neural network is comprised of image data and depth data for directions expressed in the automotive vehicle coordinate system though such data was originally captured and expressed as data in the coordinate system of the camera or the range sensor, respectively. The transformation of this data into the common automotive coordinate system provides for the possibility of using data from different
sensors/cameras in a common data set which is input into the convolutional neural network. Preferably, the camera consecutively acquires image frames and the range sensor consecutively acquires depth information. Preferably, as a last step of the method described before, the generated data set comprised by the depth data and the image data is input into the CNN.
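The transformation step described above can be sketched as follows. The extrinsic pose of each sensor is assumed to be given as a 3x3 rotation matrix plus a translation vector relative to the vehicle origin; the mounting values below are hypothetical.

```python
import numpy as np

def make_extrinsic(rotation, translation):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a translation."""
    T = np.eye(4)
    T[:3, :3] = rotation
    T[:3, 3] = translation
    return T

def to_vehicle_frame(points_sensor, sensor_to_vehicle):
    """Transform an (N, 3) array of points from a sensor coordinate system
    into the common automotive vehicle coordinate system."""
    homogeneous = np.hstack([points_sensor, np.ones((len(points_sensor), 1))])
    return (sensor_to_vehicle @ homogeneous.T).T[:, :3]

# Hypothetical camera mounted 1.2 m above the vehicle origin, unrotated.
camera_T = make_extrinsic(np.eye(3), np.array([0.0, 0.0, 1.2]))
points = np.array([[1.0, 0.5, 0.0]])
print(to_vehicle_frame(points, camera_T))   # [[1.  0.5 1.2]]
```

Applying the same function with each sensor's own extrinsic matrix puts all camera and range sensor data into one frame of reference, which is what allows a single data set to be assembled for the CNN.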
According to a preferred embodiment of the invention, the method further comprises the following steps:
expressing the coordinates in the camera coordinate system by a direction cosine matrix, and
expressing the coordinates in the range sensor coordinate system by a direction cosine matrix.
As known to the person skilled in the art, the direction cosines of a vector are the cosines of the angles between the vector and the three coordinate axes. Equivalently, they are the contributions of each component of the basis to a unit vector in that direction. Direction cosines are an analogous extension of the usual notion of slope to higher dimensions; more generally, a direction cosine refers to the cosine of the angle between any two vectors. They are inter alia used for forming direction cosine matrices that express one set of orthonormal basis vectors in terms of another set, or for expressing a known vector in a different basis.
Preferably, the method further comprises the following steps:
expressing the image data by a color value, preferably by an RGB value, for each coordinate triple of the cosine matrix, and
expressing the depth data by a distance value for each coordinate triple of the cosine matrix.
In this way, a data set comprising a color value (as a part of the image frame) and a respective distance value (as a part of a depth map) for multiple directions relative to the origin of the automotive vehicle coordinate system may be input into the CNN and processed therein together.
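One possible layout of such a fused data set is sketched below; the number of directions, the channel ordering, and the sample values are hypothetical.

```python
import numpy as np

# For each common viewing direction (a unit vector in the vehicle coordinate
# system) store an RGB colour value from the camera and a distance value from
# a range sensor, then concatenate them into one multi-channel input row.
directions = np.array([[1.0, 0.0, 0.0],      # straight ahead
                       [0.0, 1.0, 0.0]])     # to the side
rgb = np.array([[255.0, 0.0, 0.0],
                [0.0, 255.0, 0.0]])          # colour values from the camera
distance = np.array([[4.2],
                     [7.5]])                 # distances in metres

cnn_input = np.hstack([directions, rgb, distance])
print(cnn_input.shape)   # (2, 7): x, y, z, R, G, B, distance per direction
```

Because colour and distance are keyed to the same direction triple, the network receives both modalities as channels of one sample rather than as two unrelated inputs.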
In general, different types of cameras may be used. However, according to a preferred embodiment of the invention, the camera is a fish eye camera with a field of view of at least 180°. Further, in general, a single camera may be sufficient for the method according to the invention. However, according to a preferred embodiment of the invention, multiple cameras are used for generating the input data for the convolutional neuronal network. Preferably, these cameras have different fields of view. Even more preferably, these cameras cover the complete surrounding of the automotive vehicle.
Furthermore, preferably multiple range sensors are used for generating the input data for the convolutional neuronal network. In general, these range sensors may be of the same type. However, according to a preferred embodiment of the invention, the range sensors comprise at least two different types of range sensors, preferably at least a LIDAR sensor and at least an ultrasonic sensor. Preferably, these range sensors have
different fields of view. Even more preferably, these range sensors cover the complete surrounding of the automotive vehicle.
The invention also relates to the use of a method as described above in an automotive vehicle, to a sensor arrangement for an automotive vehicle configured for performing such a method, and to a non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement of an automotive vehicle to perform such a method.
In the drawings:
Fig. 1 schematically depicts an automotive vehicle with a sensor arrangement capturing an object according to a preferred embodiment of the invention,
Fig. 2 schematically depicts the camera coordinate system and the range
sensor coordinate system according to the preferred embodiment of the invention, and
Fig. 3 schematically depicts the automotive vehicle coordinate system
according to the preferred embodiment of the invention.
As schematically depicted in Fig. 1, according to a preferred embodiment of the invention, in an automotive vehicle 1 a sensor arrangement 2 comprising a camera 3, an evaluation unit 4, an ultrasonic sensor 5, and a LIDAR sensor 6 is provided. As depicted by dashed lines, the camera 3, the ultrasonic sensor 5, and the LIDAR sensor 6 have respective fields of view which overlap with each other. This allows scenes to be captured with image data and depth data, respectively, which may be input into a convolutional neural network incorporated in the evaluation unit 4 for classification of objects like the person 7 in front of the automotive vehicle 1.
By using different types of range sensors 5, 6, i.e. an ultrasonic sensor 5 and a LIDAR sensor 6, it is possible to create multiple input depth maps together with RGB image data for use in a CNN that can detect and classify objects. The application here is to use automotive sensors like the camera 3, the ultrasonic sensor 5 and the LIDAR sensor 6
to create depth information around a vehicle and combine this data with surround view image data. Therefore, such automotive sensors are preferably arranged on all sides of the automotive vehicle 1 in such a way that the complete surrounding of the automotive vehicle can be monitored. For the sake of clarity, the present preferred embodiment of the invention only refers to the three automotive sensors mentioned above as an example.
It is an important aspect of the present preferred embodiment of the invention to encode the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, in the same coordinate system as the camera data to create CNN input data that use RGB and multi depths maps together. This input data can then be input to a convolution neural network for classification.
As schematically depicted in Fig. 2, each sensor has its own mechanical coordinate system. Here, due to the two-dimensionality of the figure, only the x-axes and the z-axes are depicted, i.e. xC and zC for the camera 3, xU and zU for the ultrasonic sensor 5, and xL and zL for the LIDAR sensor 6. Further, as schematically depicted in Fig. 3, an automotive vehicle coordinate system is defined as a common reference coordinate system for all automotive sensors 3, 5, 6. The automotive vehicle coordinate system has its origin (0, 0, 0) in the middle of the front portion of the automotive vehicle 1 at street level. With respect to the respective positions of the automotive sensors 3, 5, 6 (and to all other automotive sensors which may be arranged on the automotive vehicle) there exists a set of rotations and translations to define the relationship between each sensor and the automotive vehicle coordinate system. All sensor data can then be translated into the automotive vehicle coordinate system as a common frame of reference and passed into the CNN.
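The per-sensor sets of rotations and translations can be sketched as a small table of mounting poses. All yaw angles and offsets below are hypothetical values chosen for illustration; a real calibration would supply them.

```python
import numpy as np

def rot_z(yaw):
    """Rotation matrix for a yaw angle about the vehicle's z-axis."""
    c, s = np.cos(yaw), np.sin(yaw)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Hypothetical mounting poses relative to the vehicle origin
# (front middle, street level): (yaw, translation in metres).
mounts = {
    "camera":     (0.0,        np.array([0.0,  0.0, 1.2])),
    "ultrasonic": (np.pi / 6,  np.array([0.3,  0.5, 0.4])),
    "lidar":      (-np.pi / 6, np.array([0.3, -0.5, 0.4])),
}

def sensor_point_to_vehicle(sensor, p_sensor):
    """Rotate, then translate a point from a sensor frame into the vehicle frame."""
    yaw, t = mounts[sensor]
    return rot_z(yaw) @ p_sensor + t

p = sensor_point_to_vehicle("camera", np.array([2.0, 0.0, 0.0]))
print(p)   # [2.  0.  1.2]
```

Extending `mounts` with further entries is all that is needed to bring additional cameras or range sensors into the same common frame.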
In detail, this method according to the present preferred embodiment of the invention is as follows:
With the camera 3, image frames are consecutively acquired, the image frames being comprised of image data for directions relative to the position of the camera 3 and within the solid angle seen by the camera 3, the directions being expressed by coordinates in the camera coordinate system described above. Simultaneously, depth information is acquired with the range sensors 5, 6, i.e. the ultrasonic sensor 5 and the LIDAR sensor 6, the depth information being comprised of depth data for directions relative to the positions of the range sensors 5, 6 and within the solid angles seen by the range sensors 5, 6, the directions being expressed by coordinates in the range sensors' coordinate systems.
As described before, an automotive vehicle coordinate system is provided which is related to the camera coordinate system and the range sensors' coordinate systems by respective sets of translations and rotations given by the position of the camera 3 and the positions of the range sensors 5, 6 relative to the origin of the automotive vehicle coordinate system, respectively. Then, the coordinates in the camera coordinate system and the coordinates in the range sensors' coordinate systems are transformed into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations. In this way the input data for the convolutional neural network is yielded and input into the CNN for object classification.
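One detail of this step is worth sketching: viewing directions, unlike 3-D points, are affected only by the rotation part of a sensor's extrinsics, since the translation merely moves the ray's origin. The rotation below (90 degrees about z) is an illustrative example.

```python
import numpy as np

def directions_to_vehicle(dirs_sensor, rotation):
    """Rotate (N, 3) unit direction vectors from a sensor frame into the
    vehicle frame and re-normalise against rounding error."""
    d = dirs_sensor @ rotation.T
    return d / np.linalg.norm(d, axis=1, keepdims=True)

# Hypothetical sensor rotated 90 degrees about z relative to the vehicle.
R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])
d = directions_to_vehicle(np.array([[1.0, 0.0, 0.0]]), R)
print(d)   # [[0. 1. 0.]]
```

Once all sensors' directions are expressed this way, directions from the camera and from the range sensors can be matched against each other in the common frame before the colour and distance values are paired.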
According to the preferred embodiment of the invention described here, the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system are both expressed by a respective direction cosine matrix. Further, the image data is expressed by a color value, i.e. by an RGB value, for each coordinate triple of the cosine matrix, and the depth data is expressed by a distance value for each coordinate triple of the cosine matrix.
In this way, by using image information from the camera 3 together with depth information from the different range sensors 5, 6, semantic segmentation of objects in an image in automotive computer vision can be greatly enhanced.
Reference signs list
1 automotive vehicle
2 sensor arrangement
3 camera
4 evaluation unit
5 ultrasonic sensor
6 LIDAR sensor
7 person
Claims
1. Method for generating input data for a convolutional neuronal network, using at least one camera (3) and at least one range sensor (5, 6), the camera (3) and the range sensor (5, 6) being arranged on an automotive vehicle (1) in such a way that the field of view of the camera (3) at least partially overlaps with the field of view of the range sensor (5, 6), the method comprising the following method steps:
acquiring an image frame with the camera (3), the image frame being comprised of image data for directions relative to the position of the camera (3) and within the solid angle seen by the camera (3), the directions being expressed by coordinates in a camera coordinate system,
simultaneously acquiring depth information with the range sensor (5, 6), the depth information being comprised of depth data for directions relative to the position of the range sensor (5, 6) and within the solid angle seen by the range sensor (5, 6), the directions being expressed by coordinates in a range sensor coordinate system,
providing an automotive vehicle coordinate system which is related to the camera coordinate system and the range sensor coordinate system by respective sets of translations and rotations given by the position of the camera (3) and the position of the range sensor (5, 6) relative to the origin of the automotive vehicle coordinate system, respectively,
transforming the coordinates in the camera coordinate system and the coordinates in the range sensor coordinate system into coordinates in the automotive vehicle coordinate system on the basis of the sets of translations and rotations, yielding the input data for the convolutional neural network.
2. Method according to claim 1, the method further comprising the following steps:
expressing the coordinates in the camera coordinate system by a direction cosine matrix, and
expressing the coordinates in the range sensor coordinate system by a direction cosine matrix.
3. Method according to claim 1 or 2, the method further comprising the following steps:
expressing the image data by a color value, preferably by an RGB value, for each coordinate triple of the cosine matrix, and
expressing the depth data by a distance value for each coordinate triple of the cosine matrix.
4. Method according to any of the preceding claims, wherein the camera (3) is a fish-eye camera with a field of view which is at least 180°.
5. Method according to any of the preceding claims, wherein multiple cameras (3) are used for generating the input data for the convolutional neuronal network.
6. Method according to any of the preceding claims, wherein multiple range sensors (5, 6) are used for generating the input data for the convolutional neuronal network.
7. Method according to claim 6, wherein the range sensors (5, 6) comprise at least two different types of range sensors, preferably at least one LIDAR sensor (6) and at least one ultrasonic sensor (5).
8. Use of the method according to any of the previous claims in an automotive vehicle (1).
9. Sensor arrangement (2) for an automotive vehicle (1) configured for performing the method according to any of claims 1 to 8.
10. Non-transitory computer-readable medium, comprising instructions stored thereon, that when executed on a processor, induce a sensor arrangement (2) of an automotive vehicle (1) to perform the method of any of claims 1 to 8.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
DE102018100315.3A DE102018100315A1 (en) | 2018-01-09 | 2018-01-09 | Generating input data for a convolutional neural network |
DE102018100315.3 | 2018-01-09 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019137915A1 true WO2019137915A1 (en) | 2019-07-18 |
Family
ID=65013693
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/EP2019/050343 WO2019137915A1 (en) | 2018-01-09 | 2019-01-08 | Generating input data for a convolutional neuronal network |
Country Status (2)
Country | Link |
---|---|
DE (1) | DE102018100315A1 (en) |
WO (1) | WO2019137915A1 (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114419323A (en) * | 2022-03-31 | 2022-04-29 | 华东交通大学 | Cross-modal learning and domain self-adaptive RGBD image semantic segmentation method |
CN114882727A (en) * | 2022-03-15 | 2022-08-09 | 深圳市德驰微视技术有限公司 | Parking space detection method based on domain controller, electronic device and storage medium |
WO2023077432A1 (en) * | 2021-11-05 | 2023-05-11 | 深圳市大疆创新科技有限公司 | Movable platform control method and apparatus, and movable platform and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170060254A1 (en) * | 2015-03-03 | 2017-03-02 | Nvidia Corporation | Multi-sensor based user interface |
US20170099200A1 (en) | 2015-10-06 | 2017-04-06 | Evolv Technologies, Inc. | Platform for Gathering Real-Time Analysis |
WO2018000039A1 (en) * | 2016-06-29 | 2018-01-04 | Seeing Machines Limited | Camera registration in a multi-camera system |
EP3438776A1 (en) * | 2017-08-04 | 2019-02-06 | Bayerische Motoren Werke Aktiengesellschaft | Method, apparatus and computer program for a vehicle |
EP3438777A1 (en) * | 2017-08-04 | 2019-02-06 | Bayerische Motoren Werke Aktiengesellschaft | Method, apparatus and computer program for a vehicle |
EP3438872A1 (en) * | 2017-08-04 | 2019-02-06 | Bayerische Motoren Werke Aktiengesellschaft | Method, apparatus and computer program for a vehicle |
Family Cites Families (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5475768A (en) * | 1993-04-29 | 1995-12-12 | Canon Inc. | High accuracy optical character recognition using neural networks with centroid dithering |
US5642431A (en) * | 1995-06-07 | 1997-06-24 | Massachusetts Institute Of Technology | Network-based system and method for detection of faces and the like |
WO2016145379A1 (en) * | 2015-03-12 | 2016-09-15 | William Marsh Rice University | Automated Compilation of Probabilistic Task Description into Executable Neural Network Specification |
US9633282B2 (en) * | 2015-07-30 | 2017-04-25 | Xerox Corporation | Cross-trained convolutional neural networks using multimodal images |
AU2016374520C1 (en) * | 2015-12-14 | 2020-10-15 | Motion Metrics International Corp. | Method and apparatus for identifying fragmented material portions within an image |
WO2017156243A1 (en) * | 2016-03-11 | 2017-09-14 | Siemens Aktiengesellschaft | Deep-learning based feature mining for 2.5d sensing image search |
- 2018-01-09: DE DE102018100315.3A patent DE102018100315A1 (active, Pending)
- 2019-01-08: WO PCT/EP2019/050343 patent WO2019137915A1 (active, Application Filing)
Non-Patent Citations (2)
Title |
---|
ANDREAS EITEL; JOST TOBIAS SPRINGENBERG; LUCIANO SPINELLO; MARTIN RIEDMILLER; WOLFRAM BURGARD: "Multimodal Deep Learning for Robust RGB-D Object Recognition", IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2015 |
ANONYMOUS: "PlanetPhysics/Direction Cosines - Wikiversity", 1 July 2015 (2015-07-01), XP055558585, Retrieved from the Internet <URL:https://en.wikiversity.org/w/index.php?title=PlanetPhysics/Direction_Cosines&oldid=1402343> [retrieved on 20190219] * |
Also Published As
Publication number | Publication date |
---|---|
DE102018100315A1 (en) | 2019-07-11 |
Legal Events
Date | Code | Title | Description
---|---|---|---
| 121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19700354; Country of ref document: EP; Kind code of ref document: A1 |
| NENP | Non-entry into the national phase | Ref country code: DE |
| 122 | Ep: pct application non-entry in european phase | Ref document number: 19700354; Country of ref document: EP; Kind code of ref document: A1 |