EP3926360A1 - Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets - Google Patents
- Publication number
- EP3926360A1 (application EP20180636.1A)
- Authority
- EP
- European Patent Office
- Prior art keywords
- lidar
- implemented method
- radar
- data sets
- computer implemented
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/88—Radar or analogous systems specially adapted for specific applications
- G01S13/93—Radar or analogous systems specially adapted for specific applications for anti-collision purposes
- G01S13/931—Radar or analogous systems specially adapted for specific applications for anti-collision purposes of land vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/865—Combination of radar systems with lidar systems
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S13/00—Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
- G01S13/86—Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
- G01S13/867—Combination of radar systems with cameras
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/86—Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/89—Lidar systems specially adapted for specific applications for mapping or imaging
- G01S17/894—3D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S17/00—Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
- G01S17/88—Lidar systems specially adapted for specific applications
- G01S17/93—Lidar systems specially adapted for specific applications for anti-collision purposes
- G01S17/931—Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S7/00—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
- G01S7/02—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
- G01S7/41—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
- G01S7/417—Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/251—Fusion techniques of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/803—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10032—Satellite or aerial image; Remote sensing
- G06T2207/10044—Radar image
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
Definitions
- the present disclosure relates to methods and systems for object detection, for example for bounding box detection.
- Object detection is an essential pre-requisite for various tasks, in particular in autonomously driving vehicles.
- the present disclosure provides a computer implemented method, a computer system, a vehicle, and a non-transitory computer readable medium according to the independent claims. Embodiments are given in the subclaims, the description and the drawings.
- the present disclosure is directed at a computer implemented method for object detection, the method comprising the following steps performed (in other words: carried out) by computer hardware components: acquiring a plurality of lidar data sets from a lidar sensor; acquiring a plurality of radar data sets from a radar sensor; acquiring at least one image from a camera; determining concatenated data based on casting (in other words: projecting) the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and detecting an object based on the concatenated data.
- Concatenating data may include combining sensor data from several prior sensor readings into one frame that is subsequently cast onto a single camera frame.
- "casting" and "projecting" (and likewise "cast" and "projection") may be used interchangeably.
- casting points onto a 2D camera space may be understood as projecting the points onto the camera plane (of the 2D camera space).
- the projecting may be carried out using a pinhole camera model.
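- For illustration, such a pinhole projection may be sketched as follows (the intrinsic matrix K and the point values are assumptions, not part of the disclosure):

```python
import numpy as np

def project_points(points_3d: np.ndarray, K: np.ndarray) -> np.ndarray:
    """Project N x 3 camera-frame points to N x 2 pixel coordinates."""
    # Pinhole model: apply intrinsics, then divide by depth (z).
    uvw = points_3d @ K.T          # N x 3 homogeneous image coordinates
    return uvw[:, :2] / uvw[:, 2:3]

# Example intrinsics: focal length 1000 px, principal point (640, 360).
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
pixels = project_points(np.array([[1.0, 0.5, 10.0]]), K)
```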
- the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of camera residual blocks.
- Residual blocks may be part of the artificial neural network which transforms data with the aim of obtaining features applicable in prediction, i.e., neural network layers with skip connections, as introduced in the ResNet architecture.
- Here, each block includes a 2D convolutional layer, batch normalization, and a leaky ReLU activation function.
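- As a minimal illustrative sketch of such a residual block (not part of the disclosure: the convolution is simplified to a 1x1 channel-mixing matrix product, whereas a real block would typically use larger kernels):

```python
import numpy as np

def leaky_relu(x, alpha=0.1):
    # Leaky ReLU activation: small slope for negative inputs.
    return np.where(x > 0, x, alpha * x)

def residual_block(x, weights, eps=1e-5):
    """x: H x W x C feature map; weights: C x C channel-mixing kernel."""
    y = x @ weights                               # simplified 1x1 convolution
    mean = y.mean(axis=(0, 1), keepdims=True)     # batch normalization
    var = y.var(axis=(0, 1), keepdims=True)
    y = leaky_relu((y - mean) / np.sqrt(var + eps))
    return x + y                                  # skip connection

x = np.random.default_rng(0).normal(size=(8, 8, 4))
out = residual_block(x, np.eye(4))
```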
- the camera data may be processed using a first artificial neural network.
- the first artificial neural network may be a ResNet type convolutional neural network architecture transforming camera data.
- the casting comprises aligning a plurality of sweeps of the lidar data sets. For example, several prior lidar sweeps are aligned to the most current sweep to increase the number of lidar points, as if it were a single sweep from a denser lidar. Alignment is carried out as in the nuScenes open API method that returns a point cloud aggregating multiple sweeps. A pre-determined number of previous frames (for example 10 previous frames) may be mapped to a single reference frame, which may be the "current" frame in this case. Homogeneous transformation matrices accounting for differences in translation and rotation of the ego car may be used to align a previous frame with the reference frame.
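- The alignment step may be sketched as follows, assuming a planar yaw rotation and an example ego-car translation (both values are illustrative, not from the disclosure):

```python
import numpy as np

def homogeneous_transform(yaw: float, translation) -> np.ndarray:
    """Build a 4x4 homogeneous matrix from a yaw angle and a translation."""
    c, s = np.cos(yaw), np.sin(yaw)
    T = np.eye(4)
    T[:2, :2] = [[c, -s], [s, c]]
    T[:3, 3] = translation
    return T

def align_sweep(points: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Map N x 3 points from a previous sweep into the reference frame."""
    homo = np.hstack([points, np.ones((len(points), 1))])
    return (homo @ T.T)[:, :3]

# Ego car moved 2 m forward between the sweeps, no rotation.
T = homogeneous_transform(yaw=0.0, translation=[2.0, 0.0, 0.0])
aligned = align_sweep(np.array([[1.0, 1.0, 0.0]]), T)
```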
- the computer implemented method further comprises the following step carried out by the computer hardware components: carrying out linear depth completion of the lidar data sets.
- Linear depth completion may be used to further increase lidar point density. It is performed on lidar points already projected onto the 2D camera plane. Afterwards, each point of the 2D plane has its depth linearly estimated from the lidar depths of the nearest points.
- Such depth completion is quick and yields "depth images" which, even though they come only from lidar, make it possible to utilize, for example, convolutional neural networks used in image processing.
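- One possible sketch of such a completion, here interpolating linearly along image rows (the disclosure does not fix the exact neighbor scheme, so this is an assumption):

```python
import numpy as np

def complete_depth_rows(depth: np.ndarray) -> np.ndarray:
    """depth: H x W array with 0 marking pixels without a projected lidar point."""
    out = depth.copy()
    cols = np.arange(depth.shape[1])
    for r in range(depth.shape[0]):
        known = depth[r] > 0
        if known.any():
            # Linearly estimate missing depths from the nearest known points.
            out[r] = np.interp(cols, cols[known], depth[r, known])
    return out

# A 1 x 5 row with lidar depths only at the two end pixels.
sparse = np.zeros((1, 5))
sparse[0, 0], sparse[0, 4] = 10.0, 20.0
dense = complete_depth_rows(sparse)
```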
- the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of lidar residual blocks based on the linear depth completed lidar data. Having "depth images", the same type of residual blocks as in the case of camera data may be utilized.
- the lidar data sets are processed using a second artificial neural network.
- the second artificial neural network may be a ResNet type artificial neural network architecture, as for camera, transforming "depth images".
- the casting comprises aligning a plurality of sweeps of the plurality of radar data sets.
- the aligning of radar data sets may be similar to the aligning of lidar data sets.
- the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of radar residual blocks.
- the residual blocks may be similar to those in the case of camera data.
- the radar data is processed using a third artificial neural network.
- the third artificial neural network may be a ResNet type convolutional neural network architecture, as for camera, transforming "velocity images".
- the computer implemented method further comprises the following step carried out by the computer hardware components: concatenating the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks (for example to obtain concatenated data).
- accurate 3d and 2d bounding box detections may be made with the neural network using camera-lidar-radar fused data. KPI (key performance indicator) metrics of the solution may be tested on the nuScenes dataset, casting lidar and radar onto the front camera, with training and validation/test sets prepared so as to consist of separate scenes.
- the present disclosure is directed at a computer system, said computer system comprising a plurality of computer hardware components configured to carry out several or all steps of the computer implemented method described herein.
- the computer system may be part of a vehicle.
- the computer system may comprise a plurality of computer hardware components (for example a processor, for example processing unit or processing network, at least one memory, for example memory unit or memory network, and at least one non-transitory data storage). It will be understood that further computer hardware components may be provided and used for carrying out steps of the computer implemented method in the computer system.
- the non-transitory data storage and/or the memory unit may comprise a computer program for instructing the computer to perform several or all steps or aspects of the computer implemented method described herein, for example using the processing unit and the at least one memory unit.
- the present disclosure is directed at a vehicle, comprising a radar sensor, a lidar sensor and a camera, wherein the vehicle is configured to detect objects according to the computer implemented method described herein.
- the present disclosure is directed at a non-transitory computer readable medium comprising instructions for carrying out several or all steps or aspects of the computer implemented method described herein.
- the computer readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid state drive (SSD); a read only memory (ROM), such as a flash memory; or the like.
- the computer readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection.
- the computer readable medium may, for example, be an online data repository or a cloud storage.
- the present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer implemented method described herein.
- Neural networks may be used in object detection tasks, for example in automotive industry, wherein bounding boxes may be placed around objects belonging to certain classes of interest (such as cars, pedestrians, or traffic signs).
- a lidar (light detection and ranging) sensor may be used, for example in combination with other sensors.
- a lidar sensor may directly provide a point cloud in 3d coordinate space.
- the outputs from several sensors may be fused to provide useful information/features of the class of objects to be found.
- lidar sensor data, radar sensor data, and camera data may be fused together in an efficient way for 2d and 3d object detection tasks, and a neural network architecture may be provided to benefit from such a fusion.
- Fig. 1 shows an illustration 100 of a system (in other words: an architecture) for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments.
- Inputs to the system are several sensor frames, for example from up to the past 0.5 seconds, appropriately prepared, which can be viewed as a W x H x C dimensional input frame (wherein W is the width, H is the height, and C is the number of features, comparable to W x H x 3 for RGB (red-green-blue) images), and which may be transformed and used for further processing, for example in an artificial neural network, for example in a convolutional neural network.
- Camera data 102 (including camera frames), lidar data 106 (including lidar pointcloud frames), and radar data 114 (including radar frames) may be processed as will be described in the following. All radar and lidar pointcloud frames may be cast onto the most recent ('current') camera frame of size W x H.
- An alignment casting 108 (for example 10-sweeps alignment casting) may be carried out for the lidar data 106.
- the lidar frames may be linearly depth completed (as illustrated by block 110).
- An alignment casting 116 (for example 6-sweeps alignment casting) may be carried out for the radar data 114.
- RGB camera channels and channels from lidar and radar casts from previous sweeps, which form the C channels of a W x H x C input frame, may be used.
- camera frames from timestamps within the previous 0.5 seconds, contributing to the C channels, may be used.
- the input consists of W x H x 3 for the camera (or W x H x 3*C0 in case C0 previous camera frames are used), W x H x C1 for C1 prepared lidar casts, and W x H x C2 for C2 prepared radar casts.
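- Assembling the input frame from these per-sensor channel blocks may be sketched as follows (the values of W, H, C0, C1, C2 are illustrative; arrays are stored H x W x C):

```python
import numpy as np

W, H = 640, 360
C0, C1, C2 = 2, 10, 6                     # example sweep/frame counts
camera = np.zeros((H, W, 3 * C0))          # C0 RGB camera frames
lidar_casts = np.zeros((H, W, C1))         # C1 prepared lidar casts
radar_casts = np.zeros((H, W, C2))         # C2 prepared radar casts

# Stack all per-sensor channels into one W x H x C input frame.
frame = np.concatenate([camera, lidar_casts, radar_casts], axis=-1)
```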
- an artificial neural network may be used for transforming such inputs to obtain 3d and/or 2d object detections on the output.
- an SSD (Single Shot MultiBox Detector)-like neural network may be used, which may be able to produce 3d bounding boxes for detected objects.
- YOLOv3 and SSD networks may be used, which operate on a single 2d image input and output 2d bounding boxes only.
- ground truth labels and underlying architecture may be introduced to be able to infer object distances and sizes (width, length, height) in 3d space, together with yaw-pitch-roll angles of an object.
- the following labels may be taken: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4), where (left, bottom, right, top) are 2d bounding box coordinates of an object in 2d camera image space.
- (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) may be provided in the 3d camera coordinate system, and q1, q2, q3, q4 may be quaternion components describing yaw-pitch-roll angles.
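- A common conversion from a unit quaternion to yaw-pitch-roll angles may be sketched as follows (the component order w, x, y, z for q1..q4 is an assumption, as the disclosure does not specify it):

```python
import numpy as np

def quaternion_to_ypr(w, x, y, z):
    """Convert a unit quaternion (w, x, y, z) to (yaw, pitch, roll) in radians."""
    roll = np.arctan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    pitch = np.arcsin(np.clip(2 * (w * y - z * x), -1.0, 1.0))
    yaw = np.arctan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return yaw, pitch, roll

# A rotation of 90 degrees about the vertical axis:
yaw, pitch, roll = quaternion_to_ypr(np.cos(np.pi / 4), 0.0, 0.0, np.sin(np.pi / 4))
```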
- the network may be divided into three phases related to processing time.
- a first phase may consist of three separate subnets of residual blocks 104, 112, 120, transforming the camera data 102, the lidar data 106, and the radar data 114.
- in a second phase, after the residual blocks 104, 112, 120 are obtained, the features extracted this way by each network may be concatenated (as indicated by block 122) into a single feature frame.
- the second phase includes a further transformation of the joint data using residual blocks. The concatenation in block 122 applies to concatenating the results from the previous three residual neural network subnets transforming 1) camera, 2) lidar, and 3) radar data.
- a third (or last) phase may include outputting class scores and region proposals.
- Outputting the class scores and region proposals may be similar to YOLOv3, but the output may be enhanced with a 3D part giving the position of an object in 3D coordinates and its rotation. Hence, each point of the 2D grid, together with its associated 2D anchor box, receives a probability score indicating the probability of existence of an object within such an area; then each class of object that the method is predicting receives a score indicating the probability of the object being of such a class.
- Region proposal corresponding to such 2D grid point may consist of coordinates indicating placement of an object: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4).
- (left, bottom), (right, top) may imply most probable placement of an object in the image space, (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) the placement in 3D space, and q coordinates indicate rotation.
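- Decoding the per-cell objectness and class scores may be sketched with sigmoid activations, as in YOLOv3-style heads (the raw logit values below are illustrative):

```python
import numpy as np

def sigmoid(x):
    # Map raw network logits to probabilities in (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

# Example head output for one grid cell / anchor:
# [objectness logit, logits for 3 object classes].
raw = np.array([2.0, -1.0, 0.5, 3.0])
objectness = sigmoid(raw[0])       # probability an object exists here
class_scores = sigmoid(raw[1:])    # per-class probability scores
```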
- Detections in the form of the labels described above may be provided, which adds a 3-dimensional part.
- An object detector 126 may carry out 2d object detections 128 and 3d object detections 130 based on the residual blocks 124.
- middle fusion or late fusion may be provided as two separate architectures.
- already joined features may be used when creating the 3-scale detections. Details of the late fusion and middle fusion and of the 3-scale detections may be similar to the YOLOv3 architecture.
- Final predictions may depend on the joint results of predictions from 3 different 2D grid granularities ("scales").
- in middle fusion, the outputs may be joined and the rest of the network may work on a "joint image" with features prepared by the subnets, as in the case of a standard 3-channel image.
- in late fusion, there may be a flow of three subnetworks, as if working with three separate images (from the three sensor types), and the networks may be fused only before making the final predictions on each granularity scale.
- Fig. 2 shows illustrations 200 of an hl (hidden layer, or high level) view of the implementation of middle fusion (on the left) and late fusion (on the right) according to various embodiments.
- the hl data 202 of the middle fusion may include joint features after a point of processing indicated by a dashed line 204.
- the hl data 206 of the late fusion may include joint features after a point of processing indicated by a dashed line 208.
- hl data is illustrated in a sequence of processing (or time) from top to bottom.
- a loss function may be introduced, using the YOLOv3 loss together with additional weighted L2 distances between 3d coordinates and a quaternion angle loss for learning yaw-pitch-roll angles.
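- The additional loss terms may be sketched as follows (the weighting is an assumption; the quaternion angle loss shown uses the common 1 - |dot(q_pred, q_true)| form, which treats q and -q as the same rotation):

```python
import numpy as np

def l2_3d_loss(pred_xyz, true_xyz, weight=1.0):
    """Weighted squared L2 distance between predicted and true 3d coordinates."""
    return weight * np.sum((pred_xyz - true_xyz) ** 2)

def quaternion_angle_loss(q_pred, q_true):
    # Zero iff the rotations coincide; the absolute value handles the
    # fact that quaternions q and -q encode the same rotation.
    return 1.0 - abs(np.dot(q_pred, q_true))

q = np.array([1.0, 0.0, 0.0, 0.0])
rotation_loss = quaternion_angle_loss(q, -q)            # same rotation -> 0
coord_loss = l2_3d_loss(np.array([1.0, 2.0, 3.0]),
                        np.array([1.0, 2.0, 4.0]))      # off by 1 m in z
```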
- Fig. 3 shows an illustration 300 of a scene with various 3d bounding boxes 302, 304 obtained according to various embodiments.
- Fig. 4 shows a flow diagram 400 illustrating a method for object detection according to various embodiments.
- a plurality of lidar data sets may be acquired from a lidar sensor.
- a plurality of radar data sets may be acquired from a radar sensor.
- at least one image may be acquired from a camera.
- concatenated data may be determined based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image.
- an object may be detected based on the concatenated data.
- a plurality of camera residual blocks may be determined.
- the camera data may be processed using a first artificial neural network.
- the casting may include aligning a plurality of sweeps of the lidar data sets.
- linear depth completion of the lidar data sets may be carried out.
- a plurality of lidar residual blocks may be determined based on the linear depth completed lidar data.
- the lidar data sets may be processed using a second artificial neural network.
- the casting may include aligning a plurality of sweeps of the plurality of radar data sets.
- a plurality of radar residual blocks may be determined.
- the radar data may be processed using a third artificial neural network.
- the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks may be concatenated.
- Each of the steps 402, 404, 406, 408, 410 and the further steps described above may be performed by computer hardware components.
- Fig. 5 shows a computer system 500 with a plurality of computer hardware components configured to carry out steps of a computer implemented method for object detection according to various embodiments.
- the computer system 500 may include a processor 502, a memory 504, and a non-transitory data storage 506.
- At least one camera 508, at least one lidar sensor 510, and at least one radar sensor 512 may be provided as part of the computer system 500 (as illustrated in Fig. 5), or may be provided external to the computer system 500.
- the processor 502 may carry out instructions provided in the memory 504.
- the non-transitory data storage 506 may store a computer program, including the instructions that may be transferred to the memory 504 and then executed by the processor 502.
- the processor 502, the memory 504, and the non-transitory data storage 506 may be coupled with each other, e.g. via an electrical connection 514, such as a cable or a computer bus, or via any other suitable electrical connection to exchange electrical signals.
- the at least one camera 508, the at least one lidar sensor 510, and/or the at least one radar sensor 512 may be coupled to the computer system 500, for example via an external interface, or may be provided as parts of the computer system (in other words: internal to the computer system, for example coupled via the electrical connection 514).
- "Coupling" or "connection" are intended to include a direct "coupling" (for example via a physical link) or direct "connection" as well as an indirect "coupling" or indirect "connection" (for example via a logical link), respectively.
Description
- The present disclosure relates to methods and systems for object detection, for example for bounding box detection.
- Object detection is an essential pre-requisite for various tasks, in particular in autonomously driving vehicles.
- Accordingly, there is a need to provide efficient and reliable object detection.
- The present disclosure provides a computer implemented method, a computer system, a vehicle, and a non-transitory computer readable medium according to the independent claims. Embodiments are given in the subclaims, the description and the drawings.
- In one aspect, the present disclosure is directed at a computer implemented method for object detection, the method comprising the following steps performed (in other words: carried out) by computer hardware components: acquiring a plurality of lidar data sets from a lidar sensor; acquiring a plurality of radar data sets from a radar sensor; acquiring at least one image from a camera; determining concatenated data based on casting (in other words: projecting) the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and detecting an object based on the concatenated data. Concatenating data may include enhancing sensor data from several prior sensor readings into one frame subsequently casted onto single camera frame.
- As used herein, "casting" and "projecting" (and likewise "cast" and "projection") may be used interchangeably. For example, casting points onto a 2D camera space may be understood as projecting the points onto the camera plane (of the 2D camera space). For example, the projecting may be carried out using a pinhole camera model.
- According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of camera residual blocks. Residual blocks may be part of the artificial neural network which transforms data aiming to obtain features applicable in prediction, i.e., layers of neural network with skip connections as introduced in ResNet architecture. Here, the block includes a 2D convolutional layer, batch normalization, and a leaky ReLu activation function.
- According to another aspect, the camera data may be processed using a first artificial neural network. The first artificial neural network may be a ResNet type convolutional neural network architecture transforming camera data.
- According to another aspect, the casting comprises aligning a plurality of sweeps of the lidar data sets. For example, several prior lidar sweeps are aligned to the most current sweep to enhance lidar points number, as if it was a single sweep with more dense lidar. Alignment is carried out as in nuScenes open API method that returns a point cloud that aggregates multiple sweeps. A pre-determined number of previous frames (for example 10 previous frames) may be mapped to a single reference frame, which may be the "current" (or a "current frame") in this case. Homogeneous transformations matrices considering differences in translation and rotation of ego car may be used to align a previous frame with the reference frame.
- According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: carrying out linear depth completion of the lidar data sets. Linear depth completion may be used to further increase the lidar point density. It is performed on lidar points that have already been projected onto the 2D camera plane. Afterwards, each point of the 2D plane receives a depth linearly estimated from the lidar depths of the nearest points. Such depth completion is fast and yields "depth images" which, even though derived from lidar only, make it possible to use, for example, convolutional neural networks known from image processing.
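- Such linear depth completion may, for example, be sketched with a generic linear interpolator (the use of SciPy's griddata here is an illustrative choice, not the disclosed implementation):

```python
import numpy as np
from scipy.interpolate import griddata

def linear_depth_completion(uv, depths, width, height):
    """Linearly interpolate sparse projected lidar depths over the 2D
    camera plane to obtain a dense 'depth image' (pixels outside the
    convex hull of the lidar points remain NaN in this sketch)."""
    grid_u, grid_v = np.meshgrid(np.arange(width), np.arange(height))
    return griddata(uv, depths, (grid_u, grid_v), method="linear")

# Four projected lidar points with depth growing from 10 m to 20 m down
# the image; the completed image fills in the pixels between them.
uv = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 3.0], [3.0, 3.0]])
depth_img = linear_depth_completion(uv, np.array([10.0, 10.0, 20.0, 20.0]), 4, 4)
```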
- According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of lidar residual blocks based on the linear depth completed lidar data. Given such "depth images", the same type of residual blocks as in the case of the camera data may be utilized.
- According to another aspect, the lidar data sets are processed using a second artificial neural network. The second artificial neural network may be a ResNet type artificial neural network architecture, like the one for the camera data, transforming the "depth images".
- According to another aspect, the casting comprises aligning a plurality of sweeps of the plurality of radar data sets. The aligning of radar data sets may be similar to the aligning of lidar data sets.
- According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of radar residual blocks. The residual blocks may be similar as in the case of camera data.
- According to another aspect, the radar data is processed using a third artificial neural network. The third artificial neural network may be a ResNet type convolutional neural network architecture, like the one for the camera data, transforming "velocity images".
- According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: concatenating the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks (for example to obtain concatenated data).
- With the method according to various aspects, accurate 3d and 2d bounding box detections may be made with the neural network using fused camera-lidar-radar data. KPI (key performance indicator) metrics of the solution may be evaluated on the nuScenes dataset, casting lidar and radar onto the front camera, with training and validation/test sets prepared so as to consist of separate scenes.
- In another aspect, the present disclosure is directed at a computer system, said computer system comprising a plurality of computer hardware components configured to carry out several or all steps of the computer implemented method described herein. The computer system may be part of a vehicle.
- The computer system may comprise a plurality of computer hardware components (for example a processor, for example processing unit or processing network, at least one memory, for example memory unit or memory network, and at least one non-transitory data storage). It will be understood that further computer hardware components may be provided and used for carrying out steps of the computer implemented method in the computer system. The non-transitory data storage and/or the memory unit may comprise a computer program for instructing the computer to perform several or all steps or aspects of the computer implemented method described herein, for example using the processing unit and the at least one memory unit.
- In another aspect, the present disclosure is directed at a vehicle, comprising a radar sensor, a lidar sensor and a camera, wherein the vehicle is configured to detect objects according to the computer implemented method described herein.
- In another aspect, the present disclosure is directed at a non-transitory computer readable medium comprising instructions for carrying out several or all steps or aspects of the computer implemented method described herein. The computer readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid state drive (SSD); a read only memory (ROM), such as a flash memory; or the like. Furthermore, the computer readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection. The computer readable medium may, for example, be an online data repository or a cloud storage.
- The present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer implemented method described herein.
- Exemplary embodiments and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically
- Fig. 1
- an illustration of an architecture for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments;
- Fig. 2
- illustrations of hl view of implementation according to various embodiments of middle fusion and late fusion;
- Fig. 3
- an illustration of a scene with various 3d bounding boxes obtained according to various embodiments;
- Fig. 4
- a flow diagram illustrating a method for object detection according to various embodiments; and
- Fig. 5
- a computer system with a plurality of computer hardware components configured to carry out steps of a computer implemented method for object detection according to various embodiments.
- Neural networks may be used in object detection tasks, for example in automotive industry, wherein bounding boxes may be placed around objects belonging to certain classes of interest (such as cars, pedestrians, or traffic signs).
- For 2d (two-dimensional) bounding box detection, it may be sufficient to use a single camera. For 3d (three-dimensional) bounding box detection, it may be desired to determine the distance of an object of interest from the ego vehicle. For example a lidar (light detection and ranging) sensor may be used, for example in combination with other sensors. A lidar sensor may directly provide a point cloud in 3d coordinate space. To further increase safety and accuracy of object detection task, the outputs from several sensors may be fused to provide useful information/features of the class of objects to be found.
- According to various embodiments, lidar sensor data, radar sensor data, and camera data may be fused together in an efficient way for 2d and 3d object detection tasks, and a neural network architecture may be provided to benefit from such a fusion.
-
Fig. 1 shows an illustration 100 of a system (in other words: an architecture) for camera, lidar and radar fusion for the 2d and 3d object detection task according to various embodiments. - Inputs to the system are several sensor frames, for example from up to 0.5 seconds in the past, appropriately prepared, which can be viewed as a W x H x C dimensional input frame (wherein W is the width, H is the height, and C is the number of features, for example compared to W x H x 3 for RGB (red-green-blue) images), and which may be transformed and used for further processing, for example in an artificial neural network, for example in a convolutional neural network.
- Camera data 102 (including camera frames), lidar data 106 (including lidar pointcloud frames), and radar data 114 (including radar frames) may be processed as will be described in the following. All radar and lidar pointcloud frames may be cast onto the most recent ('current') camera frame of size W x H. An alignment casting 108 (for example 10-sweeps alignment casting) may be carried out for the lidar data 106. The lidar frames may be linearly depth completed (as illustrated by block 110). An alignment casting 116 (for example 6-sweeps alignment casting) may be carried out for the radar data 114. - For a single frame entry, RGB camera channels and channels from lidar and radar casts from previous sweeps, which form C channels in a W x H x C input frame, may be used. According to various embodiments, instead of using a single, most current, camera frame, camera frames from the previous 0.5 seconds of timestamps, contributing to the C channels, may be used. To sum up, the input consists of W x H x 3 for the camera (or W x H x 3*C0 in case one uses C0 previous camera frames), W x H x C1 for C1 prepared lidar casts, and W x H x C2 for C2 prepared radar casts.
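- The assembly of the W x H x C input frame may be sketched as a channel-wise concatenation (all sizes below are illustrative; the 10 lidar and 6 radar casts echo the alignment castings described above):

```python
import numpy as np

W, H, C1, C2 = 160, 90, 10, 6
camera = np.zeros((H, W, 3))        # RGB channels of the 'current' frame
lidar_casts = np.zeros((H, W, C1))  # C1 prepared lidar casts
radar_casts = np.zeros((H, W, C2))  # C2 prepared radar casts

# The W x H x C input frame is the concatenation along the channel axis.
frame = np.concatenate([camera, lidar_casts, radar_casts], axis=-1)
```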
- According to various embodiments, an artificial neural network may be used for transforming such inputs to obtain 3d and/or 2d object detections at the output. An SSD (Single Shot MultiBox Detector) like neural network may be used, which may be able to produce 3d bounding boxes for detected objects. For comparison, Yolo v3 and SSD networks work on a single 2d image input and output 2d bounding boxes only. According to various embodiments, ground truth labels and an underlying architecture may be introduced to be able to infer object distances and sizes (width, length, height) in 3d space, together with the yaw-pitch-roll angles of an object.
- According to various embodiments, the following labels may be taken: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4), where (left, bottom, right, top) are 2d bounding box coordinates of an object in 2d camera image space. (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) may be provided in the 3d camera coordinate system, and q1, q2, q3, q4 may be quaternion components describing the yaw-pitch-roll angles.
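- For illustration, a yaw-only rotation encoded as a unit quaternion can be recovered from the q components as follows (the (w, x, y, z) component ordering is an assumption for this sketch):

```python
import numpy as np

def quat_yaw(q):
    """Yaw angle of a unit quaternion (w, x, y, z) rotating about the
    vertical axis. For a yaw-only rotation, w = cos(yaw/2), z = sin(yaw/2)."""
    w, x, y, z = q
    return 2.0 * np.arctan2(z, w)

# A quaternion encoding a 45-degree yaw-only rotation.
q = (np.cos(np.pi / 8), 0.0, 0.0, np.sin(np.pi / 8))
yaw = quat_yaw(q)
```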
- The network may be divided into three phases related to processing time. A first phase may consist of three separate subnets of residual blocks 104, 112, 120 for the camera data 102, the lidar data 106, and the radar data 114. In a second phase, after the residual blocks 104, 112, 120 are concatenated (concatenation 122) into a residual block 124, such a "joint" image (which may have 128*3 channels, where 128 may be the number of channels of each subnet output) having features from those 3 sensors may further be transformed by a residual network. A third (or last) phase may include outputting class scores and region proposals. Outputting the class scores and region proposals may be similar to Yolo v3, but the output may be enhanced with a 3D part implying the position of an object in 3D coordinates and its rotation. Hence, each point of the 2D grid, together with its associated 2D anchor box, receives a probability score indicating the probability of existence of an object within such an area; then each class of object that the method is predicting receives a score indicating the probability of the object being of that class. The region proposal corresponding to such a 2D grid point may consist of coordinates indicating the placement of an object: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4). Thus, (left, bottom), (right, top) may imply the most probable placement of an object in the image space, (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) the placement in 3D space, and the q coordinates indicate the rotation. Detection in the form of the labels described above may be included, which adds the 3-dimensional part. An object detector 126 may carry out 2d object detections 128 and 3d object detections 130 based on the residual blocks 124. - Depending on when the first three subnets (for camera data, lidar data, and radar data) are joined, middle fusion or late fusion may be provided as two separate architectures. In case of middle fusion, already joined features may be used when creating the 3-scale detections. Details of the late fusion and middle fusion and of the 3-scale detections may be similar to the Yolo v3 architecture.
Final predictions may depend on joint results of predictions from 3 different 2D grid granularities ("scales"). In the middle fusion case, after only several residual blocks initially preparing the features separately from camera, lidar and radar, the outputs may be joined and the rest of the network may work on a "joint image" with features prepared by the subnets, as in the case of a standard 3-channel image. In case of late fusion, there may be a flow of 3 subnetworks, as if working with 3 separate images (from the 3 sensor types), and the networks are fused only before making the final predictions on each granularity scale.
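- The middle fusion variant may be sketched as follows (a minimal PyTorch illustration with assumed layer sizes; the actual network uses residual blocks per subnet and a Yolo v3 style multi-scale detection head rather than single convolutions):

```python
import torch
import torch.nn as nn

class MiddleFusion(nn.Module):
    """Sketch of middle fusion: three per-sensor subnets each produce a
    128-channel feature map; the maps are concatenated into a 128*3-channel
    'joint image' that a shared trunk then processes further."""
    def __init__(self, cam_ch=3, lidar_ch=10, radar_ch=6):
        super().__init__()
        def subnet(in_ch):
            # Stand-in for a stack of residual blocks (assumption).
            return nn.Sequential(nn.Conv2d(in_ch, 128, 3, padding=1),
                                 nn.LeakyReLU(0.1))
        self.cam, self.lidar, self.radar = subnet(cam_ch), subnet(lidar_ch), subnet(radar_ch)
        self.trunk = nn.Sequential(nn.Conv2d(3 * 128, 128, 3, padding=1),
                                   nn.LeakyReLU(0.1))

    def forward(self, cam, lidar, radar):
        # Join the three feature maps along the channel dimension.
        joint = torch.cat([self.cam(cam), self.lidar(lidar), self.radar(radar)], dim=1)
        return self.trunk(joint)

net = MiddleFusion()
out = net(torch.randn(1, 3, 32, 32), torch.randn(1, 10, 32, 32), torch.randn(1, 6, 32, 32))
```

Late fusion would instead keep the three pipelines separate and concatenate only just before each detection scale.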
-
Fig. 2 shows illustrations 200 of a hl (hidden layer, or high level) view of an implementation according to various embodiments of middle fusion (on the left) and late fusion (on the right). The hl data 202 of the middle fusion may include joint features after a point of processing indicated by a dashed line 204, and the hl data 206 of the late fusion may include joint features after a point of processing indicated by a dashed line 208. The hl data is illustrated in a sequence of processing (or time) from top to bottom. - In case of late fusion, all three processing pipelines are performed separately, concatenating the features only just before the detections for each scale, with the last residual blocks after the join.
- According to various embodiments, a loss function may be introduced, using the Yolo v3 loss and additional weighted L2 distances between 3d coordinates, and a quaternion angle loss for learning yaw-pitch-roll.
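- The additional 3d loss terms may be sketched as follows (the exact weights and the use of squared distances are assumptions of this sketch, not values from the disclosure):

```python
import numpy as np

def quaternion_angle_loss(q_pred, q_true):
    """Rotation error as the angle between two unit quaternions; the abs()
    makes q and -q (which encode the same rotation) give zero loss."""
    dot = min(abs(float(np.dot(q_pred, q_true))), 1.0)
    return 2.0 * np.arccos(dot)

def loss_3d_terms(pred_xyz, true_xyz, q_pred, q_true, w_xyz=1.0, w_rot=1.0):
    """Weighted L2 distance between 3d centres plus the quaternion angle
    loss, to be added on top of the 2d detection (Yolo v3 style) loss."""
    l2 = float(np.sum((np.asarray(pred_xyz) - np.asarray(true_xyz)) ** 2))
    return w_xyz * l2 + w_rot * quaternion_angle_loss(np.asarray(q_pred), np.asarray(q_true))

# A perfect prediction incurs zero loss; an offset centre incurs an L2 penalty.
zero = loss_3d_terms([1.0, 2.0, 15.0], [1.0, 2.0, 15.0],
                     [1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
one = loss_3d_terms([1.0, 2.0, 16.0], [1.0, 2.0, 15.0],
                    [1.0, 0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
```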
-
Fig. 3 shows an illustration 300 of a scene with various 3d bounding boxes 302 and 304 obtained according to various embodiments. -
Fig. 4 shows a flow diagram 400 illustrating a method for object detection according to various embodiments. At 402, a plurality of lidar data sets may be acquired from a lidar sensor. At 404, a plurality of radar data sets may be acquired from a radar sensor. At 406, at least one image may be acquired from a camera. At 408, concatenated data may be determined based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image. At 410, an object may be detected based on the concatenated data. - According to various embodiments, a plurality of camera residual blocks may be determined.
- According to various embodiments, the camera data may be processed using a first artificial neural network.
- According to various embodiments, the casting may include aligning a plurality of sweeps of the lidar data sets.
- According to various embodiments, linear depth completion of the lidar data sets may be carried out.
- According to various embodiments, a plurality of lidar residual blocks may be determined based on the linear depth completed lidar data.
- According to various embodiments, the lidar data sets may be processed using a second artificial neural network.
- According to various embodiments, the casting may include aligning a plurality of sweeps of the plurality of radar data sets.
- According to various embodiments, a plurality of radar residual blocks may be determined.
- According to various embodiments, the radar data may be processed using a third artificial neural network.
- According to various embodiments, the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks may be concatenated.
- Each of the steps 402, 404, 406, 408, 410 and the further steps described above may be performed by computer hardware components.
-
Fig. 5 shows a computer system 500 with a plurality of computer hardware components configured to carry out steps of a computer implemented method for object detection according to various embodiments. The computer system 500 may include a processor 502, a memory 504, and a non-transitory data storage 506. At least one camera 508, at least one lidar sensor 510, and at least one radar sensor 512 may be provided as part of the computer system 500 (as illustrated in Fig. 5), or may be provided external to the computer system 500. - The processor 502 may carry out instructions provided in the memory 504. The non-transitory data storage 506 may store a computer program, including the instructions that may be transferred to the memory 504 and then executed by the processor 502. - The processor 502, the memory 504, and the non-transitory data storage 506 may be coupled with each other, e.g. via an electrical connection 514, such as e.g. a cable or a computer bus or via any other suitable electrical connection to exchange electrical signals. The at least one camera 508, the at least one lidar sensor 510, and/or the at least one radar sensor 512 may be coupled to the computer system 500, for example via an external interface, or may be provided as parts of the computer system (in other words: internal to the computer system, for example coupled via the electrical connection 514). - The terms "coupling" or "connection" are intended to include a direct "coupling" (for example via a physical link) or direct "connection" as well as an indirect "coupling" or indirect "connection" (for example via a logical link), respectively.
- It will be understood that what has been described for one of the methods above may analogously hold true for the
computer system 500. -
- 100
- an illustration of an architecture for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments
- 102
- camera data
- 104
- residual blocks
- 106
- lidar data
- 108
- alignment casting
- 110
- linear depth completion
- 112
- residual blocks
- 114
- radar data
- 116
- alignment casting
- 120
- residual blocks
- 122
- concatenation
- 124
- residual blocks
- 126
- object detector
- 128
- 2d object detection
- 130
- 3d object detection
- 200
- illustrations of hl view of implementation according to various embodiments of middle fusion and late fusion
- 202
- hl data
- 204
- dashed line
- 206
- hl data
- 208
- dashed line
- 300
- illustration of a scene with various 3d bounding boxes obtained according to various embodiments
- 302
- 3d bounding box
- 304
- 3d bounding box
- 400
- flow diagram illustrating a method for object detection according to various embodiments
- 402
- step of acquiring a plurality of lidar data sets from a lidar sensor
- 404
- step of acquiring a plurality of radar data sets from a radar sensor
- 406
- step of acquiring at least one image from a camera
- 408
- step of determining concatenated data based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image
- 410
- step of detecting an object based on the concatenated data
- 500
- computer system according to various embodiments
- 502
- processor
- 504
- memory
- 506
- non-transitory data storage
- 508
- camera
- 510
- lidar sensor
- 512
- radar sensor
- 514
- connection
Claims (14)
- Computer implemented method for object detection,
the method comprising the following steps carried out by computer hardware components:
- acquiring a plurality of lidar data sets from a lidar sensor;
- acquiring a plurality of radar data sets from a radar sensor;
- acquiring at least one image from a camera;
- determining concatenated data based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and
- detecting an object based on the concatenated data. - The computer implemented method of claim 1, further comprising the following step carried out by the computer hardware components:
determining a plurality of camera residual blocks. - The computer implemented method of at least one of claims 1 or 2,
wherein the camera data is processed using a first artificial neural network. - The computer implemented method of at least one of claims 1 to 3,
wherein the casting comprises aligning a plurality of sweeps of the lidar data sets. - The computer implemented method of at least one of claims 1 to 4,
further comprising the following step carried out by the computer hardware components:
carrying out linear depth completion of the lidar data sets. - The computer implemented method of at least one of claims 1 to 5,
further comprising the following step carried out by the computer hardware components:
determining a plurality of lidar residual blocks based on the linear depth completed lidar data. - The computer implemented method of at least one of claims 1 to 6,
wherein the lidar data sets are processed using a second artificial neural network. - The computer implemented method of at least one of claims 1 to 7,
wherein the casting comprises aligning a plurality of sweeps of the plurality of radar data sets. - The computer implemented method of at least one of claims 1 to 8,
further comprising the following step carried out by the computer hardware components:
determining a plurality of radar residual blocks. - The computer implemented method of at least one of claims 1 to 9,
wherein the radar data is processed using a third artificial neural network. - The computer implemented method of at least one of claims 1 to 10, further comprising the following step carried out by the computer hardware components:
concatenating the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks. - Computer system, the computer system comprising a plurality of computer hardware components configured to carry out steps of the computer implemented method of at least one of claims 1 to 11.
- Vehicle, comprising a radar sensor, a lidar sensor and a camera, wherein the vehicle is configured to detect objects according to the computer implemented method of at least one of claims 1 to 11.
- Non-transitory computer readable medium comprising instructions for carrying out the computer implemented method of at least one of claims 1 to 11.
Priority Applications (3)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20180636.1A EP3926360A1 (en) | 2020-06-17 | 2020-06-17 | Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets |
US17/235,407 US20210397907A1 (en) | 2020-06-17 | 2021-04-20 | Methods and Systems for Object Detection |
CN202110511342.XA CN113888458A (en) | 2020-06-17 | 2021-05-11 | Method and system for object detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
EP20180636.1A EP3926360A1 (en) | 2020-06-17 | 2020-06-17 | Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets |
Publications (1)
Publication Number | Publication Date |
---|---|
EP3926360A1 true EP3926360A1 (en) | 2021-12-22 |
Family
ID=71108380
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
EP20180636.1A Pending EP3926360A1 (en) | 2020-06-17 | 2020-06-17 | Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets |
Country Status (3)
Country | Link |
---|---|
US (1) | US20210397907A1 (en) |
EP (1) | EP3926360A1 (en) |
CN (1) | CN113888458A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023158706A1 (en) * | 2022-02-15 | 2023-08-24 | Waymo Llc | End-to-end processing in automated driving systems |
EP4361676A1 (en) * | 2022-10-28 | 2024-05-01 | Aptiv Technologies AG | Methods and systems for determining a property of an object |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112835037B (en) * | 2020-12-29 | 2021-12-07 | 清华大学 | All-weather target detection method based on fusion of vision and millimeter waves |
US20220414387A1 (en) * | 2021-06-23 | 2022-12-29 | Gm Cruise Holdings Llc | Enhanced object detection system based on height map data |
US20230050467A1 (en) * | 2021-08-11 | 2023-02-16 | Gm Cruise Holdings Llc | Ground height-map based elevation de-noising |
DE102022116320A1 (en) | 2022-06-30 | 2024-01-04 | Bayerische Motoren Werke Aktiengesellschaft | Method and device for determining an object class of an object |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180314253A1 (en) * | 2017-05-01 | 2018-11-01 | Mentor Graphics Development (Deutschland) Gmbh | Embedded automotive perception with machine learning classification of sensor data |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11537868B2 (en) * | 2017-11-13 | 2022-12-27 | Lyft, Inc. | Generation and update of HD maps using data from heterogeneous sources |
US10803325B2 (en) * | 2017-11-15 | 2020-10-13 | Uatc, Llc | Autonomous vehicle lane boundary detection systems and methods |
EP3525000B1 (en) * | 2018-02-09 | 2021-07-21 | Bayerische Motoren Werke Aktiengesellschaft | Methods and apparatuses for object detection in a scene based on lidar data and radar data of the scene |
US11500099B2 (en) * | 2018-03-14 | 2022-11-15 | Uatc, Llc | Three-dimensional object detection |
US11494937B2 (en) * | 2018-11-16 | 2022-11-08 | Uatc, Llc | Multi-task multi-sensor fusion for three-dimensional object detection |
US11927668B2 (en) * | 2018-11-30 | 2024-03-12 | Qualcomm Incorporated | Radar deep learning |
US11693423B2 (en) * | 2018-12-19 | 2023-07-04 | Waymo Llc | Model for excluding vehicle from sensor field of view |
US11062454B1 (en) * | 2019-04-16 | 2021-07-13 | Zoox, Inc. | Multi-modal sensor data association architecture |
US11852746B2 (en) * | 2019-10-07 | 2023-12-26 | Metawave Corporation | Multi-sensor fusion platform for bootstrapping the training of a beam steering radar |
CN111027401B (en) * | 2019-11-15 | 2022-05-03 | 电子科技大学 | End-to-end target detection method with integration of camera and laser radar |
US11625839B2 (en) * | 2020-05-18 | 2023-04-11 | Toyota Research Institute, Inc. | Bird's eye view based velocity estimation via self-supervised learning |
-
2020
- 2020-06-17 EP EP20180636.1A patent/EP3926360A1/en active Pending
-
2021
- 2021-04-20 US US17/235,407 patent/US20210397907A1/en not_active Abandoned
- 2021-05-11 CN CN202110511342.XA patent/CN113888458A/en active Pending
Also Published As
Publication number | Publication date |
---|---|
US20210397907A1 (en) | 2021-12-23 |
CN113888458A (en) | 2022-01-04 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PUAI | Public reference made under article 153(3) epc to a published international application that has entered the european phase |
Free format text: ORIGINAL CODE: 0009012 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED |
|
AK | Designated contracting states |
Kind code of ref document: A1 Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
B565 | Issuance of search results under rule 164(2) epc |
Effective date: 20210201 |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE |
|
17P | Request for examination filed |
Effective date: 20220622 |
|
RBV | Designated contracting states (corrected) |
Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR |
|
RAP3 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: APTIV TECHNOLOGIES LIMITED |
|
STAA | Information on the status of an ep patent application or granted ep patent |
Free format text: STATUS: EXAMINATION IS IN PROGRESS |
|
17Q | First examination report despatched |
Effective date: 20230929 |
|
RAP1 | Party data changed (applicant data changed or rights of an application transferred) |
Owner name: APTIV TECHNOLOGIES AG |