EP3926360A1 - Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets - Google Patents

Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets Download PDF

Info

Publication number
EP3926360A1
Authority
EP
European Patent Office
Prior art keywords
lidar
implemented method
radar
data sets
computer implemented
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP20180636.1A
Other languages
German (de)
French (fr)
Inventor
Jakub DERBISZ
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Aptiv Technologies AG
Original Assignee
Aptiv Technologies Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Aptiv Technologies Ltd filed Critical Aptiv Technologies Ltd
Priority to EP20180636.1A priority Critical patent/EP3926360A1/en
Priority to US17/235,407 priority patent/US20210397907A1/en
Priority to CN202110511342.XA priority patent/CN113888458A/en
Publication of EP3926360A1 publication Critical patent/EP3926360A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/0002Inspection of images, e.g. flaw detection
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/88Radar or analogous systems specially adapted for specific applications
    • G01S13/93Radar or analogous systems specially adapted for specific applications for anti-collision purposes
    • G01S13/931Radar or analogous systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/865Combination of radar systems with lidar systems
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S13/00Systems using the reflection or reradiation of radio waves, e.g. radar systems; Analogous systems using reflection or reradiation of waves whose nature or wavelength is irrelevant or unspecified
    • G01S13/86Combinations of radar systems with non-radar systems, e.g. sonar, direction finder
    • G01S13/867Combination of radar systems with cameras
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/86Combinations of lidar systems with systems other than lidar, radar or sonar, e.g. with direction finders
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/89Lidar systems specially adapted for specific applications for mapping or imaging
    • G01S17/8943D imaging with simultaneous measurement of time-of-flight at a 2D array of receiver pixels, e.g. time-of-flight cameras or flash lidar
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S17/00Systems using the reflection or reradiation of electromagnetic waves other than radio waves, e.g. lidar systems
    • G01S17/88Lidar systems specially adapted for specific applications
    • G01S17/93Lidar systems specially adapted for specific applications for anti-collision purposes
    • G01S17/931Lidar systems specially adapted for specific applications for anti-collision purposes of land vehicles
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S7/00Details of systems according to groups G01S13/00, G01S15/00, G01S17/00
    • G01S7/02Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00
    • G01S7/41Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section
    • G01S7/417Details of systems according to groups G01S13/00, G01S15/00, G01S17/00 of systems according to group G01S13/00 using analysis of echo signal for target characterisation; Target signature; Target cross-section involving the use of neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/251Fusion techniques of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/803Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of input or preprocessed data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10032Satellite or aerial image; Remote sensing
    • G06T2207/10044Radar image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Multimedia (AREA)
  • Electromagnetism (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Quality & Reliability (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

A computer implemented method for object detection comprises the following steps carried out by computer hardware components: acquiring a plurality of lidar data sets from a lidar sensor; acquiring a plurality of radar data sets from a radar sensor; acquiring at least one image from a camera; determining concatenated data based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and detecting an object based on the concatenated data.

Description

    FIELD
  • The present disclosure relates to methods and systems for object detection, for example for bounding box detection.
  • BACKGROUND
  • Object detection is an essential pre-requisite for various tasks, in particular in autonomously driving vehicles.
  • Accordingly, there is a need to provide efficient and reliable object detection.
  • SUMMARY
  • The present disclosure provides a computer implemented method, a computer system, a vehicle, and a non-transitory computer readable medium according to the independent claims. Embodiments are given in the subclaims, the description and the drawings.
  • In one aspect, the present disclosure is directed at a computer implemented method for object detection, the method comprising the following steps performed (in other words: carried out) by computer hardware components: acquiring a plurality of lidar data sets from a lidar sensor; acquiring a plurality of radar data sets from a radar sensor; acquiring at least one image from a camera; determining concatenated data based on casting (in other words: projecting) the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and detecting an object based on the concatenated data. Concatenating data may include combining sensor data from several prior sensor readings into one frame that is subsequently cast onto a single camera frame.
  • As used herein, "casting" and "projecting" (and likewise "cast" and "projection") may be used interchangeably. For example, casting points onto a 2D camera space may be understood as projecting the points onto the camera plane (of the 2D camera space). For example, the projecting may be carried out using a pinhole camera model.
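  • For illustration only, a minimal sketch of such a pinhole-model projection is given below; the function and argument names (for example the intrinsic matrix K and the sensor-to-camera transform) are assumptions made for the example and are not taken from the disclosure.

```python
import numpy as np

def project_points_pinhole(points_xyz, K, T_cam_from_sensor):
    """Project 3D sensor points onto the 2D camera plane (pinhole model).

    points_xyz: (N, 3) points in the sensor frame.
    K: (3, 3) camera intrinsic matrix.
    T_cam_from_sensor: (4, 4) homogeneous sensor-to-camera transform.
    Returns (N, 2) pixel coordinates and (N,) depths; callers would typically
    discard points with non-positive depth (behind the camera).
    """
    # Homogeneous coordinates, then rigid transform into the camera frame.
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    pts_cam = (T_cam_from_sensor @ pts_h.T)[:3, :]   # (3, N)

    # Apply the intrinsics and normalize by depth (z).
    uvw = K @ pts_cam
    depth = uvw[2, :]
    uv = uvw[:2, :] / depth
    return uv.T, depth
```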
  • According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of camera residual blocks. Residual blocks may be part of the artificial neural network which transforms the data with the aim of obtaining features applicable in prediction, i.e., neural network layers with skip connections as introduced in the ResNet architecture. Here, each block includes a 2D convolutional layer, batch normalization, and a leaky ReLU activation function.
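  • A minimal sketch of such a block, assuming a PyTorch implementation; the kernel size and the leaky-ReLU negative slope below are illustrative assumptions, not specified in the disclosure:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """2D convolution + batch normalization + leaky ReLU with a skip
    connection (ResNet-style), as described above."""

    def __init__(self, channels: int):
        super().__init__()
        self.conv = nn.Conv2d(channels, channels, kernel_size=3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.LeakyReLU(0.1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection adds the block input to the transformed features.
        return x + self.act(self.bn(self.conv(x)))
```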
  • According to another aspect, the camera data may be processed using a first artificial neural network. The first artificial neural network may be a ResNet type convolutional neural network architecture transforming camera data.
  • According to another aspect, the casting comprises aligning a plurality of sweeps of the lidar data sets. For example, several prior lidar sweeps are aligned to the most current sweep to increase the number of lidar points, as if they originated from a single, denser sweep. Alignment may be carried out as in the nuScenes open API method that returns a point cloud aggregating multiple sweeps. A pre-determined number of previous frames (for example 10 previous frames) may be mapped to a single reference frame, which may be the "current" frame (or a "current frame") in this case. Homogeneous transformation matrices accounting for differences in translation and rotation of the ego vehicle may be used to align a previous frame with the reference frame.
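  • A possible sketch of such an alignment step, assuming NumPy arrays and 4x4 homogeneous transforms between the ego poses at the two timestamps; the helper names are illustrative assumptions (the nuScenes devkit offers a comparable multi-sweep aggregation):

```python
import numpy as np

def align_sweep_to_reference(points_xyz, T_ref_from_prev):
    """Map points of a previous sweep into the current ('reference') frame.

    T_ref_from_prev: (4, 4) homogeneous transform accounting for the ego
    vehicle's translation and rotation between the two timestamps.
    """
    pts_h = np.hstack([points_xyz, np.ones((points_xyz.shape[0], 1))])
    return (T_ref_from_prev @ pts_h.T).T[:, :3]

def aggregate_sweeps(sweeps, transforms):
    """Stack a pre-determined number of previous sweeps (e.g. 10) into one
    denser point cloud expressed in the reference frame."""
    return np.vstack([align_sweep_to_reference(p, T)
                      for p, T in zip(sweeps, transforms)])
```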
  • According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: carrying out linear depth completion of the lidar data sets. Linear depth completion may be used to further increase the lidar point density. It is performed on the lidar points after they have been projected onto the 2D camera plane. Each point of the 2D plane then receives a depth value linearly estimated from the lidar depths of the nearest points. Such depth completion is fast and yields "depth images", which, even though derived only from lidar, make it possible to use, for example, convolutional neural networks commonly applied to image processing.
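  • One simple way to realize such a linear depth completion, sketched here with SciPy's linear interpolation over the already projected points; treating pixels outside the convex hull of the lidar points as zero is an assumption of the example:

```python
import numpy as np
from scipy.interpolate import griddata

def linear_depth_completion(uv, depth, width, height):
    """Densify sparse projected lidar depths into a 'depth image'.

    uv: (N, 2) pixel coordinates of lidar points already projected onto the
    2D camera plane; depth: (N,) corresponding depths.
    Every pixel receives a depth linearly interpolated from nearby lidar
    points; pixels outside the convex hull of the points are set to 0.
    """
    grid_u, grid_v = np.meshgrid(np.arange(width), np.arange(height))
    dense = griddata(uv, depth, (grid_u, grid_v), method="linear", fill_value=0.0)
    return dense.astype(np.float32)   # (H, W) depth image
```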
  • According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of lidar residual blocks based on the linear depth completed lidar data. Given such "depth images", the same type of residual blocks as for the camera data may be used.
  • According to another aspect, the lidar data sets are processed using a second artificial neural network. The second artificial neural network may be a ResNet type artificial neural network architecture, as for the camera data, transforming the "depth images".
  • According to another aspect, the casting comprises aligning a plurality of sweeps of the plurality of radar data sets. The aligning of radar data sets may be similar to the aligning of lidar data sets.
  • According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: determining a plurality of radar residual blocks. The residual blocks may be similar to those used for the camera data.
  • According to another aspect, the radar data is processed using a third artificial neural network. The third artificial neural network may be a ResNet type convolutional neural network architecture, as for the camera data, transforming "velocity images".
  • According to another aspect, the computer implemented method further comprises the following step carried out by the computer hardware components: concatenating the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks (for example to obtain concatenated data).
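  • In a tensor framework such as PyTorch, this concatenation may amount to a channel-wise concatenation of the three subnet outputs; the shapes below are purely illustrative assumptions:

```python
import torch

# Illustrative shapes: batch 1, 128 channels per subnet, 20 x 32 feature map.
camera_features = torch.randn(1, 128, 20, 32)
lidar_features = torch.randn(1, 128, 20, 32)
radar_features = torch.randn(1, 128, 20, 32)

# Channel-wise concatenation yields a (1, 384, 20, 32) "joint" feature frame.
fused = torch.cat([camera_features, lidar_features, radar_features], dim=1)
```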
  • With the method according to various aspects, accurate 3d and 2d bounding box detections may be made by the neural network using fused camera-lidar-radar data. KPI (key performance indicator) metrics of the solution may be tested on the nuScenes dataset, casting lidar and radar onto the front camera, with training and validation/test sets prepared from separate scenes.
  • In another aspect, the present disclosure is directed at a computer system, said computer system comprising a plurality of computer hardware components configured to carry out several or all steps of the computer implemented method described herein. The computer system may be part of a vehicle.
  • The computer system may comprise a plurality of computer hardware components (for example a processor, for example processing unit or processing network, at least one memory, for example memory unit or memory network, and at least one non-transitory data storage). It will be understood that further computer hardware components may be provided and used for carrying out steps of the computer implemented method in the computer system. The non-transitory data storage and/or the memory unit may comprise a computer program for instructing the computer to perform several or all steps or aspects of the computer implemented method described herein, for example using the processing unit and the at least one memory unit.
  • In another aspect, the present disclosure is directed at a vehicle, comprising a radar sensor, a lidar sensor and a camera, wherein the vehicle is configured to detect objects according to the computer implemented method described herein.
  • In another aspect, the present disclosure is directed at a non-transitory computer readable medium comprising instructions for carrying out several or all steps or aspects of the computer implemented method described herein. The computer readable medium may be configured as: an optical medium, such as a compact disc (CD) or a digital versatile disk (DVD); a magnetic medium, such as a hard disk drive (HDD); a solid state drive (SSD); a read only memory (ROM), such as a flash memory; or the like. Furthermore, the computer readable medium may be configured as a data storage that is accessible via a data connection, such as an internet connection. The computer readable medium may, for example, be an online data repository or a cloud storage.
  • The present disclosure is also directed at a computer program for instructing a computer to perform several or all steps or aspects of the computer implemented method described herein.
  • DRAWINGS
  • Exemplary embodiments and functions of the present disclosure are described herein in conjunction with the following drawings, showing schematically
  • Fig. 1
    an illustration of an architecture for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments;
    Fig. 2
    illustrations of hl view of implementation according to various embodiments of middle fusion and late fusion;
    Fig. 3
    an illustration of a scene with various 3d bounding boxes obtained according to various embodiments;
    Fig. 4
    a flow diagram illustrating a method for object detection according to various embodiments; and
    Fig. 5
    a computer system with a plurality of computer hardware components configured to carry out steps of a computer implemented method for object detection according to various embodiments.
    DETAILED DESCRIPTION
  • Neural networks may be used in object detection tasks, for example in automotive industry, wherein bounding boxes may be placed around objects belonging to certain classes of interest (such as cars, pedestrians, or traffic signs).
  • For 2d (two-dimensional) bounding box detection, it may be sufficient to use a single camera. For 3d (three-dimensional) bounding box detection, it may be desired to determine the distance of an object of interest from the ego vehicle. For example, a lidar (light detection and ranging) sensor may be used, for example in combination with other sensors. A lidar sensor may directly provide a point cloud in 3d coordinate space. To further increase the safety and accuracy of the object detection task, the outputs from several sensors may be fused to provide useful information/features of the class of objects to be found.
  • According to various embodiments, lidar sensor data, radar sensor data, and camera data may be fused together in an efficient way for 2d and 3d object detection tasks, and a neural network architecture may be provided to benefit from such a fusion.
  • Fig. 1 shows an illustration 100 of a system (in other words: an architecture) for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments.
  • Inputs to the system are several sensor frames, for example from up to the past 0.5 seconds, appropriately prepared, which can be viewed as a W x H x C dimensional input frame (wherein W is the width, H is the height, and C is the number of features, for example compared to W x H x 3 for RGB (red-green-blue) images), and which may be transformed and used for further processing, for example in an artificial neural network, for example in a convolutional neural network.
  • Camera data 102 (including camera frames), lidar data 106 (including lidar pointcloud frames), and radar data 114 (including radar frames) may be processed as will be described in the following. All radar and lidar pointcloud frames may be cast onto the most recent ('current') camera frame of size W x H. An alignment casting 108 (for example 10-sweeps alignment casting) may be carried out for the lidar data 106. The lidar frames may be linearly depth completed (as illustrated by block 110). An alignment casting 116 (for example 6-sweeps alignment casting) may be carried out for the radar data 114.
  • For a single frame entry, the RGB camera channels and the channels from lidar and radar casts of previous sweeps together form the C channels of a W x H x C input frame. According to various embodiments, instead of using only the single, most current camera frame, camera frames from timestamps within the previous 0.5 seconds may also contribute to the C channels. To sum up, the input consists of W x H x 3 for the camera (or W x H x 3*C0 if C0 previous camera frames are used), W x H x C1 for C1 prepared lidar casts, and W x H x C2 for C2 prepared radar casts.
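  • A minimal sketch of assembling such a W x H x C input frame (NumPy, channel-last layout); the helper name and the channel-last convention are assumptions of the example:

```python
import numpy as np

def build_input_frame(camera_rgb, lidar_casts, radar_casts):
    """Stack camera, lidar and radar channels into a single input frame.

    camera_rgb: (H, W, 3) image, or (H, W, 3*C0) if C0 previous camera
    frames are used; lidar_casts: (H, W, C1); radar_casts: (H, W, C2).
    Returns an (H, W, 3 + C1 + C2) array (respectively 3*C0 + C1 + C2).
    """
    return np.concatenate([camera_rgb, lidar_casts, radar_casts], axis=-1)
```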
  • According to various embodiments, an artificial neural network may be used to transform such inputs and obtain 3d and/or 2d object detections at the output. An SSD (Single Shot MultiBox Detector)-like neural network may be used, which may be able to produce 3d bounding boxes for detected objects. For example, Yolo v3 and SSD networks may be used, which work on a single 2d image input and output 2d bounding boxes only. According to various embodiments, ground truth labels and an underlying architecture may be introduced to be able to infer object distances and sizes (width, length, height) in 3d space, together with the yaw-pitch-roll angles of an object.
  • According to various embodiments, the following labels may be taken: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4), where (left, bottom, right, top) are the 2d bounding box coordinates of an object in the 2d camera image space. (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) may be provided in the 3d camera coordinate system, and q1, q2, q3, q4 may be quaternion components describing the yaw-pitch-roll angles.
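  • Grouping the label tuple into a small data structure is a non-authoritative convenience used only for the examples below; the disclosure itself only specifies the tuple:

```python
from dataclasses import dataclass

@dataclass
class BoxLabel:
    # 2d bounding box in the 2d camera image space.
    left: float
    bottom: float
    right: float
    top: float
    # 3d box center and size in the 3d camera coordinate system.
    center_x: float
    center_y: float
    center_z: float
    width: float
    length: float
    height: float
    # Quaternion components describing the yaw-pitch-roll angles.
    q1: float
    q2: float
    q3: float
    q4: float
```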
  • The network may be divided into three phases related to processing time. A first phase may consist of three separate subnets of residual blocks 104, 112, 120, transforming the camera data 102, the lidar data 106, and the radar data 114.
  • In a second phase, after the residual blocks 104, 112, 120 are obtained, the features extracted by each subnet may be concatenated (as indicated by block 122) into a single feature frame. The second phase includes a further transformation of the joint data using residual blocks: the concatenation in 122 joins the results of the previous three residual neural network subnets transforming 1) camera, 2) lidar and 3) radar data. Afterwards, in the residual blocks 124, such a "joint" image (which may have 128*3 channels if each subnet outputs 128 channels), carrying features from the three sensors, may be further transformed by a residual network.
  • A third (or last) phase may include outputting class scores and region proposals. Outputting the class scores and region proposals may be similar to Yolo v3, but the output may be enhanced with a 3D part implying the position of an object in 3D coordinates and its rotation. Hence, each point of the 2D grid, together with its associated 2D anchor box, receives a probability score indicating the probability that an object exists within such an area; then each class of object that the method predicts receives a score indicating the probability of the object being of that class. The region proposal corresponding to such a 2D grid point may consist of coordinates indicating the placement of an object: (left, bottom, right, top, center_x, center_y, center_z, width, length, height, q1, q2, q3, q4). Thus, (left, bottom), (right, top) may imply the most probable placement of an object in the image space, (center_x, center_y, center_z, width, length, height, q1, q2, q3, q4) its placement in 3D space, and the q coordinates indicate its rotation. Detections in the form of the labels described above may be included, which adds a 3-dimensional part. An object detector 126 may carry out 2d object detections 128 and 3d object detections 130 based on the residual blocks 124.
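  • A coarse, middle-fusion-style sketch of the three phases (per-sensor subnets, concatenation, joint residual blocks, detection head), reusing the ResidualBlock class sketched earlier; all channel counts, strides, class and anchor numbers are illustrative assumptions and not taken from the disclosure:

```python
import torch
import torch.nn as nn

class FusionDetector(nn.Module):
    """Three-phase structure: per-sensor residual subnets, channel-wise
    concatenation, joint residual blocks, then class scores and 2d/3d
    region proposals (requires the ResidualBlock class sketched above)."""

    def __init__(self, c_cam=3, c_lidar=10, c_radar=6, feat=128,
                 num_classes=10, num_anchors=3):
        super().__init__()

        def subnet(c_in):
            return nn.Sequential(
                nn.Conv2d(c_in, feat, kernel_size=3, stride=2, padding=1),
                ResidualBlock(feat),
                ResidualBlock(feat))

        self.cam_net = subnet(c_cam)
        self.lidar_net = subnet(c_lidar)
        self.radar_net = subnet(c_radar)
        self.joint = nn.Sequential(ResidualBlock(3 * feat), ResidualBlock(3 * feat))
        # Per anchor: 1 objectness score + num_classes class scores + 4 values
        # for the 2d box + 10 values for the 3d box (center, size, quaternion).
        self.head = nn.Conv2d(3 * feat, num_anchors * (1 + num_classes + 14),
                              kernel_size=1)

    def forward(self, cam, lidar, radar):
        fused = torch.cat(
            [self.cam_net(cam), self.lidar_net(lidar), self.radar_net(radar)], dim=1)
        return self.head(self.joint(fused))
```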
  • Depending on when the first three subnets (for camera data, lidar data, and radar data) are joined, middle fusion or late fusion may be provided as two separate architectures. In case of middle fusion, already joined features may be used when creating the 3-scale detections. Details of the late fusion and middle fusion and of the 3-scale detections may be similar to the Yolo v3 architecture. Final predictions may depend on the joint results of predictions from 3 different 2D grid granularities ("scales"). In the middle fusion case, after only a few residual blocks that initially prepare the features separately for camera, lidar and radar, the outputs may be joined, and the rest of the network may work on a "joint image" with features prepared by the subnets, as in the case of a standard 3-channel image. In case of late fusion, the three subnetworks may proceed as if working with three separate images (from the 3 sensor types), and the networks may be fused only before making the final predictions on each granularity scale.
  • Fig. 2 shows illustrations 200 of a hl (hidden layer, or high level) view of an implementation according to various embodiments of middle fusion (on the left) and late fusion (on the right). The hl data 202 of the middle fusion may include joint features after a point of processing indicated by a dashed line 204, and the hl data 206 of the late fusion may include joint features after a point of processing indicated by a dashed line 208. The hl data is illustrated in a sequence of processing (or time) from top to bottom.
  • In case of late fusion, all three processing pipelines are performed separately, and features are concatenated only just before the detections for each scale, with the last residual blocks placed after the join.
  • According to various embodiments, a loss function may be introduced, using the Yolo v3 loss, additional weighted L2 distances between 3d coordinates, and a quaternion angle loss for learning yaw-pitch-roll.
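  • The 3d part of such a loss could, for instance, combine weighted L2 terms with a sign-invariant quaternion distance; the weights, field names and the Yolo v3 terms (assumed to be computed elsewhere) are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def loss_3d_part(pred, target, w_center=1.0, w_size=1.0, w_rot=1.0):
    """3d add-on to a Yolo-v3-style loss, for already matched predictions.

    pred/target: dicts with 'center_3d' (N, 3), 'size_3d' (N, 3) and
    'quat' (N, 4) entries.
    """
    loss_center = F.mse_loss(pred["center_3d"], target["center_3d"])
    loss_size = F.mse_loss(pred["size_3d"], target["size_3d"])
    # 1 - |<q_pred, q_gt>| is zero when the rotations match and is invariant
    # to the sign ambiguity of quaternions (q and -q encode the same rotation).
    q_pred = F.normalize(pred["quat"], dim=-1)
    q_gt = F.normalize(target["quat"], dim=-1)
    loss_rot = (1.0 - (q_pred * q_gt).sum(dim=-1).abs()).mean()
    return w_center * loss_center + w_size * loss_size + w_rot * loss_rot
```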
  • Fig. 3 shows an illustration 300 of a scene with various 3d bounding boxes 302, 304 obtained according to various embodiments.
  • Fig. 4 shows a flow diagram 400 illustrating a method for object detection according to various embodiments. At 402, a plurality of lidar data sets may be acquired from a lidar sensor. At 404, a plurality of radar data sets may be acquired from a radar sensor. At 406, at least one image may be acquired from a camera. At 408, concatenated data may be determined based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image. At 410, an object may be detected based on the concatenated data.
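  • Glue code tying steps 402 to 410 together, composed from the sketches above (aggregate_sweeps, project_points_pinhole, linear_depth_completion, build_input_frame); the composition and the assumption that the detector consumes a single concatenated frame are illustrative, not prescribed by the disclosure:

```python
import torch

def detect_objects(lidar_sweeps, lidar_transforms, radar_casts, image,
                   K, T_cam_from_lidar, model):
    """402/404/406: sensor data is assumed to be already acquired.
    408: cast the lidar sweeps (and analogously prepared radar casts) onto
    the image and concatenate; 410: detect objects on the concatenated data."""
    points = aggregate_sweeps(lidar_sweeps, lidar_transforms)
    uv, depth = project_points_pinhole(points, K, T_cam_from_lidar)
    h, w = image.shape[:2]
    depth_image = linear_depth_completion(uv, depth, w, h)[..., None]
    frame = build_input_frame(image, depth_image, radar_casts)
    # Channel-last (H, W, C) numpy array -> (1, C, H, W) float tensor.
    x = torch.from_numpy(frame).permute(2, 0, 1).unsqueeze(0).float()
    # 'model' is assumed here to accept the single concatenated input frame.
    return model(x)
```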
  • According to various embodiments, a plurality of camera residual blocks may be determined.
  • According to various embodiments, the camera data may be processed using a first artificial neural network.
  • According to various embodiments, the casting may include aligning a plurality of sweeps of the lidar data sets.
  • According to various embodiments, linear depth completion of the lidar data sets may be carried out.
  • According to various embodiments, a plurality of lidar residual blocks may be determined based on the linear depth completed lidar data.
  • According to various embodiments, the lidar data sets may be processed using a second artificial neural network.
  • According to various embodiments, the casting may include aligning a plurality of sweeps of the plurality of radar data sets.
  • According to various embodiments, a plurality of radar residual blocks may be determined.
  • According to various embodiments, the radar data may be processed using a third artificial neural network.
  • According to various embodiments, the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks may be concatenated.
  • Each of the steps 402, 404, 406, 408, 410 and the further steps described above may be performed by computer hardware components.
  • Fig. 5 shows a computer system 500 with a plurality of computer hardware components configured to carry out steps of a computer implemented method for object detection according to various embodiments. The computer system 500 may include a processor 502, a memory 504, and a non-transitory data storage 506. At least one camera 508, at least one lidar sensor 510, and at least one radar sensor 512 may be provided as part of the computer system 500 (like illustrated in Fig. 5), or may be provided external to the computer system 500.
  • The processor 502 may carry out instructions provided in the memory 504. The non-transitory data storage 506 may store a computer program, including the instructions that may be transferred to the memory 504 and then executed by the processor 502.
  • The processor 502, the memory 504, and the non-transitory data storage 506 may be coupled with each other, e.g. via an electrical connection 514, such as e.g. a cable or a computer bus or via any other suitable electrical connection to exchange electrical signals. The at least one camera 508, the at least one lidar sensor 510, and/or the at least one radar sensor 512 may be coupled to the computer system 500, for example via an external interface, or may be provided as parts of the computer system (in other words: internal to the computer system, for example coupled via the electrical connection 514).
  • The terms "coupling" or "connection" are intended to include a direct "coupling" (for example via a physical link) or direct "connection" as well as an indirect "coupling" or indirect "connection" (for example via a logical link), respectively.
  • It will be understood that what has been described for one of the methods above may analogously hold true for the computer system 500.
  • Reference numeral list
  • 100
    an illustration of an architecture for camera, lidar and radar fusion for 2d and 3d object detection task according to various embodiments
    102
    camera data
    104
    residual blocks
    106
    lidar data
    108
    alignment casting
    110
    linear depth completion
    112
    residual blocks
    114
    radar data
    116
    alignment casting
    120
    residual blocks
    122
    concatenation
    124
    residual blocks
    126
    object detector
    128
    2d object detection
    130
    3d object detection
    200
    illustrations of hl view of implementation according to various embodiments of middle fusion and late fusion
    202
    hl data
    204
    dashed line
    206
    hl data
    208
    dashed line
    300
    illustration of a scene with various 3d bounding boxes obtained according to various embodiments
    302
    3d bounding box
    304
    3d bounding box
    400
    flow diagram illustrating a method for object detection according to various embodiments
    402
    step of acquiring a plurality of lidar data sets from a lidar sensor
    404
    step of acquiring a plurality of radar data sets from a radar sensor
    406
    step of acquiring at least one image from a camera
    408
    step of determining concatenated data based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image
    410
    step of detecting an object based on the concatenated data
    500
    computer system according to various embodiments
    502
    processor
    504
    memory
    506
    non-transitory data storage
    508
    camera
    510
    lidar sensor
    512
    radar sensor
    514
    connection

Claims (14)

  1. Computer implemented method for object detection,
    the method comprising the following steps carried out by computer hardware components:
    - acquiring a plurality of lidar data sets from a lidar sensor;
    - acquiring a plurality of radar data sets from a radar sensor;
    - acquiring at least one image from a camera;
    - determining concatenated data based on casting the plurality of lidar data sets and the plurality of radar data sets to the at least one image; and
    - detecting an object based on the concatenated data.
  2. The computer implemented method of claim 1, further comprising the following step carried out by the computer hardware components:
    determining a plurality of camera residual blocks.
  3. The computer implemented method of at least one of claims 1 or 2,
    wherein the camera data is processed using a first artificial neural network.
  4. The computer implemented method of at least one of claims 1 to 3,
    wherein the casting comprises aligning a plurality of sweeps of the lidar data sets.
  5. The computer implemented method of at least one of claims 1 to 4,
    further comprising the following step carried out by the computer hardware components:
    carrying out linear depth completion of the lidar data sets.
  6. The computer implemented method of at least one of claims 1 to 5,
    further comprising the following step carried out by the computer hardware components:
    determining a plurality of lidar residual blocks based on the linear depth completed lidar data.
  7. The computer implemented method of at least one of claims 1 to 6,
    wherein the lidar data sets are processed using a second artificial neural network.
  8. The computer implemented method of at least one of claims 1 to 7,
    wherein the casting comprises aligning a plurality of sweeps of the plurality of radar data sets.
  9. The computer implemented method of at least one of claims 1 to 8,
    further comprising the following step carried out by the computer hardware components:
    determining a plurality of radar residual blocks.
  10. The computer implemented method of at least one of claims 1 to 9,
    wherein the radar data is processed using a third artificial neural network.
  11. The computer implemented method of at least one of claims 1 to 10, further comprising the following step carried out by the computer hardware components:
    concatenating the plurality of camera residual blocks, the plurality of lidar residual blocks, and the plurality of radar residual blocks.
  12. Computer system, the computer system comprising a plurality of computer hardware components configured to carry out steps of the computer implemented method of at least one of claims 1 to 11.
  13. Vehicle, comprising a radar sensor, a lidar sensor and a camera, wherein the vehicle is configured to detect objects according to the computer implemented method of at least one of claims 1 to 11.
  14. Non-transitory computer readable medium comprising instructions for carrying out the computer implemented method of at least one of claims 1 to 11.
EP20180636.1A 2020-06-17 2020-06-17 Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets Pending EP3926360A1 (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
EP20180636.1A EP3926360A1 (en) 2020-06-17 2020-06-17 Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets
US17/235,407 US20210397907A1 (en) 2020-06-17 2021-04-20 Methods and Systems for Object Detection
CN202110511342.XA CN113888458A (en) 2020-06-17 2021-05-11 Method and system for object detection

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP20180636.1A EP3926360A1 (en) 2020-06-17 2020-06-17 Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets

Publications (1)

Publication Number Publication Date
EP3926360A1 true EP3926360A1 (en) 2021-12-22

Family

ID=71108380

Family Applications (1)

Application Number Title Priority Date Filing Date
EP20180636.1A Pending EP3926360A1 (en) 2020-06-17 2020-06-17 Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets

Country Status (3)

Country Link
US (1) US20210397907A1 (en)
EP (1) EP3926360A1 (en)
CN (1) CN113888458A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023158706A1 (en) * 2022-02-15 2023-08-24 Waymo Llc End-to-end processing in automated driving systems
EP4361676A1 (en) * 2022-10-28 2024-05-01 Aptiv Technologies AG Methods and systems for determining a property of an object

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112835037B (en) * 2020-12-29 2021-12-07 清华大学 All-weather target detection method based on fusion of vision and millimeter waves
US20220414387A1 (en) * 2021-06-23 2022-12-29 Gm Cruise Holdings Llc Enhanced object detection system based on height map data
US20230050467A1 (en) * 2021-08-11 2023-02-16 Gm Cruise Holdings Llc Ground height-map based elevation de-noising
DE102022116320A1 (en) 2022-06-30 2024-01-04 Bayerische Motoren Werke Aktiengesellschaft Method and device for determining an object class of an object

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180314253A1 (en) * 2017-05-01 2018-11-01 Mentor Graphics Development (Deutschland) Gmbh Embedded automotive perception with machine learning classification of sensor data

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11537868B2 (en) * 2017-11-13 2022-12-27 Lyft, Inc. Generation and update of HD maps using data from heterogeneous sources
US10803325B2 (en) * 2017-11-15 2020-10-13 Uatc, Llc Autonomous vehicle lane boundary detection systems and methods
EP3525000B1 (en) * 2018-02-09 2021-07-21 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene based on lidar data and radar data of the scene
US11500099B2 (en) * 2018-03-14 2022-11-15 Uatc, Llc Three-dimensional object detection
US11494937B2 (en) * 2018-11-16 2022-11-08 Uatc, Llc Multi-task multi-sensor fusion for three-dimensional object detection
US11927668B2 (en) * 2018-11-30 2024-03-12 Qualcomm Incorporated Radar deep learning
US11693423B2 (en) * 2018-12-19 2023-07-04 Waymo Llc Model for excluding vehicle from sensor field of view
US11062454B1 (en) * 2019-04-16 2021-07-13 Zoox, Inc. Multi-modal sensor data association architecture
US11852746B2 (en) * 2019-10-07 2023-12-26 Metawave Corporation Multi-sensor fusion platform for bootstrapping the training of a beam steering radar
CN111027401B (en) * 2019-11-15 2022-05-03 电子科技大学 End-to-end target detection method with integration of camera and laser radar
US11625839B2 (en) * 2020-05-18 2023-04-11 Toyota Research Institute, Inc. Bird's eye view based velocity estimation via self-supervised learning

Also Published As

Publication number Publication date
US20210397907A1 (en) 2021-12-23
CN113888458A (en) 2022-01-04

Similar Documents

Publication Publication Date Title
EP3926360A1 (en) Neural network based methods and systems for object detection using concatenated lidar, radar and camera data sets
US20210058608A1 (en) Method and apparatus for generating three-dimensional (3d) road model
US10217007B2 (en) Detecting method and device of obstacles based on disparity map and automobile driving assistance system
CA2678156C (en) Measurement apparatus, measurement method, and feature identification apparatus
WO2020104423A1 (en) Method and apparatus for data fusion of lidar data and image data
CN108764187A (en) Extract method, apparatus, equipment, storage medium and the acquisition entity of lane line
CN113160068B (en) Point cloud completion method and system based on image
EP3942794B1 (en) Depth-guided video inpainting for autonomous driving
CN115797454B (en) Multi-camera fusion sensing method and device under bird's eye view angle
KR101086274B1 (en) Apparatus and method for extracting depth information
CN116543361A (en) Multi-mode fusion sensing method and device for vehicle, vehicle and storage medium
CN115953563A (en) Three-dimensional model completion repairing method and system based on point cloud vectorization framework matching
JP2022513830A (en) How to detect and model an object on the surface of a road
JP2009092551A (en) Method, apparatus and system for measuring obstacle
US20210407117A1 (en) System and method for self-supervised monocular ground-plane extraction
WO2020118623A1 (en) Method and system for generating an environment model for positioning
CN116630528A (en) Static scene reconstruction method based on neural network
KR102641108B1 (en) Apparatus and Method for Completing Depth Map
CN114359891A (en) Three-dimensional vehicle detection method, system, device and medium
CN104637043A (en) Supporting pixel selection method and device and parallax determination method
Kang et al. 3D urban reconstruction from wide area aerial surveillance video
CN115236672A (en) Obstacle information generation method, device, equipment and computer readable storage medium
EP4047516A1 (en) Methods and systems for determining a distance of an object
EP4379605A1 (en) Method for detection of map deviations, system and vehicle
EP4379321A1 (en) Method for detection of map deviations, system and vehicle

Legal Events

Date Code Title Description
PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE APPLICATION HAS BEEN PUBLISHED

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

B565 Issuance of search results under rule 164(2) epc

Effective date: 20210201

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20220622

RBV Designated contracting states (corrected)

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

RAP3 Party data changed (applicant data changed or rights of an application transferred)

Owner name: APTIV TECHNOLOGIES LIMITED

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20230929

RAP1 Party data changed (applicant data changed or rights of an application transferred)

Owner name: APTIV TECHNOLOGIES AG