WO2021056278A1 - Systems and methods for evaluating three-dimensional (3-d) map constructed based on sensor data - Google Patents

Systems and methods for evaluating three-dimensional (3-d) map constructed based on sensor data

Info

Publication number
WO2021056278A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
determining
target landmark
image
sensor data
Prior art date
Application number
PCT/CN2019/107910
Other languages
French (fr)
Inventor
Minkang WANG
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/107910
Publication of WO2021056278A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern

Definitions

  • the present disclosure relates to systems and methods for evaluating a three-dimensional (3-D) map, and more particularly to, systems and methods for evaluating a 3-D map constructed based on sensor data using object identifications and geometric modeling.
  • 3-D high-definition (HD) maps are widely used, e.g., to aid autonomous driving.
  • 3-D HD maps provide 3-D HD geometric information of the roads and surroundings to autonomous driving vehicles.
  • the quality of the map needs to be evaluated. Based on the evaluation, one may improve the quality of the map.
  • Maps are evaluated by comparing benchmark measurement results of a real-world object with measurements of the corresponding object in the 3-D map.
  • Existing map quality evaluating methods rely on manual identification of objects from 3-D maps, e.g., by an operator, before they can be verified against the benchmark measurements. This manual process is time-consuming, inefficient, and inaccurate due to errors caused by subjective observations and measurements.
  • Embodiments of the disclosure address the above problems by providing methods and systems for evaluating a three-dimensional (3-D) map based on sensor data using automatic object identifications and geometric modeling.
  • Embodiments of the disclosure provide a method for evaluating a 3-D map constructed based on sensor data.
  • An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface.
  • the method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor.
  • the method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor.
  • the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.
  • Embodiments of the disclosure also provide a system for evaluating a 3-D map constructed based on sensor data.
  • An exemplary system may include a communication interface configured to receive the 3-D map and the measurement data of a target landmark and a storage configured to store the 3-D map and measurement data.
  • the system may also include at least one processor coupled to the storage.
  • the at least one processor may be configured to identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model.
  • the at least one processor may be further configured to determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determine parameters associated with the geometric representation.
  • the at least one processor may be configured to evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a 3-D map constructed based on sensor data.
  • the method may include receiving the 3-D map and measurement data of a target landmark.
  • the method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model.
  • the method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation.
  • the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  • FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system, according to embodiments of the disclosure.
  • FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device, according to embodiments of the disclosure.
  • FIG. 3 illustrates a flowchart of an exemplary method for 3-D map evaluation, according to embodiments of the disclosure.
  • FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model, according to embodiments of the disclosure.
  • FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system (referred to as “3-D map evaluation system 100” ) , according to embodiments of the disclosure.
  • 3-D map evaluation system 100 is configured to evaluate 3-D maps constructed from sensor data acquired by sensor 160 (e.g., 3-D map 104) based on a Faster R-CNN model 105 trained using sample 3-D maps, bounding boxes and corresponding landmarks within the 3-D maps (e.g., included in training data 101) and measurement data 102.
  • 3-D map evaluation system 100 may include components shown in FIG. 1, including a training database 140, a model training device 120, a 3-D map evaluation device 110, a database/repository 150, a display device 130, a sensor 160 and a network 170 to facilitate communications among the various components. It is contemplated that 3-D map evaluation system 100 may include more or fewer components than those shown in FIG. 1.
  • 3-D map evaluation system 100 may perform two stages: a landmark object identification model training stage and a 3-D map evaluation stage applying the trained model.
  • 3-D map evaluation system 100 may include training database 140 and model training device 120.
  • 3-D map evaluation process to obtain a 3-D map evaluation result 107
  • 3-D map evaluation system 100 may include 3-D map evaluation device 110 and database/repository 150.
  • 3-D map evaluation system 100 may also include display device 130 to display a 3-D map evaluation result 107.
  • 3-D map evaluation system 100 may include only 3-D map evaluation device 110, database/repository 150, and display device 130 to perform 3-D map evaluation related functions.
  • 3-D map evaluation system 100 may optionally include network 170 to facilitate the communication among the various components of 3-D map evaluation system 100, such as databases 140 and 150, devices 110 and 120, and sensor 160.
  • network 170 may be a local area network (LAN) , a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service) , a client-server, a wide area network (WAN) , etc.
  • network 170 may be replaced by wired data communication systems or devices.
  • the various components of 3-D map evaluation system 100 may be remote from each other or in different locations and be connected through network 170 as shown in FIG. 1.
  • certain components of 3-D map evaluation system 100 may be located on the same site or inside one device.
  • training database 140 may be located on-site with or be part of model training device 120.
  • model training device 120 and 3-D map evaluation device 110 may be inside the same computer or processing device.
  • 3-D map evaluation system 100 may store 3-D maps.
  • sample 3-D maps as part of training data 101 may be stored in training database 140 and 3-D maps 104 to be evaluated may be stored in database/repository 150.
  • 3-D maps may be constructed based on sensor data received from sensors (e.g., sensor 160) .
  • sensor data may be point cloud data acquired by a LiDAR.
  • sensor 160 may be a LiDAR scanner configured to scan the surroundings and acquire point clouds. LiDAR measures distance to a target by illuminating the target with pulsed laser light and measuring the reflected pulses with the sensor.
  • 3-D maps may be constructed based on the 3-D representation of targets made by calculating the differences in laser return times and wavelengths.
  • training database 140 may store training data 101, which includes sample 3-D maps and known bounding boxes of pre-identified landmark objects within the 3-D maps.
  • the known bounding boxes of the landmarks included in the 3-D maps may be benchmark extractions made by operators based on the sample 3-D maps and the corresponding landmarks.
  • Sample 3-D maps, bounding boxes and the corresponding landmarks within the 3-D maps may be stored in pairs in training database 140 as training data 101.
  • sensor 160 or a separate sensor may also acquire ground truths (e.g., measurement data 102) of the landmarks included by the constructed 3-D maps.
  • measurement data 102 of a landmark may be the height, width and length of the landmark.
  • an operator may manually measure the landmark and obtain measurement data 102.
  • measurement data 102 may be determined from images acquired by sensors such as a monocular or binocular camera.
  • measurement data 102 may be measured multiple times (e.g., 5 or 10 times under the same measurement condition) to reduce the observational error.
  • Measurement data 102 associated with the location information of the landmark (e.g., GPS data of the landmark) may be stored in database/repository 150 and used for evaluating 3-D maps 104.
  • the model training process is performed by model training device 120.
  • “training” a learning model refers to determining one or more parameters of at least one layer in the learning model.
  • a convolutional layer of a CNN model may include at least one filter or kernel.
  • One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
  • Faster R-CNN model 105 may be trained using supervised learning.
  • model training device 120 may communicate with training database 140 to receive one or more sets of training data 101.
  • Each set of training data 101 may include a sample 3-D map, bounding boxes of the landmark objects included in the 3-D map and the corresponding landmarks.
  • Model training device 120 may use training data 101 received from training database 140 to train a learning model, e.g., faster R-CNN model 105 (described in detail in connection with FIG. 4) .
  • Model training device 120 may be implemented with hardware specially programmed by software that performs the training process.
  • model training device 120 may include a processor and a non-transitory computer-readable medium. The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium.
  • Model training device 120 may additionally include input and output interfaces to communicate with training database 140, network 170, and/or a user interface (not shown) .
  • the user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing bounding boxes of landmark objects included in the 3-D map.
  • faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model to determine feature maps of the objects corresponding to the landmarks, a Region Proposal Network (RPN) sub-model to determine the bounding boxes of the objects corresponding to the landmarks, and a plurality of trained Support-Vector Machine (SVM) sub-models to determine whether the bounding boxes include the objects corresponding to the target landmarks.
  • faster R-CNN model 105 may also include a linear regression sub-model to fine-tune the determined bounding boxes.
  • the CNN sub-model may process data such as 2-D images determined from the 3-D map that includes the objects corresponding to landmarks and may use geometric layer labels associated with the 3-D map as input for the sub-model.
  • the architecture of the CNN sub-model includes a stack of distinct layers that transform the input into the output (e.g., object features of the objects corresponding to landmarks and/or feature maps of the objects) .
  • the output of the CNN sub-model may be used as input for the RPN sub-model to determine the bounding boxes of the objects corresponding to the landmarks.
  • the RPN sub-model may include a classifier and a regressor. For example, proposals of the bounding boxes may be generated based on the feature maps of the objects, the classifier may determine the probability that each proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Based on the classifier and the regressor, the RPN sub-model may determine candidate bounding boxes that include the objects corresponding to the landmarks.
  • the trained SVMs may be used to determine if the candidate bounding boxes include the objects corresponding to the target landmarks.
  • each SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to a target landmark and bounding boxes that do not include the objects corresponding to the target landmark).
  • the SVM can determine whether the candidate bounding boxes include the objects corresponding to the target landmark.
  • the output of the SVM may be “Yes” (e.g., a probability higher than a threshold value) if the candidate bounding box includes the objects corresponding to target landmarks.
  • the bounding boxes may further be fine-tuned by a regressor.
  • each bounding box may be tuned to include just the object corresponding to the target landmark.
  • 3-D map evaluation device 110 may receive trained faster R-CNN model 105 from model training device 120. 3-D map evaluation device 110 may include a processor and a non-transitory computer-readable medium (not shown).
  • the processor may perform instructions of a 3-D map evaluation process stored in the medium.
  • 3-D map evaluation device 110 may additionally include input and output interfaces to communicate with database/repository 150, sensor 160, network 170 and/or a user interface of display device 130.
  • the input interface may be used for selecting a 3-D map for evaluation or initiating the evaluation process.
  • the output interface may be used for providing a 3-D map evaluation result 107.
  • Display 130 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction.
  • the display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user.
  • the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™.
  • display 130 may be part of 3-D map evaluation device 110.
  • FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device 110, according to embodiments of the disclosure.
  • 3-D map evaluation device 110 may include a communication interface 202, a processor 204, a memory 206, and a storage 208.
  • 3-D map evaluation device 110 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions.
  • one or more components of 3-D map evaluation device 110 may be located in a cloud or may be alternatively in a single location (such as inside a mobile device) or distributed locations.
  • Components of 3-D map evaluation device 110 may be in an integrated device or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, 3-D map evaluation device 110 may be configured to evaluate 3-D maps 104 from database/repository 150 against measurement data 102 received from database/repository 150 and faster R-CNN model 105 trained in model training device 120.
  • Communication interface 202 may send data to and receive data from components such as database/repository 150, sensor 160, model training device 120 and display device 130 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods.
  • communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 202.
  • communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 202 may receive faster R-CNN model 105 from model training device 120, and 3-D maps 104 and measurement data 102 from database/repository 150. Communication interface 202 may further provide 3-D map 104, measurement data 102, and faster R-CNN model 105 to memory 206 and/or storage 208 for storage or to processor 204 for processing.
  • Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to evaluating 3-D maps using a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to 3-D map evaluation.
  • Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate.
  • Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein.
  • memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to evaluate 3-D maps 104 based on faster R-CNN model 105.
  • memory 206 and/or storage 208 may also store intermediate data such as the object features of target landmarks, feature maps output by layers of the learning model, bounding boxes, parameters of the landmark objects, etc.
  • Memory 206 and/or storage 208 may additionally store various learning models including their model parameters, such as faster R-CNN model 105, etc.
  • processor 204 may include multiple modules, such as a 2-D image determination unit 240, an object identification unit 242, a geometric representation construction unit 244 and a map evaluation unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components or software units implemented by processor 204 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions.
  • FIG. 2 shows units 240-246 all within one processor 204, it is contemplated that these units may be distributed among different processors located closely or remotely with each other.
  • FIG. 3 illustrates a flowchart of an exemplary method 300 for 3-D map evaluation based on faster R-CNN model 105, according to embodiments of the disclosure.
  • Method 300 may be implemented by 3-D map evaluation device 110 and particularly processor 204 or a separate processor not shown in FIG. 2.
  • Method 300 may include steps S302-S314 as described below. It is to be appreciated that some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
  • communication interface 202 may receive 3-D map 104 including target landmarks from database/repository 150 and ground truth data of the target landmarks (e.g., measurement data 102) from database/repository 150.
  • the target landmarks may be ground marks (e.g., traffic lanes and/or crosswalk lanes) and/or standing marks (e.g., traffic lights and/or traffic signs) .
  • the ground truth data may be the geometric measurements of the target landmark (e.g., the length, the width and the height of the target landmark) .
  • 2-D image determination unit 240 may determine 2-D images from 3-D map 104.
  • 2-D image determination unit 240 may determine an object in 3-D map 104 that corresponds to the target landmark and may determine 2-D images from 3-D map 104 that contain the object.
  • the target landmark is a ground mark such as a traffic lane
  • 2-D image determination unit 240 may project 3-D map 104 into 2-D data, determine a position corresponding to the target landmark in the 2-D data, and extract the 2-D image surrounding the position which includes the object.
  • 2-D image determination unit 240 may use a pillar filtering method to filter 3-D map 104, where points within the same 2-D region are treated as a “pillar.” The points within each pillar may then be clustered according to constraints on distance. The lowest cluster of points may be used to provide the height and color information of the 2-D image.
  • 2-D image determination unit 240 may first project the top view of the sensor data (e.g., point cloud data) corresponding to the target landmark within 3-D map 104 to obtain a first image.
  • 2-D image determination unit 240 may then map the height information of the sensor data corresponding to the object to a second image.
  • the 2-D image may be then determined by combining the first and the second image together.
  • object identification unit 242 may identify objects corresponding to landmarks from the 2-D images and determine a bounding box of the objects based on faster R-CNN model 105.
  • object identification unit 242 may use faster R-CNN model 105 that includes a CNN sub-model and an RPN sub-model to determine bounding boxes of the target landmarks.
  • object identification unit 242 may label different geometric layers within the images determined in step S306, use the different geometric layer labels as input and determine a feature map of the images using the CNN sub-model.
  • Object identification unit 242 may then use the feature map as input to the RPN sub-model to determine features of candidate bounding boxes of the object.
  • object identification unit 242 may also use a trained SVM to determine if the object within the candidate bounding box corresponds to the target landmark.
  • the SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to target landmarks and bounding boxes that do not include the objects corresponding to target landmarks); when the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the bounding boxes include the objects corresponding to target landmarks.
  • object identification unit 242 may further include a regressor for fine-tuning the bounding boxes. For example, each bounding box may be tuned to include just the target object.
  • geometric representation construction unit 244 may construct geometric representations of the objects.
  • the geometric representations of the objects are determined based on the sensor data (e.g., point cloud data) within the bounding boxes.
  • geometric representation construction unit 244 may extract the sensor data within the bounding boxes and use a Random Sample Consensus (RANSAC) algorithm to fit a plane to the sensor data (a simplified sketch of this fitting and evaluation sequence is given after this list).
  • Geometric representation construction unit 244 may project all the sensor data to the fitted plane based on the plane function of the fitted plane.
  • Geometric representation construction unit 244 may further transfer all the sensor data to a horizontal plane based on a transfer function between the fitted plane and the horizontal plane. Geometric representation construction unit 244 may then use Otsu's method to filter out the sensor data that does not correspond to the target landmarks.
  • geometric representation construction unit 244 may construct a geometric representation of the object corresponding to the target landmark, and the geometric representation may be selected from shapes such as a rectangle, a circle, or a triangle.
  • geometric representation construction unit 244 may also establish constraints to optimize the geometric representation construction.
  • the constraints may include: all sensor data must lie within the boundary of the geometric representation; the boundary of the geometric representation must be as close as possible to the inner points of the representation; the central point of the sensor data of the target landmark should be equidistant from all boundaries of the geometric representation; and no point should be negative.
  • geometric representation construction unit 244 may also apply a relaxation coefficient to the geometric representation construction so that noise outside the boundary of the geometric representation may be partially tolerated. For example, when evaluating the first constraint (that all sensor data must lie within the boundary of the geometric representation), any point exceeding a positive relaxation margin is treated as noise.
  • map evaluation unit 246 may determine parameters of the geometric representation of the object and evaluate 3-D map 104 based on the parameters and corresponding measurement data 102.
  • the parameters may include the width, the length and the height of the geometric representation.
  • map evaluation unit 246 may evaluate 3-D map 104 by calculating a difference between the parameters and the corresponding ground truths (e.g., measurement data 102 of the target landmark) .
  • FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model 105, according to embodiments of the disclosure.
  • faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model 410 to determine feature maps of the objects corresponding to the target landmarks, a Region Proposal Network (RPN) sub-model 420 to determine the candidate bounding boxes of the objects corresponding to the target landmarks, and at least one trained Support-Vector Machine (SVM) 430 to determine whether the bounding boxes include the objects corresponding to the target landmarks.
  • faster R-CNN model 105 may also include a linear regression sub-model 440 to fine-tune the determined bounding boxes.
  • CNN sub-model 410 may determine feature maps of the input 2-D image provided by 2-D image determination unit 240. For example, CNN sub-model 410 may process 2-D images determined from the 3-D map that include objects corresponding to the target landmarks and may use the geometric layer labels associated with the 3-D map as an input. In some embodiments, a feature map of each object may be determined as the output of CNN sub-model 410.
  • RPN sub-model 420 may determine candidate bounding boxes of the objects corresponding to the target landmarks based on the feature maps determined by CNN sub-model 410.
  • RPN sub-model 420 may include a classifier and a regressor. Proposals of the bounding boxes may be generated based on the feature maps of the objects.
  • the classifier may determine the probability of the proposed bounding boxes containing the target object, and the regressor may regress the coordinates of the proposed bounding boxes.
  • RPN sub-model 420 may determine the bounding boxes that include objects corresponding to landmarks based on the classifier and the regressor.
  • SVM 430 may match the candidate bounding boxes with the target landmarks by determining if probabilities that the respective candidate bounding boxes include the objects corresponding to target landmarks exceed a threshold value.
  • SVM 430 may be trained by a given set of training examples (e.g., bounding boxes that include an object corresponding to a target landmark and bounding boxes that do not include the object corresponding to the target landmark) to determine if the candidate bounding boxes include the object.
  • one SVM is trained for each target landmark.
  • the candidate bounding boxes may be used as input to the SVM.
  • the candidate bounding boxes may be used as the bounding box for the corresponding target landmark.
  • faster R-CNN model 105 may further include linear regression model 440 for fine-tuning the bounding boxes using a linear regressor.
  • each bounding box may be tuned using linear regression model 440 to include just the target landmark.
  • faster R-CNN model 105 may be trained by model training device 120 by determining one or more parameters of at least one layer in any sub-model of the learning model and may be transmitted to 3-D map evaluation device 110 for performing 3-D map evaluation related functions.
  • a convolutional layer of a CNN model may include at least one filter or kernel.
  • One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
  • faster R-CNN model 105 may be trained using supervised learning.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
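
The following Python sketch illustrates, under simplifying assumptions, the fitting and comparison sequence referenced in the list above: a RANSAC plane fit over the point-cloud subset inside a bounding box, an axis-aligned rectangle fit in which a relaxation coefficient stands in for the constrained optimization described above (the Otsu filtering step is omitted), and a comparison of the resulting parameters against measurement data 102. Function names, the axis-aligned simplification, and the dictionary keys are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def ransac_plane(points, n_iter=200, dist_thresh=0.05, seed=0):
    """RANSAC plane fit over the point-cloud subset extracted from a bounding box.

    points: (N, 3) array of [x, y, z]. Returns (normal, d) with normal.p + d = 0
    and the boolean inlier mask. A simplified sketch, not the disclosed algorithm.
    """
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample, try again
            continue
        normal /= norm
        d = -normal @ p0
        inliers = np.abs(points @ normal + d) < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers

def fit_rectangle(xy, relaxation=0.02):
    """Fit an axis-aligned rectangle to points already transferred to the
    horizontal plane. The relaxation coefficient tolerates a small fraction of
    noise outside the boundary; this stands in for the constrained optimization
    described above."""
    lo = np.quantile(xy, relaxation, axis=0)       # points below lo treated as noise
    hi = np.quantile(xy, 1.0 - relaxation, axis=0) # points above hi treated as noise
    width, length = hi - lo
    return {"width": float(width), "length": float(length)}

def evaluate_landmark(parameters, measurement_data):
    """Score the 3-D map for one landmark as the absolute difference between the
    parameters of the geometric representation and the ground-truth measurements
    (dictionary keys such as "width"/"length" are assumed for illustration)."""
    return {k: abs(parameters[k] - measurement_data[k])
            for k in measurement_data if k in parameters}
```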

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods for evaluating a three-dimensional (3-D) map constructed based on sensor data are provided. An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor. Moreover, the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.

Description

[Title established by the ISA under Rule 37.2] SYSTEMS AND METHODS FOR EVALUATING THREE-DIMENSIONAL (3-D) MAP CONSTRUCTED BASED ON SENSOR DATA TECHNICAL FIELD
The present disclosure relates to systems and methods for evaluating a three-dimensional (3-D) map, and more particularly to, systems and methods for evaluating a 3-D map constructed based on sensor data using object identifications and geometric modeling.
BACKGROUND
3-D high-definition (HD) maps are widely used, e.g., to aid autonomous driving. For example, 3-D HD maps provide 3-D HD geometric information of the roads and surroundings to autonomous driving vehicles. In order to provide accurate positioning information, the quality of the map needs to be evaluated. Based on the evaluation, one may improve the quality of the map.
Maps are evaluated by comparing benchmark measurement results of a real-world object with measurements of the corresponding object in the 3-D map. Existing map quality evaluating methods rely on manual identification of objects from 3-D maps, e.g., by an operator, before they can be verified against the benchmark measurements. This manual process is time-consuming, inefficient, and inaccurate due to errors caused by subjective observations and measurements.
Embodiments of the disclosure address the above problems by providing methods and systems for evaluating a three-dimensional (3-D) map based on sensor data using automatic object identifications and geometric modeling.
SUMMARY
Embodiments of the disclosure provide a method for evaluating a 3-D map constructed based on sensor data. An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor. Moreover,  the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.
Embodiments of the disclosure also provide a system for evaluating a 3-D map constructed based on sensor data. An exemplary system may include a communication interface configured to receive the 3-D map and the measurement data of a target landmark and a storage configured to store the 3-D map and measurement data. The system may also include at least one processor coupled to the storage. The at least one processor may be configured to identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model. The at least one processor may be further configured to determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determine parameters associated with the geometric representation. Moreover, the at least one processor may be configured to evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
Embodiments of the disclosure further provide a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a 3-D map constructed based on sensor data. The method may include receiving the 3-D map and measurement data of a target landmark. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation. Moreover, the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system, according to embodiments of the disclosure.
FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device, according to embodiments of the disclosure.
FIG. 3 illustrates a flowchart of an exemplary method for 3-D map evaluation, according to embodiments of the disclosure.
FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system (referred to as “3-D map evaluation system 100” ) , according to embodiments of the disclosure.
Consistent with the present disclosure, 3-D map evaluation system 100 is configured to evaluate 3-D maps constructed from sensor data acquired by sensor 160 (e.g., 3-D map 104) based on a Faster R-CNN model 105 trained using sample 3-D maps, bounding boxes and corresponding landmarks within the 3-D maps (e.g., included in training data 101) and measurement data 102. In some embodiments, 3-D map evaluation system 100 may include components shown in FIG. 1, including a training database 140, a model training device 120, a 3-D map evaluation device 110, a database/repository 150, a display device 130, a sensor 160 and a network 170 to facilitate communications among the various components. It is contemplated that 3-D map evaluation system 100 may include more or fewer components than those shown in FIG. 1.
As shown in FIG. 1, 3-D map evaluation system 100 may perform two stages: a landmark object identification model training stage and a 3-D map evaluation stage applying the trained model. To perform the training stage to train a learning model such as faster R-CNN model 105, 3-D map evaluation system 100 may include training database 140 and model training device 120. To perform the 3-D map evaluation process to obtain a 3-D map evaluation result 107, 3-D map evaluation system 100 may include 3-D map evaluation device 110 and database/repository 150. In some embodiments, 3-D map evaluation system 100 may also include display device 130 to display a 3-D map evaluation result 107. In some embodiments, when a learning model (e.g., faster R-CNN model 105) is pre-trained for landmark object identification, 3-D map evaluation system 100 may include only 3-D map evaluation device 110, database/repository 150, and display device 130 to perform 3-D map evaluation related functions.
3-D map evaluation system 100 may optionally include network 170 to facilitate the communication among the various components of 3-D map evaluation system 100, such as  databases  140 and 150,  devices  110 and 120, and sensor 160. For example, network 170 may be a local area network (LAN) , a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service) , a client-server, a wide area network (WAN) , etc. In some embodiments, network 170 may be replaced by wired data communication systems or devices.
In some embodiments, the various components of 3-D map evaluation system 100 may be remote from each other or in different locations and be connected through network 170 as shown in FIG. 1. In some alternative embodiments, certain components of 3-D map evaluation system 100 may be located on the same site or inside one device. For example, training database 140 may be located on-site with or be part of model training device 120. As another example, model training device 120 and 3-D map evaluation device 110 may be inside the same computer or processing device.
Consistent with the present disclosure, 3-D map evaluation system 100 may store 3-D maps. For example, sample 3-D maps as part of training data 101 may be stored in training database 140 and 3-D maps 104 to be evaluated may be stored in database/repository 150.
3-D maps may be constructed based on sensor data received from sensors (e.g., sensor 160). In some embodiments, sensor data may be point cloud data acquired by a LiDAR. For example, sensor 160 may be a LiDAR scanner configured to scan the surroundings and acquire point clouds. LiDAR measures distance to a target by illuminating the target with pulsed laser light and measuring the reflected pulses with the sensor. 3-D maps may be constructed based on the 3-D representation of targets made by calculating the differences in laser return times and wavelengths.
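As a point of reference, the return-time-to-range relation mentioned above is the usual time-of-flight formula; the short Python snippet below is illustrative only and is not part of the disclosure.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def range_from_return_time(delta_t):
    """Convert a round-trip laser return time (seconds) into a range (metres).

    The pulse travels to the target and back, hence the division by two.
    """
    return SPEED_OF_LIGHT * delta_t / 2.0

# e.g., a return time of 1 microsecond corresponds to roughly 150 m
print(range_from_return_time(1e-6))
```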
In some embodiments, training database 140 may store training data 101, which includes sample 3-D maps and known bounding boxes of pre-identified landmark objects within the 3-D maps. The known bounding boxes of the landmarks included in the 3-D maps may be benchmark extractions made by operators based on the sample 3-D maps and the corresponding landmarks. Sample 3-D maps, bounding boxes and the corresponding landmarks within the 3-D maps may be stored in pairs in training database 140 as training data 101.
In some embodiments, sensor 160 or a separate sensor may also acquire ground truths (e.g., measurement data 102) of the landmarks included by the constructed 3-D maps. For example, measurement data 102 of a landmark may be the height, width and length of the landmark. In some embodiments, an operator may manually measure the landmark and obtain measurement data 102. Alternatively, measurement data 102 may be determined from images acquired by sensors such as a monocular or binocular camera. In some embodiments, measurement data 102 may be measured multiple times (e.g., 5 or 10 times under the same measurement condition) to reduce the observational error. Measurement data 102 associated with the location information of the landmark (e.g., GPS data of the landmark) may be stored in database/repository 150 and used for evaluating 3-D maps 104 in database/repository 150.
In some embodiments, the model training process is performed by model training device 120. As used herein, “training” a learning model refers to determining one or more parameters of at least one layer in the learning model. For example, a convolutional layer of a CNN model may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process. Consistent with some embodiments, Faster R-CNN model 105 may be trained using supervised learning.
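To make the notion of "determining parameters of a layer by backpropagation" concrete, the following minimal PyTorch sketch runs one backpropagation step that updates the kernel weights of a single convolutional layer. The layer sizes, dummy data, and loss are placeholders and do not reflect the actual training configuration of faster R-CNN model 105.

```python
import torch
import torch.nn as nn

# One convolutional layer whose kernel weights are "determined" by a gradient step.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(conv.parameters(), lr=1e-3)

images = torch.randn(4, 3, 64, 64)      # dummy batch of 2-D images
targets = torch.randn(4, 8, 64, 64)     # dummy supervision signal

optimizer.zero_grad()
loss = nn.functional.mse_loss(conv(images), targets)
loss.backward()                          # backpropagation computes kernel-weight gradients
optimizer.step()                         # the update step adjusts the kernel weights
print(loss.item())
```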
As shown in FIG. 1, model training device 120 may communicate with training database 140 to receive one or more sets of training data 101. Each set of training data 101 may include a sample 3-D map, bounding boxes of the landmark objects included in the 3-D map and the corresponding landmarks. Model training device 120 may use training data 101 received from training database 140 to train a learning model, e.g., faster R-CNN model 105 (described in detail in connection with FIG. 4). Model training device 120 may be implemented with hardware specially programmed by software that performs the training process. For example, model training device 120 may include a processor and a non-transitory computer-readable medium. The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium. Model training device 120 may additionally include input and output interfaces to communicate with training database 140, network 170, and/or a user interface (not shown). The user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing bounding boxes of landmark objects included in the 3-D map.
Consistent with some embodiments, faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model to determine feature maps of the objects corresponding to the landmarks, a Region Proposal Network (RPN) sub-model to determine the bounding boxes of the objects corresponding to the landmarks, and a plurality of trained Support-Vector Machine (SVM) sub-models to determine whether the bounding boxes include the objects corresponding to the target landmarks. In some embodiments, faster R-CNN model 105 may also include a linear regression sub-model to fine-tune the determined bounding boxes.
In some embodiments, the CNN sub-model may process data such as 2-D images determined from the 3-D map that includes the objects corresponding to landmarks and may use geometric layer labels associated with the 3-D map as input for the sub-model. The architecture of the CNN sub-model includes a stack of distinct layers that transform the input into the output (e.g., object features of the objects corresponding to landmarks and/or feature maps of the objects) .
The output of the CNN sub-model may be used as input for the RPN sub-model to determine the bounding boxes of the objects corresponding to the landmarks. The RPN sub-model may include a classifier and a regressor. For example, proposals of the bounding boxes may be generated based on the feature maps of the objects, the classifier may determine the probability that each proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Based on the classifier and the regressor, the RPN sub-model may determine candidate bounding boxes that include the objects corresponding to the landmarks.
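A minimal sketch of this proposal step is shown below, assuming the standard Faster R-CNN box parameterization (center offsets scaled by anchor size, log-scale width and height). The array names and the score threshold are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def refine_proposals(anchors, objectness, deltas, score_thresh=0.7):
    """Keep anchors the classifier scores above `score_thresh` and apply the
    regressor output to refine their coordinates.

    anchors:    (N, 4) boxes as [cx, cy, w, h]
    objectness: (N,)   classifier probability that an anchor contains an object
    deltas:     (N, 4) regressor output [dx, dy, dw, dh]
    """
    keep = objectness > score_thresh
    a = anchors[keep].astype(float)
    d = deltas[keep]
    refined = a.copy()
    refined[:, 0] = a[:, 0] + d[:, 0] * a[:, 2]   # shift centre x by dx * width
    refined[:, 1] = a[:, 1] + d[:, 1] * a[:, 3]   # shift centre y by dy * height
    refined[:, 2] = a[:, 2] * np.exp(d[:, 2])     # rescale width
    refined[:, 3] = a[:, 3] * np.exp(d[:, 3])     # rescale height
    return refined, objectness[keep]
```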
The trained SVMs may be used to determine if the candidate bounding boxes include the objects corresponding to the target landmarks. For example, each SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to a target landmark and bounding boxes that do not include the objects corresponding to the target landmark). When the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the candidate bounding boxes include the objects corresponding to the target landmark. For example, the output of the SVM may be “Yes” (e.g., a probability higher than a threshold value) if the candidate bounding box includes the objects corresponding to target landmarks. In some embodiments, the bounding boxes may further be fine-tuned by a regressor. For example, each bounding box may be tuned to include just the object corresponding to the target landmark.
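The per-landmark verification step could look like the following scikit-learn sketch; the feature vectors are assumed to come from the CNN feature map, and all names and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_landmark_svm(pos_features, neg_features):
    """Train one SVM for a single target landmark on feature vectors extracted
    from boxes that do (pos) / do not (neg) contain that landmark."""
    X = np.vstack([pos_features, neg_features])
    y = np.concatenate([np.ones(len(pos_features)), np.zeros(len(neg_features))])
    return SVC(kernel="rbf", probability=True).fit(X, y)

def boxes_matching_landmark(svm, candidate_features, threshold=0.5):
    """Return a boolean mask over candidate boxes whose probability of containing
    the landmark exceeds the threshold (the "Yes" decision described above)."""
    probs = svm.predict_proba(candidate_features)[:, 1]
    return probs > threshold
```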
3-D map evaluation device 110 may receive trained faster R-CNN model 105 from model training device 120. 3-D map evaluation device 110 may include a processor and a non-transitory computer-readable medium (not shown). The processor may perform instructions of a 3-D map evaluation process stored in the medium. 3-D map evaluation device 110 may additionally include input and output interfaces to communicate with database/repository 150, sensor 160, network 170 and/or a user interface of display device 130. The input interface may be used for selecting a 3-D map for evaluation or initiating the evaluation process. The output interface may be used for providing a 3-D map evaluation result 107.
Display 130 may include a display such as a Liquid Crystal Display (LCD), a Light Emitting Diode Display (LED), a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. The display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user. For example, the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™. In some embodiments, display 130 may be part of 3-D map evaluation device 110.
FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device 110, according to embodiments of the disclosure. In some embodiments, as shown in FIG. 2, 3-D map evaluation device 110 may include a communication interface 202, a processor 204, a memory 206, and a storage 208. In some embodiments, 3-D map evaluation device 110 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of 3-D map evaluation device 110 may be located in a cloud or may be alternatively in a single location (such as inside a mobile device) or distributed locations. Components of 3-D map evaluation device 110 may be in an integrated device or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, 3-D map evaluation device 110 may be configured to evaluate 3-D maps 104 from database/repository 150 against measurement data 102 received from database/repository 150 and faster R-CNN model 105 trained in model training device 120.
Communication interface 202 may send data to and receive data from components such as database/repository 150, sensor 160, model training device 120 and display device 130 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods. In some embodiments, communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 202. In such an implementation, communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 202 may receive faster R-CNN model 105 from model training device 120, and 3-D maps 104 and measurement data 102 from database/repository 150. Communication interface 202 may further provide 3-D map 104, measurement data 102, and faster R-CNN model 105 to memory 206 and/or storage 208 for storage or to processor 204 for processing.
Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to evaluating 3-D maps using a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to 3-D map evaluation.
Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate. Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein. For  example, memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to evaluate 3-D maps 104 based on faster R-CNN model 105.
In some embodiments, memory 206 and/or storage 208 may also store intermediate data such as the object features of target landmarks, feature maps output by layers of the learning model, bounding boxes, parameters of the landmark objects, etc. Memory 206 and/or storage 208 may additionally store various learning models including their model parameters, such as faster R-CNN model 105, etc.
As shown in FIG. 2, processor 204 may include multiple modules, such as a 2-D image determination unit 240, an object identification unit 242, a geometric representation construction unit 244 and a map evaluation unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components or software units implemented by processor 204 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions. Although FIG. 2 shows units 240-246 all within one processor 204, it is contemplated that these units may be distributed among multiple processors located close to or remote from each other.
In some embodiments, units 240-246 of FIG. 2 may execute computer instructions to perform the evaluation. For example, FIG. 3 illustrates a flowchart of an exemplary method 300 for 3-D map evaluation based on faster R-CNN model 105, according to embodiments of the disclosure. Method 300 may be implemented by 3-D map evaluation device 110 and particularly processor 204 or a separate processor not shown in FIG. 2. Method 300 may include steps S302-S314 as described below. It is to be appreciated that some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
In step S302 and step S304, communication interface 202 may receive, from database/repository 150, 3-D map 104 including target landmarks and ground truth data of the target landmarks (e.g., measurement data 102). In some embodiments, the target landmarks may be ground marks (e.g., traffic lanes and/or crosswalk lanes) and/or standing marks (e.g., traffic lights and/or traffic signs). In some embodiments, the ground truth data may be the geometric measurements of the target landmark (e.g., the length, the width and the height of the target landmark).
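For illustration only, the ground truth record for a single landmark might be held in a structure such as the following; the field names and units are assumptions, as the disclosure does not prescribe a data format.

```python
from dataclasses import dataclass

@dataclass
class LandmarkGroundTruth:
    """Hypothetical record for one surveyed target landmark (illustrative only)."""
    landmark_id: str
    kind: str        # e.g., "ground_mark" or "standing_mark"
    length_m: float  # surveyed length of the landmark, in meters
    width_m: float   # surveyed width of the landmark, in meters
    height_m: float  # surveyed height, in meters (0.0 for flat ground marks)
```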
In step S306, 2-D image determination unit 240 may determine 2-D images from 3-D map 104. In some embodiments, 2-D image determination unit 240 may determine an object in 3-D map 104 that corresponds to the target landmark and may determine 2-D images from 3-D map 104 that contain the object. In some embodiments, if the target landmark is a ground mark such as a traffic lane, 2-D image determination unit 240 may project 3-D map 104 into 2-D data, determine a position corresponding to the target landmark in the 2-D data, and extract the 2-D image surrounding the position which includes the object. For example, 2-D image determination unit 240 may use a pillar filtering method to filter 3-D map 104, where points within a same 2-D region may be treated as a "pillar." The points within each pillar may then be clustered according to constraints on distance. The lowest cluster of points may be used to provide the height and color information of the 2-D image.
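A minimal sketch of such pillar filtering is given below; the grid resolution, the vertical clustering gap, and the use of intensity as the color channel are assumptions rather than details taken from the disclosure.

```python
import numpy as np

def pillar_project(points, cell_size=0.1, gap=0.3):
    """Illustrative pillar filtering of map points (thresholds are assumed).

    points: (N, 4) array of x, y, z, intensity from the 3-D map.
    Returns a dict mapping each occupied 2-D cell to the mean height and
    mean intensity of its lowest cluster of points.
    """
    image = {}
    # Assign every point to a 2-D grid cell ("pillar").
    cells = np.floor(points[:, :2] / cell_size).astype(int)
    for cell in np.unique(cells, axis=0):
        mask = np.all(cells == cell, axis=1)
        pillar = points[mask]
        # Cluster the pillar by height: split where the vertical gap
        # between consecutive points exceeds the distance constraint.
        pillar = pillar[np.argsort(pillar[:, 2])]
        breaks = np.where(np.diff(pillar[:, 2]) > gap)[0]
        lowest = pillar[: breaks[0] + 1] if breaks.size else pillar
        # Keep the height and intensity of the lowest cluster as the pixel value.
        image[tuple(cell)] = (lowest[:, 2].mean(), lowest[:, 3].mean())
    return image
```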
In some other embodiments, if the target landmark is a standing mark such as a traffic sign, 2-D image determination unit 240 may first project the top view of the sensor data (e.g., point cloud data) corresponding to the target landmark within 3-D map 104 to obtain a first image. 2-D image determination unit 240 may then map the height information of the sensor data corresponding to the object to a second image. The 2-D image may then be determined by combining the first image and the second image.
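One way this two-image construction could look in practice is sketched below, with the top view and the height map stored as two channels of a single array; the grid resolution and channel layout are assumptions.

```python
import numpy as np

def standing_mark_image(points, cell_size=0.05):
    """Illustrative two-channel image for a standing mark (layout assumed).

    points: (N, 4) array of x, y, z, intensity around the standing mark.
    Returns an (H, W, 2) array: channel 0 is the top-view intensity
    projection, channel 1 encodes the highest point per cell.
    """
    cols = np.floor((points[:, 0] - points[:, 0].min()) / cell_size).astype(int)
    rows = np.floor((points[:, 1] - points[:, 1].min()) / cell_size).astype(int)
    h, w = rows.max() + 1, cols.max() + 1
    image = np.zeros((h, w, 2), dtype=np.float32)
    # Channel 0: top-view projection using the point intensity (first image).
    image[rows, cols, 0] = points[:, 3]
    # Channel 1: height information mapped per cell (second image).
    np.maximum.at(image[:, :, 1], (rows, cols), points[:, 2])
    return image
```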
In step S308, object identification unit 242 may identify objects corresponding to landmarks from the 2-D images and determine a bounding box of the objects based on faster R-CNN model 105. In some embodiments, object identification unit 242 may use faster R-CNN model 105 that includes a CNN sub-model and an RPN sub-model to determine bounding boxes of the target landmarks. For example, object identification unit 242 may label different geometric layers within the images determined in step S306, use the different geometric layer labels as input and determine a feature map of the images using the CNN sub-model. Object identification unit 242 may then use the feature map as input to the RPN sub-model to determine features of candidate bounding boxes of the object.
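As a rough stand-in for this detection step, the sketch below runs a generic, publicly available faster R-CNN from torchvision on a projected 2-D image; the actual architecture, classes, weights, and confidence threshold of faster R-CNN model 105 are not specified in the disclosure, so everything here is illustrative (torchvision 0.13 or later is assumed for the weights argument).

```python
import torch
import torchvision

# Generic pre-trained detector used only as a stand-in for faster R-CNN model 105.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# image: a (C, H, W) float tensor built from the 2-D projection of step S306
# (a random tensor is used here purely as a placeholder).
image = torch.rand(3, 480, 640)
with torch.no_grad():
    detections = model([image])[0]

# Candidate bounding boxes, class labels, and confidence scores.
boxes = detections["boxes"]    # (K, 4) boxes in x1, y1, x2, y2 format
scores = detections["scores"]  # (K,) confidence per candidate box
keep = scores > 0.5            # assumed confidence threshold
candidate_boxes = boxes[keep]
```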
In some embodiments, object identification unit 242 may also use a trained SVM to determine if the object within the candidate bounding box corresponds to the target landmark. For example, the SVM may be trained on a given set of training examples (e.g., bounding boxes that include the objects corresponding to target landmarks and bounding boxes that do not include the objects corresponding to target landmarks), and when the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the bounding boxes include the objects corresponding to target landmarks. In some embodiments, object identification unit 242 may further include a regressor for fine-tuning the bounding boxes. For example, each bounding box may be tuned to include just the target object.
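A minimal sketch of this SVM verification stage follows; the feature dimensionality, kernel, and acceptance threshold are assumptions, since the disclosure does not state which features are fed to the SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: in practice the features would be derived from
# the candidate bounding boxes (e.g., crops of the feature map), with label 1
# meaning the box contains the target landmark.
train_features = np.random.rand(200, 256)
train_labels = np.random.randint(0, 2, 200)

svm = SVC(kernel="linear", probability=True)
svm.fit(train_features, train_labels)

# Verify candidate boxes: accept those whose probability exceeds a threshold.
candidate_features = np.random.rand(10, 256)
probs = svm.predict_proba(candidate_features)[:, 1]
matches = probs > 0.5  # assumed acceptance threshold
```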
In step S310, geometric representation construction unit 244 may construct geometric representations of the objects. In some embodiments, the geometric representations of the objects are determined based on the sensor data (e.g., point cloud data) within the bounding boxes. For example, geometric representation construction unit 244 may extract the sensor data within the bounding boxes and use a Random Sample Consensus (RANSAC) algorithm to fit a plane to the sensor data. Geometric representation construction unit 244 may project all the sensor data onto the fitted plane based on the plane function of the fitted plane. Geometric representation construction unit 244 may further transfer all the sensor data to a horizontal plane based on a transformation between the fitted plane and the horizontal plane. Geometric representation construction unit 244 may then use Otsu's method to filter out the sensor data that does not correspond to the target landmarks.
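A compact sketch of the RANSAC plane fit, the projection onto the fitted plane, and the rotation onto a horizontal plane might look as follows; the iteration count and inlier tolerance are assumptions. The subsequent Otsu filtering could, for instance, threshold the point intensities (e.g., with skimage.filters.threshold_otsu) to separate landmark points from background, although the disclosure does not specify which channel is thresholded.

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.03):
    """Illustrative RANSAC plane fit: returns unit normal n and offset d
    such that n . p + d = 0 holds for the inlier points."""
    best_count, best_model = 0, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:        # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < tol
        if inliers.sum() > best_count:
            best_count, best_model = inliers.sum(), (n, d)
    return best_model

def flatten_to_horizontal(points):
    """Project points onto the fitted plane, then rotate that plane onto
    the horizontal plane, as described in the text above."""
    n, d = ransac_plane(points)
    projected = points - np.outer(points @ n + d, n)   # project onto fitted plane
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(n, z), float(n @ z)
    if np.linalg.norm(v) < 1e-9:                        # plane already horizontal
        return projected
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + vx + vx @ vx / (1.0 + c)            # rotation aligning n to z
    return projected @ R.T
```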
In some embodiments, geometric representation construction unit 244 may construct a geometric representation of the object corresponding to the target landmark, and the geometric representation may be selected from shapes such as a rectangle, a circle, a triangle, etc. In some embodiments, geometric representation construction unit 244 may also establish constraints to optimize the geometric representation construction. For example, the constraints may include: all sensor data needs to be within the boundary of the geometric representation; the boundary of the geometric representation needs to be as close to the inner points of the representation as possible; the central point of the sensor data of the target landmarks should be at the same distance to all boundaries of the geometric representation; no point should be negative, etc. In some embodiments, geometric representation construction unit 244 may also apply a relaxation coefficient to the geometric representation construction so that noise may be partially tolerated outside the boundary of the geometric representation. For example, when evaluating the first constraint (all sensor data needs to be within the boundary of the geometric representation), any point that requires a positive relaxation to satisfy the constraint may be defined as noise.
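Such a soft-constrained fit could be posed as a small optimization, as in the illustrative sketch below for an axis-aligned rectangle; the objective weighting and the way the relaxation is applied are assumptions, since the disclosure describes the constraints only qualitatively.

```python
import numpy as np
from scipy.optimize import minimize

def fit_rectangle(xy, relaxation=0.05):
    """Illustrative soft-constrained fit of an axis-aligned rectangle to the
    flattened landmark points. Returns (cx, cy, half_width, half_height).

    xy: (N, 2) landmark points in the horizontal plane.
    """
    cx0, cy0 = xy.mean(axis=0)
    hw0, hh0 = (xy.max(axis=0) - xy.min(axis=0)) / 2.0

    def cost(p):
        cx, cy, hw, hh = p
        dx = np.abs(xy[:, 0] - cx) - hw
        dy = np.abs(xy[:, 1] - cy) - hh
        # Points outside the boundary by more than the relaxation are treated
        # as noise and penalised heavily; mild overshoot is tolerated.
        outside = np.maximum(dx - relaxation, 0) + np.maximum(dy - relaxation, 0)
        # Keep the boundary as close to the inner points as possible.
        tightness = hw + hh
        return 100.0 * outside.sum() + tightness

    res = minimize(cost, x0=[cx0, cy0, hw0, hh0], method="Nelder-Mead")
    return res.x
```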
In steps S312-S314, map evaluation unit 246 may determine parameters of the geometric representation of the object and evaluate 3-D map 104 based on the parameters and  corresponding measurement data 102. In some embodiments, the parameters may include the width, the length and the height of the geometric representation. For example, map evaluation unit 246 may evaluate 3-D map 104 by calculating a difference between the parameters and the corresponding ground truths (e.g., measurement data 102 of the target landmark) .
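Step S314 then reduces to a per-dimension comparison; for instance, a crosswalk line surveyed at 3.00 m by 0.45 m that the map reconstructs as 3.06 m by 0.43 m yields errors of 0.06 m and 0.02 m. A minimal sketch, with assumed field names, follows.

```python
def evaluate_landmark(estimated, ground_truth):
    """Illustrative step S314: absolute per-dimension error between the
    parameters recovered from 3-D map 104 and measurement data 102."""
    return {
        "length_error_m": abs(estimated["length_m"] - ground_truth["length_m"]),
        "width_error_m": abs(estimated["width_m"] - ground_truth["width_m"]),
        "height_error_m": abs(estimated["height_m"] - ground_truth["height_m"]),
    }

errors = evaluate_landmark(
    {"length_m": 3.06, "width_m": 0.43, "height_m": 0.0},
    {"length_m": 3.00, "width_m": 0.45, "height_m": 0.0},
)
# errors is approximately {"length_error_m": 0.06, "width_error_m": 0.02, "height_error_m": 0.0}
```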
FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model 105, according to embodiments of the disclosure. Consistent with some embodiments as shown in FIG. 4, faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model 410 to determine feature maps of the objects corresponding to the target landmarks, a Region Proposal Network (RPN) sub-model 420 to determine the candidate bounding boxes of the objects corresponding to the target landmarks, and at least one trained Support-Vector Machine (SVM) 430 to determine if the bounding boxes include the objects corresponding to the target landmarks. In some embodiments, faster R-CNN model 105 may also include a linear regression sub-model 440 to fine-tune the determined bounding boxes.
CNN sub-model 410 may determine feature maps of the input 2-D image provided by 2-D image determination unit 240. For example, CNN sub-model 410 may process 2-D images determined from the 3-D map that include objects corresponding to the target landmarks and may use the geometric layer labels associated with the 3-D map as an input. In some embodiments, a feature map of each object may be determined as the output of CNN sub-model 410.
RPN sub-model 420 may determine candidate bounding boxes of the objects corresponding to the target landmarks based on the feature maps determined by CNN sub-model 410. For example, RPN sub-model 420 may include a classifier and a regressor. Proposals of the bounding boxes may be generated based on the feature maps of the objects. The classifier may determine the probability that a proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Thus, RPN sub-model 420 may determine the bounding boxes that include objects corresponding to landmarks based on the classifier and the regressor.
SVM 430 may match the candidate bounding boxes with the target landmarks by determining if probabilities that the respective candidate bounding boxes include the objects corresponding to target landmarks exceed a threshold value. For example, SVM 430 may be trained on a given set of training examples (e.g., bounding boxes that include an object corresponding to a target landmark and bounding boxes that do not include the object corresponding to the target landmark) to determine if the candidate bounding boxes include the object. In some embodiments, one SVM is trained for each target landmark. The candidate bounding boxes may be used as input to the SVM. In some embodiments, if the output is yes (e.g., a probability that exceeds a threshold value, indicating the candidate bounding box includes the object), the candidate bounding box may be used as the bounding box for the corresponding target landmark.
In some embodiments, faster R-CNN model 105 may further include linear regression model 440 for fine-tuning the bounding boxes using a linear regressor. For example, each bounding box may be tuned using linear regression model 440 to include just the target landmark.
In some embodiments, faster R-CNN model 105 may be trained by model training device 120 by determining one or more parameters of at least one layer in any sub-model of the learning model and may be transmitted to 3-D map evaluation device 110 for performing 3-D map evaluation related functions. For example, a convolutional layer of a CNN model may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process. Consistent with some embodiments, faster R-CNN model 105 may be trained using supervised learning.
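For orientation only, a supervised training step for a generic torchvision faster R-CNN might look like the sketch below; the number of classes, the optimizer, the learning rate, and the labelled sample are all assumptions, since the disclosure does not describe the training data or hyper-parameters used by model training device 120.

```python
import torch
import torchvision

# Generic faster R-CNN as a stand-in; num_classes includes background
# (e.g., background + ground mark + standing mark).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One synthetic sample: a projected 2-D image and its labelled bounding box.
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 220.0, 300.0]]),  # x1, y1, x2, y2
    "labels": torch.tensor([1]),                             # assumed class id
}]

loss_dict = model(images, targets)   # RPN and detection-head losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()                      # backpropagation updates kernel weights
optimizer.step()
```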
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

  1. A method for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    receiving, by a communication interface, the 3-D map and measurement data of a target landmark;
    identifying, by at least one processor, an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model;
    determining, by the at least one processor, a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determining, by the at least one processor, parameters associated with the geometric representation; and
    evaluating, by the at least one processor, the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  2. The method of claim 1, wherein the learning model is a faster R-CNN model.
  3. The method of claim 1, wherein identifying the object corresponding to the target landmark further comprises:
    determining a 2-D image from the 3-D map that contains the object;
    determining object features from the 2-D image; and
    matching the object with the target landmark based on the object features.
  4. The method of claim 3, wherein determining object features further comprises:
    determining a feature map by applying a Convolutional Neural Network (CNN) to the image; and
    determining the object features by applying a Region Proposal Network (RPN) to the feature map.
  5. The method of claim 3, wherein the object is matched with the target landmark using a Support-Vector Machine (SVM) .
  6. The method of claim 1, wherein the bounding box is determined using a linear regression model.
  7. The method of claim 1, wherein the target landmark is a standing sign, and determining the 2-D image further comprises:
    projecting the top view of the sensor data corresponding to the object to obtain a first image;
    mapping height information of the sensor data corresponding to the object to a second image; and
    determining the 2-D image based on the first image and the second image.
  8. The method of claim 1, wherein the target landmark is a ground mark, and determining the 2-D image further comprises:
    projecting the 3-D map into 2-D data;
    determining a position corresponding to the target landmark in the 2-D data; and
    extracting the 2-D image surrounding the position.
  9. The method of claim 1, wherein the sensor data is point cloud data acquired by a LiDAR.
  10. The method of claim 1, wherein determining the parameters further comprises:
    fitting the subset of sensor data into a plane;
    constructing the geometric representation for the object in the plane; and
    determining parameters of the geometric representation based on a set of constraints.
  11. The method of claim 2, wherein the faster R-CNN model is trained using the 3-D map and geometric layer labels associated with the 3-D map.
  12. The method of claim 2, wherein the faster R-CNN model is separately trained for ground marks and standing signs.
  13. The method of claim 1, further comprising associating the determined parameters with the target landmark.
  14. A system for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    a communication interface configured to receive the 3-D map and measurement data of a target landmark;
    a storage configured to store the 3-D map and the measurement data; and
    at least one processor coupled to the storage and configured to:
    identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model;
    determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determine parameters associated with the geometric representation; and
    evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  15. The system of claim 14, wherein the learning model is a faster R-CNN model.
  16. The system of claim 14, wherein to identify the object corresponding to the target landmark, the at least one processor is further configured to:
    determine a 2-D image from the 3-D map that contains the object;
    determine object features from the 2-D image; and
    match the object with the target landmark based on the object features.
  17. The system of claim 16, wherein to determine the object features, the at least one processor is further configured to:
    determine a feature map by applying a Convolutional Neural Network (CNN) to the image; and
    determine the object features by applying a Region Proposal Network (RPN) to the feature map.
  18. The system of claim 16, wherein the object is matched with the target landmark using a Support-Vector Machine (SVM).
  19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    receiving the 3-D map and measurement data of a target landmark;
    identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model;
    determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determining parameters associated with the geometric representation; and
    evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  20. The non-transitory computer-readable medium of claim 19, wherein the learning model is a faster R-CNN model.