WO2021056278A1 - Systems and methods for evaluating three-dimensional (3-d) map constructed based on sensor data - Google Patents

Systems and methods for evaluating three-dimensional (3-d) map constructed based on sensor data

Info

Publication number
WO2021056278A1
Authority
WO
WIPO (PCT)
Prior art keywords
map
determining
target landmark
image
sensor data
Prior art date
Application number
PCT/CN2019/107910
Other languages
French (fr)
Inventor
Minkang WANG
Original Assignee
Beijing Didi Infinity Technology And Development Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Didi Infinity Technology And Development Co., Ltd.
Priority to PCT/CN2019/107910
Publication of WO2021056278A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00 Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/05 Geographic models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G06V10/993 Evaluation of the quality of the acquired pattern

Definitions

  • the present disclosure relates to systems and methods for evaluating a three-dimensional (3-D) map, and more particularly to, systems and methods for evaluating a 3-D map constructed based on sensor data using object identifications and geometric modeling.
  • 3-D high-definition (HD) maps are widely used, e.g., to aid autonomous driving.
  • 3-D HD maps provide 3-D HD geometric information of the roads and surroundings to autonomous driving vehicles.
  • the quality of the map needs to be evaluated. Based on the evaluation, one may improve the quality of the map.
  • Maps are evaluated by comparing benchmark measurement results of a real-world object with measurements of the corresponding object in the 3-D map.
  • Existing map quality evaluating methods rely on manual identification of objects from 3-D maps, e.g., by an operator, before they can be verified against the benchmark measurements. This manual process is time-consuming, inefficient, and inaccurate due to errors caused by subjective observations and measurements.
  • Embodiments of the disclosure address the above problems by providing methods and systems for evaluating a three-dimensional (3-D) map based on sensor data using automatic object identifications and geometric modeling.
  • Embodiments of the disclosure provide a method for evaluating a 3-D map constructed based on sensor data.
  • An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface.
  • the method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor.
  • the method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor.
  • the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.
  • Embodiments of the disclosure also provide a system for evaluating a 3-D map constructed based on sensor data.
  • An exemplary system may include a communication interface configured to receive the 3-D map and the measurement data of a target landmark and a storage configured to store the 3-D map and measurement data.
  • the system may also include at least one processor coupled to the storage.
  • the at least one processor may be configured to identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model.
  • the at least one processor may be further configured to determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determine parameters associated with the geometric representation.
  • the at least one processor may be configured to evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  • Embodiments of the disclosure further provide a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a 3-D map constructed based on sensor data.
  • the method may include receiving the 3-D map and measurement data of a target landmark.
  • the method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model.
  • the method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation.
  • the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  • FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system, according to embodiments of the disclosure.
  • FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device, according to embodiments of the disclosure.
  • FIG. 3 illustrates a flowchart of an exemplary method for 3-D map evaluation, according to embodiments of the disclosure.
  • FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model, according to embodiments of the disclosure.
  • FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system (referred to as “3-D map evaluation system 100” ) , according to embodiments of the disclosure.
  • 3-D map evaluation system 100 is configured to evaluate 3-D maps constructed from sensor data acquired by sensor 160 (e.g., 3-D map 104) based on a Faster R-CNN model 105 trained using sample 3-D maps, bounding boxes and corresponding landmarks within the 3-D maps (e.g., included in training data 101) and measurement data 102.
  • 3-D map evaluation system 100 may include components shown in FIG. 1, including a training database 140, a model training device 120, a 3-D map evaluation device 110, a database/repository 150, a display device 130, a sensor 160 and a network 170 to facilitate communications among the various components. It is contemplated that 3-D map evaluation system 100 may include more or fewer components than those shown in FIG. 1.
  • 3-D map evaluation system 100 may perform two stages: a landmark object identification model training stage and a 3-D map evaluation stage applying the trained model.
  • 3-D map evaluation system 100 may include training database 140 and model training device 120.
  • 3-D map evaluation process to obtain a 3-D map evaluation result 107
  • 3-D map evaluation system 100 may include 3-D map evaluation device 110 and database/repository 150.
  • 3-D map evaluation system 100 may also include display device 130 to display a 3-D map evaluation result 107.
  • 3-D map evaluation system 100 may include only 3-D map evaluation device 110, database/repository 150, and display device 130 to perform 3-D map evaluation related functions.
  • 3-D map evaluation system 100 may optionally include network 170 to facilitate the communication among the various components of 3-D map evaluation system 100, such as databases 140 and 150, devices 110 and 120, and sensor 160.
  • network 170 may be a local area network (LAN) , a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service) , a client-server, a wide area network (WAN) , etc.
  • network 170 may be replaced by wired data communication systems or devices.
  • the various components of 3-D map evaluation system 100 may be remote from each other or in different locations and be connected through network 170 as shown in FIG. 1.
  • certain components of 3-D map evaluation system 100 may be located on the same site or inside one device.
  • training database 140 may be located on-site with or be part of model training device 120.
  • model training device 120 and 3-D map evaluation device 110 may be inside the same computer or processing device.
  • 3-D map evaluation system 100 may store 3-D maps.
  • sample 3-D maps as part of training data 101 may be stored in training database 140 and 3-D maps 104 to be evaluated may be stored in database/repository 150.
  • 3-D maps may be constructed based on sensor data received from sensors (e.g., sensor 160) .
  • sensor data may be point cloud data acquired by a LiDAR.
  • sensor 160 may be a LiDAR scanner configured to scan the surroundings and acquire point clouds. LiDAR measures distance to a target by illuminating the target with pulsed laser light and measuring the reflected pulses with the sensor.
  • 3-D maps may be constructed based on the 3-D representation of targets made by calculating the differences in laser return times and wavelengths.
  • training database 140 may store training data 101, which includes sample 3-D maps and known bounding boxes of pre-identified landmark objects within the 3-D maps.
  • the known bounding boxes of the landmarks included in the 3-D maps may be benchmark extractions made by operators based on the sample 3-D maps and the corresponding landmarks.
  • Sample 3-D maps, bounding boxes and the corresponding landmarks within the 3-D maps may be stored in pairs in training database 140 as training data 101.
  • sensor 160 or a separate sensor may also acquire ground truths (e.g., measurement data 102) of the landmarks included by the constructed 3-D maps.
  • measurement data 102 of a landmark may be the height, width and length of the landmark.
  • an operator may manually measure the landmark and obtain measurement data 102.
  • measurement data 102 may be determined from images acquired by sensors such as a monocular or binocular camera.
  • measurement data 102 may be measured multiple times (e.g., 5 or 10 times under the same measurement condition) to reduce the observational error.
  • Measurement data 102 associated with the location information of the landmark (e.g., GPS data of the landmark) may be stored in database/repository 150 and used for evaluating 3-D maps 104.
  • the model training process is performed by model training device 120.
  • “training” a learning model refers to determining one or more parameters of at least one layer in the learning model.
  • a convolutional layer of a CNN model may include at least one filter or kernel.
  • One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
  • Faster R-CNN model 105 may be trained using supervised learning.
  • model training device 120 may communicate with training database 140 to receive one or more sets of training data 101.
  • Each set of training data 101 may include a sample 3-D map, bounding boxes of the landmark objects included in the 3-D map and the corresponding landmarks.
  • Model training device 120 may use training data 101 received from training database 140 to train a learning model, e.g., faster R-CNN model 105 (described in detail in connection with FIG. 4) .
  • Model training device 120 may be implemented with hardware specially programmed by software that performs the training process.
  • model training device 120 may include a processor and a non-transitory computer-readable medium. The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium.
  • Model training device 120 may additionally include input and output interfaces to communicate with training database 140, network 170, and/or a user interface (not shown) .
  • the user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing bounding boxes of landmark objects included in the 3-D map.
  • faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model to determine feature maps of the objects corresponding to the landmarks, a Region Proposal Network (RPN) sub-model to determine the bounding boxes of the objects corresponding to the landmarks, and a plurality of trained Support-Vector Machine (SVM) sub-models to determine whether the bounding boxes include the objects corresponding to the target landmarks.
  • faster R-CNN model 105 may also include a linear regression sub-model to fine-tune the determined bounding boxes.
  • the CNN sub-model may process data such as 2-D images determined from the 3-D map that includes the objects corresponding to landmarks and may use geometric layer labels associated with the 3-D map as input for the sub-model.
  • the architecture of the CNN sub-model includes a stack of distinct layers that transform the input into the output (e.g., object features of the objects corresponding to landmarks and/or feature maps of the objects) .
  • the output of the CNN sub-model may be used as input for the RPN sub-model to determine the bounding boxes of the objects corresponding to the landmarks.
  • the RPN sub-model may include a classifier and a regressor. For example, proposals of the bounding boxes may be generated based on the feature maps of the objects, the classifier may determine the probability that each proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Based on the classifier and the regressor, the RPN sub-model may determine candidate bounding boxes that include the objects corresponding to the landmarks.
  • the trained SVMs may be used to determine if the candidate bounding boxes include the objects corresponding to the target landmarks.
  • each SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to a target landmark and bounding boxes that do not include the objects corresponding to the target landmark).
  • the SVM can determine whether the candidate bounding boxes include the objects corresponding to the target landmark.
  • the output of the SVM may be “Yes” (e.g., a probability higher than a threshold value) if the candidate bounding box includes the objects corresponding to target landmarks.
  • the bounding boxes may further be fine-tuned by a regressor.
  • each bounding box may be tuned to include just the object corresponding to the target landmark.
  • 3-D map evaluation device 110 may receive trained faster R-CNN model 105 from model training device 120. 3-D map evaluation device 110 may include a processor and a non-transitory computer-readable medium (not shown).
  • the processor may perform instructions of a 3-D map evaluation process stored in the medium.
  • 3-D map evaluation device 110 may additionally include input and output interfaces to communicate with database/repository 150, sensor 160, network 170 and/or a user interface of display device 130.
  • the input interface may be used for selecting a 3-D map for evaluation or initiating the evaluation process.
  • the output interface may be used for providing a 3-D map evaluation result 107.
  • Display 130 may include a display such as a Liquid Crystal Display (LCD) , a Light Emitting Diode Display (LED) , a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction.
  • the display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user.
  • the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™.
  • display 130 may be part of 3-D map evaluation device 110.
  • FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device 110, according to embodiments of the disclosure.
  • 3-D map evaluation device 110 may include a communication interface 202, a processor 204, a memory 206, and a storage 208.
  • 3-D map evaluation device 110 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions.
  • one or more components of 3-D map evaluation device 110 may be located in a cloud or may be alternatively in a single location (such as inside a mobile device) or distributed locations.
  • Components of 3-D map evaluation device 110 may be in an integrated device or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, 3-D map evaluation device 110 may be configured to evaluate 3-D maps 104 from database/repository 150 against measurement data 102 received from database/repository 150 and faster R-CNN model 105 trained in model training device 120.
  • Communication interface 202 may send data to and receive data from components such as database/repository 150, sensor 160, model training device 120 and display device 130 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods.
  • communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection.
  • communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN.
  • Wireless links can also be implemented by communication interface 202.
  • communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
  • communication interface 202 may receive faster R-CNN model 105 from model training device 120, and 3-D maps 104 and measurement data 102 from database/repository 150. Communication interface 202 may further provide 3-D map 104, measurement data 102, and faster R-CNN model 105 to memory 206 and/or storage 208 for storage or to processor 204 for processing.
  • Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to evaluating 3-D maps using a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to 3-D map evaluation.
  • Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate.
  • Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM.
  • Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein.
  • memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to evaluate 3-D maps 104 based on faster R-CNN model 105.
  • memory 206 and/or storage 208 may also store intermediate data such as the object features of target landmarks, feature maps output by layers of the learning model, bounding boxes, parameters of the landmark objects, etc.
  • Memory 206 and/or storage 208 may additionally store various learning models including their model parameters, such as faster R-CNN model 105, etc.
  • processor 204 may include multiple modules, such as a 2-D image determination unit 240, an object identification unit 242, a geometric representation construction unit 244 and a map evaluation unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components or software units implemented by processor 204 through executing at least part of a program.
  • the program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions.
  • FIG. 2 shows units 240-246 all within one processor 204, it is contemplated that these units may be distributed among different processors located closely or remotely with each other.
  • FIG. 3 illustrates a flowchart of an exemplary method 300 for 3-D map evaluation based on faster R-CNN model 105, according to embodiments of the disclosure.
  • Method 300 may be implemented by 3-D map evaluation device 110 and particularly processor 204 or a separate processor not shown in FIG. 2.
  • Method 300 may include steps S302-S314 as described below. It is to be appreciated that some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
  • communication interface 202 may receive 3-D map 104 including target landmarks from database/repository 150 and ground truth data of the target landmarks (e.g., measurement data 102) from database/repository 150.
  • the target landmarks may be ground marks (e.g., traffic lanes and/or crosswalk lanes) and/or standing marks (e.g., traffic lights and/or traffic signs) .
  • the ground truth data may be the geometric measurements of the target landmark (e.g., the length, the width and the height of the target landmark) .
  • 2-D image determination unit 240 may determine 2-D images from 3-D map 104.
  • 2-D image determination unit 240 may determine an object in 3-D map 104 that corresponds to the target landmark and may determine 2-D images from 3-D map 104 that contain the object.
  • the target landmark is a ground mark such as a traffic lane
  • 2-D image determination unit 240 may project 3-D map 104 into 2-D data, determine a position corresponding to the target landmark in the 2-D data, and extract the 2-D image surrounding the position which includes the object.
  • 2-D image determination unit 240 may use a pillar filtering method to filter 3-D map 104, where points within the same 2-D region are treated as a “pillar.” The points within each pillar may then be clustered according to constraints on distance. The lowest cluster of points may be used to provide the height and color information of the 2-D image.
  • 2-D image determination unit 240 may first project the top view of the sensor data (e.g., point cloud data) corresponding to the target landmark within 3-D map 104 to obtain a first image.
  • 2-D image determination unit 240 may then map the height information of the sensor data corresponding to the object to a second image.
  • the 2-D image may be then determined by combining the first and the second image together.
  • object identification unit 242 may identify objects corresponding to landmarks from the 2-D images and determine a bounding box of the objects based on faster R-CNN model 105.
  • object identification unit 242 may use faster R-CNN model 105 that includes a CNN sub-model and an RPN sub-model to determine bounding boxes of the target landmarks.
  • object identification unit 242 may label different geometric layers within the images determined in step S306, use the different geometric layer labels as input and determine a feature map of the images using the CNN sub-model.
  • Object identification unit 242 may then use the feature map as input to the RPN sub-model to determine features of candidate bounding boxes of the object.
  • object identification unit 242 may also use a trained SVM to determine if the object within the candidate bounding box corresponds to the target landmark.
  • the SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to target landmarks and bounding boxes that do not include the objects corresponding to target landmarks); when the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the bounding boxes include the objects corresponding to target landmarks.
  • object identification unit 242 may further include a regressor for fine-tuning the bounding boxes. For example, each bounding box may be tuned to include just the target object.
  • geometric representation construction unit 244 may construct geometric representations of the objects.
  • the geometric representations of the objects are determined based on the sensor data (e.g., point cloud data) within the bounding boxes.
  • geometric representation construction unit 244 may extract the sensor data within the bounding boxes and use a Random Sample Consensus (RANSAC) algorithm to fit a plane to the sensor data (a simplified sketch of this fitting and evaluation sequence is given after this list).
  • Geometric representation construction unit 244 may project all the sensor data to the fitted plane based on the plane function of the fitted plane.
  • Geometric representation construction unit 244 may further transfer all the sensor data to a horizontal plane based on a transfer function between the fitted plane and the horizontal plane. Geometric representation construction unit 244 may then use Otsu's method to filter out the sensor data that does not correspond to the target landmarks.
  • geometric representation construction unit 244 may construct a geometric representation of the object corresponding to the target landmark, and the geometric representation may be selected from shapes such as a rectangle, a circle, or a triangle.
  • geometric representation construction unit 244 may also establish constraints to optimize the geometric representation construction.
  • the constraints may include: all sensor data must lie within the boundary of the geometric representation; the boundary of the geometric representation must be as close as possible to the inner points of the representation; the central point of the sensor data of the target landmark should be equidistant from all boundaries of the geometric representation; and no point should be negative.
  • geometric representation construction unit 244 may also apply a relaxation coefficient to the geometric representation construction so that noise outside the boundary of the geometric representation may be partially tolerated. For example, when evaluating the first constraint (that all sensor data must lie within the boundary of the geometric representation), any point exceeding a positive relaxation margin is treated as noise.
  • map evaluation unit 246 may determine parameters of the geometric representation of the object and evaluate 3-D map 104 based on the parameters and corresponding measurement data 102.
  • the parameters may include the width, the length and the height of the geometric representation.
  • map evaluation unit 246 may evaluate 3-D map 104 by calculating a difference between the parameters and the corresponding ground truths (e.g., measurement data 102 of the target landmark) .
  • FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model 105, according to embodiments of the disclosure.
  • faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model 410 to determine feature maps of the objects corresponding to the target landmarks, a Region Proposal Network (RPN) sub-model 420 to determine the candidate bounding boxes of the objects corresponding to the target landmarks, and at least one trained Support-Vector Machine (SVM) 430 to determine whether the bounding boxes include the objects corresponding to the target landmarks.
  • faster R-CNN model 105 may also include a linear regression sub-model 440 to fine-tune the determined bounding boxes.
  • CNN sub-model 410 may determine feature maps of the input 2-D image provided by 2-D image determination unit 240. For example, CNN sub-model 410 may process 2-D images determined from the 3-D map that include objects corresponding to the target landmarks and may use the geometric layer labels associated with the 3-D map as an input. In some embodiments, a feature map of each object may be determined as the output of CNN sub-model 410.
  • RPN sub-model 420 may determine candidate bounding boxes of the objects corresponding to the target landmarks based on the feature maps determined by CNN sub-model 410.
  • RPN sub-model 420 may include a classifier and a regressor. Proposals of the bounding boxes may be generated based on the feature maps of the objects.
  • the classifier may determine the probability of the proposed bounding boxes containing the target object, and the regressor may regress the coordinates of the proposed bounding boxes.
  • RPN sub-model 420 may determine the bounding boxes that include objects corresponding to landmarks based on the classifier and the regressor.
  • SVM 430 may match the candidate bounding boxes with the target landmarks by determining if probabilities that the respective candidate bounding boxes include the objects corresponding to target landmarks exceed a threshold value.
  • SVM 430 may be trained by a given set of training examples (e.g., bounding boxes that include an object corresponding to a target landmark and bounding boxes that do not include the object corresponding to the target landmark) to determine if the candidate bounding boxes include the object.
  • one SVM is trained for each target landmark.
  • the candidate bounding boxes may be used as input to the SVM.
  • the candidate bounding boxes may be used as the bounding box for the corresponding target landmark.
  • faster R-CNN model 105 may further include linear regression model 440 for fine-tuning the bounding boxes using a linear regressor.
  • each bounding box may be tuned using linear regression model 440 to include just the target landmark.
  • faster R-CNN model 105 may be trained by model training device 120 by determining one or more parameters of at least one layer in any sub-model of the learning model and may be transmitted to 3-D map evaluation device 110 for performing 3-D map evaluation related functions.
  • a convolutional layer of a CNN model may include at least one filter or kernel.
  • One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process.
  • faster R-CNN model 105 may be trained using supervised learning.
  • the computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices.
  • the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed.
  • the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
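
The following Python sketch illustrates, under simplifying assumptions, the fitting and comparison sequence referenced in the list above: a RANSAC plane fit over the point-cloud subset inside a bounding box, an axis-aligned rectangle fit in which a relaxation coefficient stands in for the constrained optimization described above (the Otsu filtering step is omitted), and a comparison of the resulting parameters against measurement data 102. Function names, the axis-aligned simplification, and the dictionary keys are illustrative assumptions, not taken from the disclosure.

```python
import numpy as np

def ransac_plane(points, n_iter=200, dist_thresh=0.05, seed=0):
    """RANSAC plane fit over the point-cloud subset extracted from a bounding box.

    points: (N, 3) array of [x, y, z]. Returns (normal, d) with normal.p + d = 0
    and the boolean inlier mask. A simplified sketch, not the disclosed algorithm.
    """
    rng = np.random.default_rng(seed)
    best_plane, best_inliers = None, None
    for _ in range(n_iter):
        p0, p1, p2 = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(p1 - p0, p2 - p0)
        norm = np.linalg.norm(normal)
        if norm < 1e-9:          # degenerate (collinear) sample, try again
            continue
        normal /= norm
        d = -normal @ p0
        inliers = np.abs(points @ normal + d) < dist_thresh
        if best_inliers is None or inliers.sum() > best_inliers.sum():
            best_plane, best_inliers = (normal, d), inliers
    return best_plane, best_inliers

def fit_rectangle(xy, relaxation=0.02):
    """Fit an axis-aligned rectangle to points already transferred to the
    horizontal plane. The relaxation coefficient tolerates a small fraction of
    noise outside the boundary; this stands in for the constrained optimization
    described above."""
    lo = np.quantile(xy, relaxation, axis=0)       # points below lo treated as noise
    hi = np.quantile(xy, 1.0 - relaxation, axis=0) # points above hi treated as noise
    width, length = hi - lo
    return {"width": float(width), "length": float(length)}

def evaluate_landmark(parameters, measurement_data):
    """Score the 3-D map for one landmark as the absolute difference between the
    parameters of the geometric representation and the ground-truth measurements
    (dictionary keys such as "width"/"length" are assumed for illustration)."""
    return {k: abs(parameters[k] - measurement_data[k])
            for k in measurement_data if k in parameters}
```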

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Geometry (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Remote Sensing (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Graphics (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

Systems and methods for evaluating a three-dimensional (3-D) map constructed based on sensor data are provided. An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor. Moreover, the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.

Description

[Title established by the ISA under Rule 37.2] SYSTEMS AND METHODS FOR EVALUATING THREE-DIMENSIONAL (3-D) MAP CONSTRUCTED BASED ON SENSOR DATA TECHNICAL FIELD
The present disclosure relates to systems and methods for evaluating a three-dimensional (3-D) map, and more particularly to, systems and methods for evaluating a 3-D map constructed based on sensor data using object identifications and geometric modeling.
BACKGROUND
3-D high-definition (HD) maps are widely used, e.g., to aid autonomous driving. For example, 3-D HD maps provide 3-D HD geometric information of the roads and surroundings to autonomous driving vehicles. In order to provide accurate positioning information, the quality of the map needs to be evaluated. Based on the evaluation, one may improve the quality of the map.
Maps are evaluated by comparing benchmark measurement results of a real-world object with measurements of the corresponding object in the 3-D map. Existing map quality evaluating methods rely on manual identification of objects from 3-D maps, e.g., by an operator, before they can be verified against the benchmark measurements. This manual process is time-consuming, inefficient, and inaccurate due to errors caused by subjective observations and measurements.
Embodiments of the disclosure address the above problems by providing methods and systems for evaluating a three-dimensional (3-D) map based on sensor data using automatic object identifications and geometric modeling.
SUMMARY
Embodiments of the disclosure provide a method for evaluating a 3-D map constructed based on sensor data. An exemplary method may include receiving the 3-D map and measurement data of a target landmark by a communication interface. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model by at least one processor. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation by the at least one processor. Moreover,  the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters by the at least one processor.
Embodiments of the disclosure also provide a system for evaluating a 3-D map constructed based on sensor data. An exemplary system may include a communication interface configured to receive the 3-D map and the measurement data of a target landmark and a storage configured to store the 3-D map and measurement data. The system may also include at least one processor coupled to the storage. The at least one processor may be configured to identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model. The at least one processor may be further configured to determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determine parameters associated with the geometric representation. Moreover, the at least one processor may be configured to evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
Embodiments of the disclosure further provide a non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a 3-D map constructed based on sensor data. The method may include receiving the 3-D map and measurement data of a target landmark. The method may also include identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model. The method may further include determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box and determining parameters associated with the geometric representation. Moreover, the method may include evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system, according to embodiments of the disclosure.
FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device, according to embodiments of the disclosure.
FIG. 3 illustrates a flowchart of an exemplary method for 3-D map evaluation, according to embodiments of the disclosure.
FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model, according to embodiments of the disclosure.
DETAILED DESCRIPTION
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. Wherever possible, the same reference numbers will be used throughout the drawings to refer to the same or like parts.
FIG. 1 illustrates a schematic diagram of an exemplary 3-D map evaluation system (referred to as “3-D map evaluation system 100” ) , according to embodiments of the disclosure.
Consistent with the present disclosure, 3-D map evaluation system 100 is configured to evaluate 3-D maps constructed from sensor data acquired by sensor 160 (e.g., 3-D map 104) based on a Faster R-CNN model 105 trained using sample 3-D maps, bounding boxes and corresponding landmarks within the 3-D maps (e.g., included in training data 101) and measurement data 102. In some embodiments, 3-D map evaluation system 100 may include components shown in FIG. 1, including a training database 140, a model training device 120, a 3-D map evaluation device 110, a database/repository 150, a display device 130, a sensor 160 and a network 170 to facilitate communications among the various components. It is contemplated that 3-D map evaluation system 100 may include more or fewer components than those shown in FIG. 1.
As shown in FIG. 1, 3-D map evaluation system 100 may perform two stages: a landmark object identification model training stage and a 3-D map evaluation stage applying the trained model. To perform the training stage to train a learning model such as faster R-CNN model 105, 3-D map evaluation system 100 may include training database 140 and model training device 120. To perform the 3-D map evaluation process to obtain a 3-D map evaluation result 107, 3-D map evaluation system 100 may include 3-D map evaluation device 110 and database/repository 150. In some embodiments, 3-D map evaluation system 100 may also include display device 130 to display a 3-D map evaluation result 107. In some embodiments, when a learning model (e.g., faster R-CNN model 105) is pre-trained for landmark object identification, 3-D map evaluation system 100 may include only 3-D map evaluation device 110, database/repository 150, and display device 130 to perform 3-D map evaluation related functions.
3-D map evaluation system 100 may optionally include network 170 to facilitate the communication among the various components of 3-D map evaluation system 100, such as  databases  140 and 150,  devices  110 and 120, and sensor 160. For example, network 170 may be a local area network (LAN) , a wireless network, a cloud computing environment (e.g., software as a service, platform as a service, infrastructure as a service) , a client-server, a wide area network (WAN) , etc. In some embodiments, network 170 may be replaced by wired data communication systems or devices.
In some embodiments, the various components of 3-D map evaluation system 100 may be remote from each other or in different locations and be connected through network 170 as shown in FIG. 1. In some alternative embodiments, certain components of 3-D map evaluation system 100 may be located on the same site or inside one device. For example, training database 140 may be located on-site with or be part of model training device 120. As another example, model training device 120 and 3-D map evaluation device 110 may be inside the same computer or processing device.
Consistent with the present disclosure, 3-D map evaluation system 100 may store 3-D maps. For example, sample 3-D maps as part of training data 101 may be stored in training database 140 and 3-D maps 104 to be evaluated may be stored in database/repository 150.
3-D maps may be constructed based on sensor data received from sensors (e.g., sensor 160). In some embodiments, sensor data may be point cloud data acquired by a LiDAR. For example, sensor 160 may be a LiDAR scanner configured to scan the surroundings and acquire point clouds. LiDAR measures distance to a target by illuminating the target with pulsed laser light and measuring the reflected pulses with the sensor. 3-D maps may be constructed based on the 3-D representation of targets made by calculating the differences in laser return times and wavelengths.
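As a point of reference, the return-time-to-range relation mentioned above is the usual time-of-flight formula; the short Python snippet below is illustrative only and is not part of the disclosure.

```python
SPEED_OF_LIGHT = 299_792_458.0  # metres per second

def range_from_return_time(delta_t):
    """Convert a round-trip laser return time (seconds) into a range (metres).

    The pulse travels to the target and back, hence the division by two.
    """
    return SPEED_OF_LIGHT * delta_t / 2.0

# e.g., a return time of 1 microsecond corresponds to roughly 150 m
print(range_from_return_time(1e-6))
```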
In some embodiments, training database 140 may store training data 101, which includes sample 3-D maps and known bounding boxes of pre-identified landmark objects within the 3-D maps. The known bounding boxes of the landmarks included in the 3-D maps may be benchmark extractions made by operators based on the sample 3-D maps and the corresponding landmarks. Sample 3-D maps, bounding boxes and the corresponding landmarks within the 3-D maps may be stored in pairs in training database 140 as training data 101.
In some embodiments, sensor 160 or a separate sensor may also acquire ground truths (e.g., measurement data 102) of the landmarks included by the constructed 3-D maps. For example, measurement data 102 of a landmark may be the height, width and length of the landmark. In some embodiments, an operator may manually measure the landmark and obtain measurement data 102. Alternatively, measurement data 102 may be determined from images acquired by sensors such as a monocular or binocular camera. In some embodiments, measurement data 102 may be measured multiple times (e.g., 5 or 10 times under the same measurement condition) to reduce the observational error. Measurement data 102 associated with the location information of the landmark (e.g., GPS data of the landmark) may be stored in database/repository 150 and used for evaluating 3-D maps 104 in database/repository 150.
In some embodiments, the model training process is performed by model training device 120. As used herein, “training” a learning model refers to determining one or more parameters of at least one layer in the learning model. For example, a convolutional layer of a CNN model may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process. Consistent with some embodiments, Faster R-CNN model 105 may be trained using supervised learning.
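To make the notion of "determining parameters of a layer by backpropagation" concrete, the following minimal PyTorch sketch runs one backpropagation step that updates the kernel weights of a single convolutional layer. The layer sizes, dummy data, and loss are placeholders and do not reflect the actual training configuration of faster R-CNN model 105.

```python
import torch
import torch.nn as nn

# One convolutional layer whose kernel weights are "determined" by a gradient step.
conv = nn.Conv2d(in_channels=3, out_channels=8, kernel_size=3, padding=1)
optimizer = torch.optim.SGD(conv.parameters(), lr=1e-3)

images = torch.randn(4, 3, 64, 64)      # dummy batch of 2-D images
targets = torch.randn(4, 8, 64, 64)     # dummy supervision signal

optimizer.zero_grad()
loss = nn.functional.mse_loss(conv(images), targets)
loss.backward()                          # backpropagation computes kernel-weight gradients
optimizer.step()                         # the update step adjusts the kernel weights
print(loss.item())
```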
As shown in FIG. 1, model training device 120 may communicate with training database 140 to receive one or more sets of training data 101. Each set of training data 101 may include a sample 3-D map, bounding boxes of the landmark objects included in the 3-D map and the corresponding landmarks. Model training device 120 may use training data 101 received from training database 140 to train a learning model, e.g., faster R-CNN model 105 (described in detail in connection with FIG. 4). Model training device 120 may be implemented with hardware specially programmed by software that performs the training process. For example, model training device 120 may include a processor and a non-transitory computer-readable medium. The processor may conduct the training by performing instructions of a training process stored in the computer-readable medium. Model training device 120 may additionally include input and output interfaces to communicate with training database 140, network 170, and/or a user interface (not shown). The user interface may be used for selecting sets of training data, adjusting one or more parameters of the training process, selecting or modifying a framework of the learning model, and/or manually or semi-automatically providing bounding boxes of landmark objects included in the 3-D map.
Consistent with some embodiments, faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model to determine feature maps of the objects corresponding to the landmarks, a Region Proposal Network (RPN) sub-model to determine the bounding boxes of the objects corresponding to the landmarks, and a plurality of trained Support-Vector Machine (SVM) sub-models to determine whether the bounding boxes include the objects corresponding to the target landmarks. In some embodiments, faster R-CNN model 105 may also include a linear regression sub-model to fine-tune the determined bounding boxes.
In some embodiments, the CNN sub-model may process data such as 2-D images determined from the 3-D map that includes the objects corresponding to landmarks and may use geometric layer labels associated with the 3-D map as input for the sub-model. The architecture of the CNN sub-model includes a stack of distinct layers that transform the input into the output (e.g., object features of the objects corresponding to landmarks and/or feature maps of the objects) .
The output of the CNN sub-model may be used as input for the RPN sub-model to determine the bounding boxes of the objects corresponding to the landmarks. The RPN sub-model may include a classifier and a regressor. For example, proposals of the bounding boxes may be generated based on the feature maps of the objects, the classifier may determine the probability that each proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Based on the classifier and the regressor, the RPN sub-model may determine candidate bounding boxes that include the objects corresponding to the landmarks.
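A minimal sketch of this proposal step is shown below, assuming the standard Faster R-CNN box parameterization (center offsets scaled by anchor size, log-scale width and height). The array names and the score threshold are illustrative assumptions rather than values from the disclosure.

```python
import numpy as np

def refine_proposals(anchors, objectness, deltas, score_thresh=0.7):
    """Keep anchors the classifier scores above `score_thresh` and apply the
    regressor output to refine their coordinates.

    anchors:    (N, 4) boxes as [cx, cy, w, h]
    objectness: (N,)   classifier probability that an anchor contains an object
    deltas:     (N, 4) regressor output [dx, dy, dw, dh]
    """
    keep = objectness > score_thresh
    a = anchors[keep].astype(float)
    d = deltas[keep]
    refined = a.copy()
    refined[:, 0] = a[:, 0] + d[:, 0] * a[:, 2]   # shift centre x by dx * width
    refined[:, 1] = a[:, 1] + d[:, 1] * a[:, 3]   # shift centre y by dy * height
    refined[:, 2] = a[:, 2] * np.exp(d[:, 2])     # rescale width
    refined[:, 3] = a[:, 3] * np.exp(d[:, 3])     # rescale height
    return refined, objectness[keep]
```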
The trained SVMs may be used to determine if the candidate bounding boxes include the objects corresponding to the target landmarks. For example, each SVM may be trained using a given set of training examples (e.g., bounding boxes that include the objects corresponding to a target landmark and bounding boxes that do not include the objects corresponding to the target landmark). When the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the candidate bounding boxes include the objects corresponding to the target landmark. For example, the output of the SVM may be “Yes” (e.g., a probability higher than a threshold value) if the candidate bounding box includes the objects corresponding to target landmarks. In some embodiments, the bounding boxes may further be fine-tuned by a regressor. For example, each bounding box may be tuned to include just the object corresponding to the target landmark.
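The per-landmark verification step could look like the following scikit-learn sketch; the feature vectors are assumed to come from the CNN feature map, and all names and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

def train_landmark_svm(pos_features, neg_features):
    """Train one SVM for a single target landmark on feature vectors extracted
    from boxes that do (pos) / do not (neg) contain that landmark."""
    X = np.vstack([pos_features, neg_features])
    y = np.concatenate([np.ones(len(pos_features)), np.zeros(len(neg_features))])
    return SVC(kernel="rbf", probability=True).fit(X, y)

def boxes_matching_landmark(svm, candidate_features, threshold=0.5):
    """Return a boolean mask over candidate boxes whose probability of containing
    the landmark exceeds the threshold (the "Yes" decision described above)."""
    probs = svm.predict_proba(candidate_features)[:, 1]
    return probs > threshold
```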
3-D map evaluation device 110 may receive trained faster R-CNN model 105 from model training device 120. 3-D map evaluation device 110 may include a processor and a non-transitory computer-readable medium (not shown). The processor may perform instructions of a 3-D map evaluation process stored in the medium. 3-D map evaluation device 110 may additionally include input and output interfaces to communicate with database/repository 150, sensor 160, network 170 and/or a user interface of display device 130. The input interface may be used for selecting a 3-D map for evaluation or initiating the evaluation process. The output interface may be used for providing a 3-D map evaluation result 107.
Display 130 may include a display such as a Liquid Crystal Display (LCD), a Light Emitting Diode Display (LED), a plasma display, or any other type of display, and provide a Graphical User Interface (GUI) presented on the display for user input and data depiction. The display may include a number of different types of materials, such as plastic or glass, and may be touch-sensitive to receive inputs from the user. For example, the display may include a touch-sensitive material that is substantially rigid, such as Gorilla Glass™, or substantially pliable, such as Willow Glass™. In some embodiments, display 130 may be part of 3-D map evaluation device 110.
FIG. 2 illustrates a block diagram of an exemplary 3-D map evaluation device 110, according to embodiments of the disclosure. In some embodiments, as shown in FIG. 2, 3-D map evaluation device 110 may include a communication interface 202, a processor 204, a memory 206, and a storage 208. In some embodiments, 3-D map evaluation device 110 may have different modules in a single device, such as an integrated circuit (IC) chip (e.g., implemented as an application-specific integrated circuit (ASIC) or a field-programmable gate array (FPGA)), or separate devices with dedicated functions. In some embodiments, one or more components of 3-D map evaluation device 110 may be located in a cloud or may be alternatively in a single location (such as inside a mobile device) or distributed locations. Components of 3-D map evaluation device 110 may be in an integrated device or distributed at different locations but communicate with each other through a network (not shown). Consistent with the present disclosure, 3-D map evaluation device 110 may be configured to evaluate 3-D maps 104 from database/repository 150 against measurement data 102 received from database/repository 150 and faster R-CNN model 105 trained in model training device 120.
Communication interface 202 may send data to and receive data from components such as database/repository 150, sensor 160, model training device 120 and display device 130 via communication cables, a Wireless Local Area Network (WLAN), a Wide Area Network (WAN), wireless networks such as radio waves, a cellular network, and/or a local or short-range wireless network (e.g., Bluetooth™), or other communication methods. In some embodiments, communication interface 202 may include an integrated services digital network (ISDN) card, cable modem, satellite modem, or a modem to provide a data communication connection. As another example, communication interface 202 may include a local area network (LAN) card to provide a data communication connection to a compatible LAN. Wireless links can also be implemented by communication interface 202. In such an implementation, communication interface 202 can send and receive electrical, electromagnetic or optical signals that carry digital data streams representing various types of information.
Consistent with some embodiments, communication interface 202 may receive faster R-CNN model 105 from model training device 120, and 3-D maps 104 and measurement data 102 from database/repository 150. Communication interface 202 may further provide 3-D map 104, measurement data 102, and faster R-CNN model 105 to memory 206 and/or storage 208 for storage or to processor 204 for processing.
Processor 204 may include any appropriate type of general-purpose or special-purpose microprocessor, digital signal processor, or microcontroller. Processor 204 may be configured as a separate processor module dedicated to evaluating 3-D maps using a learning model. Alternatively, processor 204 may be configured as a shared processor module for performing other functions in addition to 3-D map evaluation.
Memory 206 and storage 208 may include any appropriate type of mass storage provided to store any type of information that processor 204 may need to operate. Memory 206 and storage 208 may be a volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other type of storage device or tangible (i.e., non-transitory) computer-readable medium including, but not limited to, a ROM, a flash memory, a dynamic RAM, and a static RAM. Memory 206 and/or storage 208 may be configured to store one or more computer programs that may be executed by processor 204 to perform functions disclosed herein. For  example, memory 206 and/or storage 208 may be configured to store program (s) that may be executed by processor 204 to evaluate 3-D maps 104 based on faster R-CNN model 105.
In some embodiments, memory 206 and/or storage 208 may also store intermediate data such as the object features of target landmarks, feature maps output by layers of the learning model, bounding boxes, parameters of the landmark objects, etc. Memory 206 and/or storage 208 may additionally store various learning models including their model parameters, such as faster R-CNN model 105, etc.
As shown in FIG. 2, processor 204 may include multiple modules, such as a 2-D image determination unit 240, an object identification unit 242, a geometric representation construction unit 244 and a map evaluation unit 246, and the like. These modules (and any corresponding sub-modules or sub-units) can be hardware units (e.g., portions of an integrated circuit) of processor 204 designed for use with other components or software units implemented by processor 204 through executing at least part of a program. The program may be stored on a computer-readable medium, and when executed by processor 204, it may perform one or more functions. Although FIG. 2 shows units 240-246 all within one processor 204, it is contemplated that these units may be distributed among multiple processors located close to or remote from each other.
In some embodiments, units 240-246 of FIG. 2 may execute computer instructions to perform the evaluation. For example, FIG. 3 illustrates a flowchart of an exemplary method 300 for 3-D map evaluation based on faster R-CNN model 105, according to embodiments of the disclosure. Method 300 may be implemented by 3-D map evaluation device 110 and particularly processor 204 or a separate processor not shown in FIG. 2. Method 300 may include steps S302-S314 as described below. It is to be appreciated that some of the steps may be performed simultaneously, or in a different order than shown in FIG. 3.
In step S302 and step S304, communication interface 202 may receive, from database/repository 150, 3-D map 104 including target landmarks and ground truth data of the target landmarks (e.g., measurement data 102). In some embodiments, the target landmarks may be ground marks (e.g., traffic lanes and/or crosswalk lanes) and/or standing marks (e.g., traffic lights and/or traffic signs). In some embodiments, the ground truth data may be the geometric measurements of the target landmark (e.g., the length, the width and the height of the target landmark).
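For illustration only, the ground truth record for a single landmark might be held in a structure such as the following; the field names and units are assumptions, as the disclosure does not prescribe a data format.

```python
from dataclasses import dataclass

@dataclass
class LandmarkGroundTruth:
    """Hypothetical record for one surveyed target landmark (illustrative only)."""
    landmark_id: str
    kind: str        # e.g., "ground_mark" or "standing_mark"
    length_m: float  # surveyed length of the landmark, in meters
    width_m: float   # surveyed width of the landmark, in meters
    height_m: float  # surveyed height, in meters (0.0 for flat ground marks)
```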
In step S306, 2-D image determination unit 240 may determine 2-D images from 3-D map 104. In some embodiments, 2-D image determination unit 240 may determine an object in 3-D map 104 that corresponds to the target landmark and may determine 2-D images from 3-D map 104 that contain the object. In some embodiments, if the target landmark is a ground mark such as a traffic lane, 2-D image determination unit 240 may project 3-D map 104 into 2-D data, determine a position corresponding to the target landmark in the 2-D data, and extract the 2-D image surrounding the position which includes the object. For example, 2-D image determination unit 240 may use a pillar filtering method to filter 3-D map 104, where points within a same 2-D region may be treated as a "pillar." The points within each pillar may then be clustered according to constraints on distance. The lowest cluster of points may be used to provide the height and color information of the 2-D image.
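A minimal sketch of such pillar filtering is given below; the grid resolution, the vertical clustering gap, and the use of intensity as the color channel are assumptions rather than details taken from the disclosure.

```python
import numpy as np

def pillar_project(points, cell_size=0.1, gap=0.3):
    """Illustrative pillar filtering of map points (thresholds are assumed).

    points: (N, 4) array of x, y, z, intensity from the 3-D map.
    Returns a dict mapping each occupied 2-D cell to the mean height and
    mean intensity of its lowest cluster of points.
    """
    image = {}
    # Assign every point to a 2-D grid cell ("pillar").
    cells = np.floor(points[:, :2] / cell_size).astype(int)
    for cell in np.unique(cells, axis=0):
        mask = np.all(cells == cell, axis=1)
        pillar = points[mask]
        # Cluster the pillar by height: split where the vertical gap
        # between consecutive points exceeds the distance constraint.
        pillar = pillar[np.argsort(pillar[:, 2])]
        breaks = np.where(np.diff(pillar[:, 2]) > gap)[0]
        lowest = pillar[: breaks[0] + 1] if breaks.size else pillar
        # Keep the height and intensity of the lowest cluster as the pixel value.
        image[tuple(cell)] = (lowest[:, 2].mean(), lowest[:, 3].mean())
    return image
```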
In some other embodiments, if the target landmark is a standing mark such as a traffic sign, 2-D image determination unit 240 may first project the top view of the sensor data (e.g., point cloud data) corresponding to the target landmark within 3-D map 104 to obtain a first image. 2-D image determination unit 240 may then map the height information of the sensor data corresponding to the object to a second image. The 2-D image may then be determined by combining the first image and the second image.
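One way this two-image construction could look in practice is sketched below, with the top view and the height map stored as two channels of a single array; the grid resolution and channel layout are assumptions.

```python
import numpy as np

def standing_mark_image(points, cell_size=0.05):
    """Illustrative two-channel image for a standing mark (layout assumed).

    points: (N, 4) array of x, y, z, intensity around the standing mark.
    Returns an (H, W, 2) array: channel 0 is the top-view intensity
    projection, channel 1 encodes the highest point per cell.
    """
    cols = np.floor((points[:, 0] - points[:, 0].min()) / cell_size).astype(int)
    rows = np.floor((points[:, 1] - points[:, 1].min()) / cell_size).astype(int)
    h, w = rows.max() + 1, cols.max() + 1
    image = np.zeros((h, w, 2), dtype=np.float32)
    # Channel 0: top-view projection using the point intensity (first image).
    image[rows, cols, 0] = points[:, 3]
    # Channel 1: height information mapped per cell (second image).
    np.maximum.at(image[:, :, 1], (rows, cols), points[:, 2])
    return image
```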
In step S308, object identification unit 242 may identify objects corresponding to landmarks from the 2-D images and determine a bounding box of the objects based on faster R-CNN model 105. In some embodiments, object identification unit 242 may use faster R-CNN model 105 that includes a CNN sub-model and an RPN sub-model to determine bounding boxes of the target landmarks. For example, object identification unit 242 may label different geometric layers within the images determined in step S306, use the different geometric layer labels as input and determine a feature map of the images using the CNN sub-model. Object identification unit 242 may then use the feature map as input to the RPN sub-model to determine features of candidate bounding boxes of the object.
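As a rough stand-in for this detection step, the sketch below runs a generic, publicly available faster R-CNN from torchvision on a projected 2-D image; the actual architecture, classes, weights, and confidence threshold of faster R-CNN model 105 are not specified in the disclosure, so everything here is illustrative (torchvision 0.13 or later is assumed for the weights argument).

```python
import torch
import torchvision

# Generic pre-trained detector used only as a stand-in for faster R-CNN model 105.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# image: a (C, H, W) float tensor built from the 2-D projection of step S306
# (a random tensor is used here purely as a placeholder).
image = torch.rand(3, 480, 640)
with torch.no_grad():
    detections = model([image])[0]

# Candidate bounding boxes, class labels, and confidence scores.
boxes = detections["boxes"]    # (K, 4) boxes in x1, y1, x2, y2 format
scores = detections["scores"]  # (K,) confidence per candidate box
keep = scores > 0.5            # assumed confidence threshold
candidate_boxes = boxes[keep]
```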
In some embodiments, object identification unit 242 may also use a trained SVM to determine if the object within the candidate bounding box corresponds to the target landmark. For example, the SVM may be trained on a given set of training examples (e.g., bounding boxes that include the objects corresponding to target landmarks and bounding boxes that do not include the objects corresponding to target landmarks), and when the candidate bounding boxes are used as input to the SVM, the SVM can determine whether the bounding boxes include the objects corresponding to target landmarks. In some embodiments, object identification unit 242 may further include a regressor for fine-tuning the bounding boxes. For example, each bounding box may be tuned to include just the target object.
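A minimal sketch of this SVM verification stage follows; the feature dimensionality, kernel, and acceptance threshold are assumptions, since the disclosure does not state which features are fed to the SVM.

```python
import numpy as np
from sklearn.svm import SVC

# Placeholder training data: in practice the features would be derived from
# the candidate bounding boxes (e.g., crops of the feature map), with label 1
# meaning the box contains the target landmark.
train_features = np.random.rand(200, 256)
train_labels = np.random.randint(0, 2, 200)

svm = SVC(kernel="linear", probability=True)
svm.fit(train_features, train_labels)

# Verify candidate boxes: accept those whose probability exceeds a threshold.
candidate_features = np.random.rand(10, 256)
probs = svm.predict_proba(candidate_features)[:, 1]
matches = probs > 0.5  # assumed acceptance threshold
```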
In step S310, geometric representation construction unit 244 may construct geometric representations of the objects. In some embodiments, the geometric representations of the objects are determined based on the sensor data (e.g., point cloud data) within the bounding boxes. For example, geometric representation construction unit 244 may extract the sensor data within the bounding boxes and use a Random Sample Consensus (RANSAC) algorithm to fit a plane to the sensor data. Geometric representation construction unit 244 may project all the sensor data onto the fitted plane based on the plane function of the fitted plane. Geometric representation construction unit 244 may further transfer all the sensor data to a horizontal plane based on a transformation between the fitted plane and the horizontal plane. Geometric representation construction unit 244 may then use Otsu's method to filter out the sensor data that does not correspond to the target landmarks.
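A compact sketch of the RANSAC plane fit, the projection onto the fitted plane, and the rotation onto a horizontal plane might look as follows; the iteration count and inlier tolerance are assumptions. The subsequent Otsu filtering could, for instance, threshold the point intensities (e.g., with skimage.filters.threshold_otsu) to separate landmark points from background, although the disclosure does not specify which channel is thresholded.

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.03):
    """Illustrative RANSAC plane fit: returns unit normal n and offset d
    such that n . p + d = 0 holds for the inlier points."""
    best_count, best_model = 0, None
    rng = np.random.default_rng(0)
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        n = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        if np.linalg.norm(n) < 1e-9:        # degenerate (collinear) sample
            continue
        n = n / np.linalg.norm(n)
        d = -n @ sample[0]
        inliers = np.abs(points @ n + d) < tol
        if inliers.sum() > best_count:
            best_count, best_model = inliers.sum(), (n, d)
    return best_model

def flatten_to_horizontal(points):
    """Project points onto the fitted plane, then rotate that plane onto
    the horizontal plane, as described in the text above."""
    n, d = ransac_plane(points)
    projected = points - np.outer(points @ n + d, n)   # project onto fitted plane
    z = np.array([0.0, 0.0, 1.0])
    v, c = np.cross(n, z), float(n @ z)
    if np.linalg.norm(v) < 1e-9:                        # plane already horizontal
        return projected
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    R = np.eye(3) + vx + vx @ vx / (1.0 + c)            # rotation aligning n to z
    return projected @ R.T
```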
In some embodiments, geometric representation construction unit 244 may construct a geometric representation of the object corresponding to the target landmark, and the geometric representation may be selected from shapes such as a rectangle, a circle, a triangle, etc. In some embodiments, geometric representation construction unit 244 may also establish constraints to optimize the geometric representation construction. For example, the constraints may include: all sensor data needs to be within the boundary of the geometric representation; the boundary of the geometric representation needs to be as close to the inner points of the representation as possible; the central point of the sensor data of the target landmarks should be at the same distance to all boundaries of the geometric representation; no point should be negative, etc. In some embodiments, geometric representation construction unit 244 may also apply a relaxation coefficient to the geometric representation construction so that noise may be partially tolerated outside the boundary of the geometric representation. For example, when evaluating the first constraint (all sensor data needs to be within the boundary of the geometric representation), any point that requires a positive relaxation to satisfy the constraint may be defined as noise.
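Such a soft-constrained fit could be posed as a small optimization, as in the illustrative sketch below for an axis-aligned rectangle; the objective weighting and the way the relaxation is applied are assumptions, since the disclosure describes the constraints only qualitatively.

```python
import numpy as np
from scipy.optimize import minimize

def fit_rectangle(xy, relaxation=0.05):
    """Illustrative soft-constrained fit of an axis-aligned rectangle to the
    flattened landmark points. Returns (cx, cy, half_width, half_height).

    xy: (N, 2) landmark points in the horizontal plane.
    """
    cx0, cy0 = xy.mean(axis=0)
    hw0, hh0 = (xy.max(axis=0) - xy.min(axis=0)) / 2.0

    def cost(p):
        cx, cy, hw, hh = p
        dx = np.abs(xy[:, 0] - cx) - hw
        dy = np.abs(xy[:, 1] - cy) - hh
        # Points outside the boundary by more than the relaxation are treated
        # as noise and penalised heavily; mild overshoot is tolerated.
        outside = np.maximum(dx - relaxation, 0) + np.maximum(dy - relaxation, 0)
        # Keep the boundary as close to the inner points as possible.
        tightness = hw + hh
        return 100.0 * outside.sum() + tightness

    res = minimize(cost, x0=[cx0, cy0, hw0, hh0], method="Nelder-Mead")
    return res.x
```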
In steps S312-S314, map evaluation unit 246 may determine parameters of the geometric representation of the object and evaluate 3-D map 104 based on the parameters and  corresponding measurement data 102. In some embodiments, the parameters may include the width, the length and the height of the geometric representation. For example, map evaluation unit 246 may evaluate 3-D map 104 by calculating a difference between the parameters and the corresponding ground truths (e.g., measurement data 102 of the target landmark) .
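Step S314 then reduces to a per-dimension comparison; for instance, a crosswalk line surveyed at 3.00 m by 0.45 m that the map reconstructs as 3.06 m by 0.43 m yields errors of 0.06 m and 0.02 m. A minimal sketch, with assumed field names, follows.

```python
def evaluate_landmark(estimated, ground_truth):
    """Illustrative step S314: absolute per-dimension error between the
    parameters recovered from 3-D map 104 and measurement data 102."""
    return {
        "length_error_m": abs(estimated["length_m"] - ground_truth["length_m"]),
        "width_error_m": abs(estimated["width_m"] - ground_truth["width_m"]),
        "height_error_m": abs(estimated["height_m"] - ground_truth["height_m"]),
    }

errors = evaluate_landmark(
    {"length_m": 3.06, "width_m": 0.43, "height_m": 0.0},
    {"length_m": 3.00, "width_m": 0.45, "height_m": 0.0},
)
# errors is approximately {"length_error_m": 0.06, "width_error_m": 0.02, "height_error_m": 0.0}
```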
FIG. 4 illustrates a schematic diagram of an exemplary faster R-CNN model 105, according to embodiments of the disclosure. Consistent with some embodiments as shown in FIG. 4, faster R-CNN model 105 may further include a Convolutional Neural Network (CNN) sub-model 410 to determine feature maps of the objects corresponding to the target landmarks, a Region Proposal Network (RPN) sub-model 420 to determine the candidate bounding boxes of the objects corresponding to the target landmarks, and at least one trained Support-Vector Machine (SVM) 430 to determine if the bounding boxes include the objects corresponding to the target landmarks. In some embodiments, faster R-CNN model 105 may also include a linear regression sub-model 440 to fine-tune the determined bounding boxes.
CNN sub-model 410 may determine feature maps of the input 2-D image provided by 2-D image determination unit 240. For example, CNN sub-model 410 may process 2-D images determined from the 3-D map that include objects corresponding to the target landmarks and may use the geometric layer labels associated with the 3-D map as an input. In some embodiments, a feature map of each object may be determined as the output of CNN sub-model 410.
RPN sub-model 420 may determine candidate bounding boxes of the objects corresponding to the target landmarks based on the feature maps determined by CNN sub-model 410. For example, RPN sub-model 420 may include a classifier and a regressor. Proposals of the bounding boxes may be generated based on the feature maps of the objects. The classifier may determine the probability that a proposed bounding box contains the target object, and the regressor may regress the coordinates of the proposed bounding boxes. Thus, RPN sub-model 420 may determine the bounding boxes that include objects corresponding to landmarks based on the classifier and the regressor.
SVM 430 may match the candidate bounding boxes with the target landmarks by determining if probabilities that the respective candidate bounding boxes include the objects corresponding to target landmarks exceed a threshold value. For example, SVM 430 may be trained on a given set of training examples (e.g., bounding boxes that include an object corresponding to a target landmark and bounding boxes that do not include the object corresponding to the target landmark) to determine if the candidate bounding boxes include the object. In some embodiments, one SVM is trained for each target landmark. The candidate bounding boxes may be used as input to the SVM. In some embodiments, if the output is yes (e.g., a probability that exceeds a threshold value, indicating the candidate bounding box includes the object), the candidate bounding box may be used as the bounding box for the corresponding target landmark.
In some embodiments, faster R-CNN model 105 may further include linear regression model 440 for fine-tuning the bounding boxes using a linear regressor. For example, each bounding box may be tuned using linear regression model 440 to include just the target landmark.
In some embodiments, faster R-CNN model 105 may be trained by model training device 120 by determining one or more parameters of at least one layer in any sub-model of the learning model and may be transmitted to 3-D map evaluation device 110 for performing 3-D map evaluation related functions. For example, a convolutional layer of a CNN model may include at least one filter or kernel. One or more parameters, such as kernel weights, size, shape, and structure, of the at least one filter may be determined by e.g., a backpropagation-based training process. Consistent with some embodiments, faster R-CNN model 105 may be trained using supervised learning.
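For orientation only, a supervised training step for a generic torchvision faster R-CNN might look like the sketch below; the number of classes, the optimizer, the learning rate, and the labelled sample are all assumptions, since the disclosure does not describe the training data or hyper-parameters used by model training device 120.

```python
import torch
import torchvision

# Generic faster R-CNN as a stand-in; num_classes includes background
# (e.g., background + ground mark + standing mark).
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(num_classes=3)
optimizer = torch.optim.SGD(model.parameters(), lr=0.005, momentum=0.9)
model.train()

# One synthetic sample: a projected 2-D image and its labelled bounding box.
images = [torch.rand(3, 480, 640)]
targets = [{
    "boxes": torch.tensor([[100.0, 120.0, 220.0, 300.0]]),  # x1, y1, x2, y2
    "labels": torch.tensor([1]),                             # assumed class id
}]

loss_dict = model(images, targets)   # RPN and detection-head losses
loss = sum(loss_dict.values())
optimizer.zero_grad()
loss.backward()                      # backpropagation updates kernel weights
optimizer.step()
```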
Another aspect of the disclosure is directed to a non-transitory computer-readable medium storing instructions which, when executed, cause one or more processors to perform the methods, as discussed above. The computer-readable medium may include volatile or non-volatile, magnetic, semiconductor, tape, optical, removable, non-removable, or other types of computer-readable medium or computer-readable storage devices. For example, the computer-readable medium may be the storage device or the memory module having the computer instructions stored thereon, as disclosed. In some embodiments, the computer-readable medium may be a disc or a flash drive having the computer instructions stored thereon.
It will be apparent to those skilled in the art that various modifications and variations can be made to the disclosed system and related methods. Other embodiments will be apparent to those skilled in the art from consideration of the specification and practice of the disclosed system and related methods.
It is intended that the specification and examples be considered as exemplary only, with a true scope being indicated by the following claims and their equivalents.

Claims (20)

  1. A method for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    receiving, by a communication interface, the 3-D map and measurement data of a target landmark;
    identifying, by at least one processor, an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model;
    determining, by the at least one processor, a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determining, by the at least one processor, parameters associated with the geometric representation; and
    evaluating, by the at least one processor, the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  2. The method of claim 1, wherein the learning model is a faster R-CNN model.
  3. The method of claim 1, wherein identifying the object corresponding to the target landmark further comprises:
    determining a 2-D image from the 3-D map that contains the object;
    determining object features from the 2-D image; and
    matching the object with the target landmark based on the object features.
  4. The method of claim 3, wherein determining object features further comprises:
    determining a feature map by applying a Convolutional Neural Network (CNN) to the image; and
    determining the object features by applying a Region Proposal Network (RPN) to the feature map.
  5. The method of claim 3, wherein the object is matched with the target landmark using a Support-Vector Machine (SVM) .
  6. The method of claim 1, wherein the bounding box is determined using a linear regression model.
  7. The method of claim 1, wherein the target landmark is a standing sign, and determining the 2-D image further comprises:
    projecting the top view of the sensor data corresponding to the object to obtain a first image;
    mapping height information of the sensor data corresponding to the object to a second image; and
    determining the 2-D image based on the first image and the second image.
  8. The method of claim 1, wherein the target landmark is a ground mark, and determining the 2-D image further comprises:
    projecting the 3-D map into 2-D data;
    determining a position corresponding to the target landmark in the 2-D data; and
    extracting the 2-D image surrounding the position.
  9. The method of claim 1, wherein the sensor data is point cloud data acquired by a LiDAR.
  10. The method of claim 1, wherein determining the parameters further comprises:
    fitting the subset of sensor data into a plane;
    constructing the geometric representation for the object in the plane; and
    determining parameters of the geometric representation based on a set of constraints.
  11. The method of claim 2, wherein the faster R-CNN model is trained using the 3-D map and geometric layer labels associated with the 3-D map.
  12. The method of claim 2, wherein the faster R-CNN model is separately trained for ground marks and standing signs.
  13. The method of claim 1, further comprising associating the determined parameters with the target landmark.
  14. A system for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    a communication interface configured to receive the 3-D map and measurement data of a target landmark;
    a storage configured to store the 3-D map and the measurement data; and
    at least one processor coupled to the storage and configured to:
    identify an object in the 3-D map corresponding to the target landmark and determine a bounding box of the object based on a learning model;
    determine a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determine parameters associated with the geometric representation; and
    evaluate the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  15. The system of claim 14, wherein the learning model is a faster R-CNN model.
  16. The system of claim 14, wherein to identify the object corresponding to the target landmark, the at least one processor is further configured to:
    determine a 2-D image from the 3-D map that contains the object;
    determine object features from the 2-D image; and
    match the object with the target landmark based on the object features.
  17. The system of claim 16, wherein to determine the object features, the at least one processor is further configured to:
    determine a feature map by applying a Convolutional Neural Network (CNN) to the image; and
    determine the object features by applying a Region Proposal Network (RPN) to the feature map.
  18. The system of claim 16, wherein the object is matched with the target landmark using a Support-Vector Machine (SVM).
  19. A non-transitory computer-readable medium storing instructions that, when executed by one or more processors, cause the one or more processors to perform a method for evaluating a three-dimensional (3-D) map constructed based on sensor data, comprising:
    receiving the 3-D map and measurement data of a target landmark;
    identifying an object in the 3-D map corresponding to the target landmark and determining a bounding box of the object based on a learning model;
    determining a geometric representation of the object based on a subset of the sensor data extracted corresponding to the bounding box;
    determining parameters associated with the geometric representation; and
    evaluating the 3-D map by comparing the measurement data of the target landmark and the determined parameters.
  20. The non-transitory computer-readable medium of claim 19, wherein the learning model is a faster R-CNN model.