CN111523515A - Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium - Google Patents

Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium

Info

Publication number
CN111523515A
Authority
CN
China
Prior art keywords
traffic scene
data
feature vector
complexity
scene data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010400859.7A
Other languages
Chinese (zh)
Inventor
Wang Jialiang (王家梁)
Guo Zhengdong (郭正东)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010400859.7A
Publication of CN111523515A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/64 - Three-dimensional objects

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Traffic Control Systems (AREA)

Abstract

The application discloses a method, device, and storage medium for evaluating the environment cognitive ability of an autonomous vehicle, and relates to the field of automatic driving. The method comprises: acquiring traffic scene data to be graded, the traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data of a traffic scene; processing the traffic scene data through a traffic scene complexity classification model, which extracts a feature vector from the data and grades the traffic scene data to obtain a corresponding target complexity level; and evaluating the environment cognitive ability of the autonomous vehicle according to the traffic scene data and the target complexity level. Because the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data, the method suits varied traffic scene data, grades complexity accurately and with good discrimination, and therefore allows the environment cognitive ability of the autonomous vehicle to be evaluated more accurately on the basis of the traffic scene data and its complexity level.

Description

Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium
Technical Field
The embodiments of the application relate to the technical field of data processing, and in particular to a method, device, and storage medium for evaluating the environment cognitive ability of an autonomous vehicle, which can be used in the field of automatic driving.
Background
Evaluating the environmental awareness of an autonomous vehicle requires a traffic scene data set on which to base the evaluation. Different driving conditions demand different degrees of cognitive ability: an autonomous vehicle that can accurately perceive a more complex scene is considered to have stronger cognitive ability. Traffic scene data of different complexity levels can therefore be used to test the perception of the autonomous vehicle and so evaluate its environmental cognitive ability; this in turn requires a comprehensive complexity grading of the traffic scene data set, so that the environmental cognitive ability of the autonomous vehicle can be evaluated accurately.
Existing autonomous driving traffic scene data sets mainly include the KITTI data set and the RobotCar data set. The KITTI data set divides the collected raw data into categories such as road, city, residential, campus, and crowd, and further counts the numbers of different traffic participants such as cars, trucks, pedestrians, and cyclists; that is, the complexity grading of the KITTI data set targets the processing of specific objects. In addition, for three-dimensional objects in the KITTI data set, complexity is graded against specific references such as viewing angle and degree of occlusion. The RobotCar data set instead introduces global representations of complexity such as environmental conditions, for example pedestrian, bicycle, and vehicular traffic, or light and heavy rain.
Among conventional traffic scene data sets, then, the complexity grading of the KITTI data set addresses only specific objects against specific references and so cannot represent the inherent complexity of a whole traffic scene, while the complexity grading of the RobotCar data set rests only on global representations and so is not representative, lacking representations of road types and some scene conditions. In short, the complexity grading of existing traffic scene data sets cannot be applied to all traffic scenes, and the grading is not accurate or discriminative enough; if such a grading is used to test and evaluate the environment cognitive ability of an autonomous vehicle, the evaluation may be inaccurate.
Disclosure of Invention
The application provides a method, device, and storage medium for evaluating the environment cognitive ability of an autonomous vehicle.
In a first aspect, the present application provides a method for evaluating environmental awareness of an autonomous vehicle, comprising:
acquiring traffic scene data to be graded, wherein the traffic scene data comprises two-dimensional image data and/or three-dimensional point cloud data of a traffic scene;
processing the traffic scene data through a traffic scene complexity classification model to extract a feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data, and classifying the traffic scene data according to the feature vector to obtain a target complexity grade corresponding to the traffic scene data;
and evaluating the environment cognitive ability of the automatic driving vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
In a second aspect, the present application provides an evaluation device for environmental awareness of an autonomous vehicle, comprising:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring traffic scene data to be classified, and the traffic scene data comprises two-dimensional image data and/or three-dimensional point cloud data of a traffic scene;
the processing module is used for processing the traffic scene data through a traffic scene complexity classification model so as to extract a characteristic vector according to the two-dimensional image data and/or the three-dimensional point cloud data and classify the traffic scene data according to the characteristic vector to obtain a target complexity grade corresponding to the traffic scene data;
and the evaluation module is used for evaluating the environment cognitive ability of the automatic driving vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
In a third aspect, the present application provides an electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of the first aspect.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of the first aspect.
In a fifth aspect, the present application provides a computer program comprising program code which, when the computer program is run by a computer, performs the method according to the first aspect.
According to the method, device, and storage medium for evaluating the environment cognitive ability of an autonomous vehicle provided herein, traffic scene data to be graded is acquired, the traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data of a traffic scene; the traffic scene data is processed through a traffic scene complexity classification model, which extracts a feature vector from the two-dimensional image data and/or the three-dimensional point cloud data and grades the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data; and the environment cognitive ability of the autonomous vehicle is evaluated according to the traffic scene data and the corresponding target complexity level. Because the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data, the approach suits varied traffic scene data and grades complexity accurately and with good discrimination, so the environment cognitive ability of the autonomous vehicle can be evaluated more accurately on the basis of the traffic scene data and its complexity level.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not intended to limit the present application. Wherein:
FIG. 1 is a flowchart of a method for evaluating the environmental awareness capability of an autonomous vehicle according to an embodiment of the present application;
FIG. 2 is a flowchart of a method for evaluating the environmental awareness capability of an autonomous vehicle according to another embodiment of the present application;
FIG. 3 is a flowchart of a method for evaluating the environmental awareness capability of an autonomous vehicle according to another embodiment of the present application;
FIG. 4 is a schematic diagram of a traffic scene complexity classification model provided according to an embodiment of the present application;
FIG. 5 is a flowchart of a method for evaluating the environmental awareness capability of an autonomous vehicle according to another embodiment of the present application;
FIG. 6 is a block diagram of an apparatus for evaluating environmental awareness capabilities of an autonomous vehicle according to an embodiment of the present application;
fig. 7 is a block diagram of an electronic device for implementing the method for evaluating the environment awareness ability of the autonomous vehicle according to the embodiment of the present application.
Detailed Description
The following describes exemplary embodiments of the present application with reference to the accompanying drawings, including various details of the embodiments to aid understanding; these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Descriptions of well-known functions and constructions are likewise omitted below for clarity and conciseness.
Because the complexity grading of traffic scene data sets in the prior art cannot be applied to all traffic scenes, and the grading is not accurate or discriminative enough, using it to test and evaluate the environment cognitive ability of an autonomous vehicle makes the test and evaluation inaccurate. In addition, existing traffic scene data sets merely classify traffic scenes at the semantic level or by topological structure elements, which can hardly represent the complexity of a traffic scene.
The present application therefore proposes grading the complexity of traffic scene data through a pre-trained traffic scene complexity classification model. The model may be a neural network and is applicable to the complexity grading of all traffic scenes, so the environment cognitive ability of the autonomous vehicle can be evaluated according to the target complexity level corresponding to the traffic scene data.
An embodiment of the present application provides a method for evaluating the environment cognitive ability of an autonomous vehicle, and fig. 1 is a flowchart of this method. As shown in fig. 1, the method specifically comprises the following steps:
s101, obtaining traffic scene data to be graded, wherein the traffic scene data comprises two-dimensional image data and/or three-dimensional point cloud data of a traffic scene.
In this embodiment, traffic scene data to be graded is first acquired. The traffic scene data may be two-dimensional image data of a traffic scene, three-dimensional point cloud data of the traffic scene, or a mixture of the two; in the mixed case the two kinds of data correspond to each other, i.e., they contain one or more identical objects, such as the same vehicles and pedestrians. Optionally, the two-dimensional image data may be collected in advance with a camera and the three-dimensional point cloud data with a lidar sensor, for example by collecting real traffic scene data with the camera and/or lidar sensor of a test vehicle or an autonomous vehicle.
S102, processing the traffic scene data through a traffic scene complexity classification model to extract a feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data, and classifying the traffic scene data according to the feature vector to obtain a target complexity grade corresponding to the traffic scene data.
In this embodiment, the traffic scene complexity classification model may be a pre-trained neural network model, or may be implemented with another kind of model; it grades the complexity of the traffic scene data. Specifically, the model extracts a feature vector from the two-dimensional image data and/or three-dimensional point cloud data of the traffic scene contained in the traffic scene data. Optionally, within the model: for traffic scene data that is two-dimensional image data, a residual network sub-model extracts the feature vector; for traffic scene data that is three-dimensional point cloud data, a Point-net network sub-model extracts the feature vector; and for traffic scene data that comprises both, feature vectors are extracted from the two-dimensional image data and the three-dimensional point cloud data respectively and then fused, the fused feature vector serving as the feature vector. The traffic scene data is then graded on the basis of the feature vector to obtain its corresponding target complexity level.
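By way of illustration only, the dispatch by data modality just described can be sketched as follows, with the sub-models passed in as callables; the names resnet_branch, pointnet_branch, and fuse are hypothetical stand-ins, not names taken from the application:

```python
def extract_feature(image, points, resnet_branch, pointnet_branch, fuse):
    """Route traffic scene data to the matching sub-model (illustrative sketch)."""
    if image is not None and points is not None:
        f_img = resnet_branch(image)      # first feature vector, from 2D images
        f_pts = pointnet_branch(points)   # second feature vector, from 3D points
        return fuse(f_img, f_pts)         # fused feature vector
    if image is not None:
        return resnet_branch(image)       # scene data with 2D image data only
    return pointnet_branch(points)        # scene data with 3D point cloud only
```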
S103, evaluating the environment cognitive ability of the automatic driving vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
In this embodiment, after the target complexity level of the traffic scene data is obtained, the data can be used to test the environmental cognitive ability of the autonomous vehicle in two respects. First, geometric metric information is extracted from the traffic scene: the relative positions of the autonomous vehicle with respect to lane lines, road boundary lines, and surrounding road participants such as other vehicles and pedestrians. Second, the vehicle infers the movement intentions of surrounding vehicles and pedestrians, reasons about how to drive safely over a future period, and makes an accurate decision. If the autonomous vehicle handles both respects accurately for the given traffic scene data, its environmental cognitive ability is judged able to cope with traffic scenes of the corresponding target complexity level; the higher the complexity level it can cope with, the stronger its environmental cognitive ability. Optionally, the vehicle may be tested sequentially on traffic scene data of increasing complexity levels until it fails to cope with some level, which determines the upper limit of its environmental cognitive ability.
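The optional low-to-high test procedure can be sketched as follows; perceives_accurately and decides_safely are hypothetical hooks standing in for the two evaluation aspects above, and the scene organization is an assumption:

```python
def evaluate_cognitive_upper_bound(vehicle, scenes_by_level,
                                   perceives_accurately, decides_safely):
    """Return the highest complexity level the vehicle copes with (sketch).

    scenes_by_level: dict mapping complexity level -> list of scene data.
    perceives_accurately / decides_safely: caller-supplied checks for the
    geometric-metric aspect and the intent/decision aspect respectively.
    """
    upper_bound = 0
    for level in sorted(scenes_by_level):   # test from low to high complexity
        coped = all(perceives_accurately(vehicle, scene)
                    and decides_safely(vehicle, scene)
                    for scene in scenes_by_level[level])
        if not coped:
            break                           # first level the vehicle cannot handle
        upper_bound = level
    return upper_bound
```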
In the present embodiment, the process of evaluating the environment recognition ability of the autonomous vehicle may be executed on the same processor as that of S101 to S102, or may be executed on a different processor.
According to the method for evaluating the environment cognitive ability of an autonomous vehicle provided by this embodiment, traffic scene data to be graded is acquired, the traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data of a traffic scene; the traffic scene data is processed through a traffic scene complexity classification model, which extracts a feature vector from the two-dimensional image data and/or the three-dimensional point cloud data and grades the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data; and the environment cognitive ability of the autonomous vehicle is evaluated according to the traffic scene data and the corresponding target complexity level. Because the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data, the method suits varied traffic scene data and grades complexity accurately and with good discrimination, so the environment cognitive ability of the autonomous vehicle can be evaluated more accurately on the basis of the traffic scene data and its complexity level.
On the basis of the above embodiment, optionally, the traffic scene data includes two-dimensional image data and three-dimensional point cloud data of a traffic scene, and the two-dimensional image data corresponds to the three-dimensional point cloud data; further, as shown in fig. 2, the processing the traffic scene data through the traffic scene complexity classification model in S102 in the foregoing embodiment to extract the feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data may specifically include:
s201, extracting a first characteristic vector from the two-dimensional image data through a residual network sub-model in the traffic scene complexity classification model, and extracting a second characteristic vector from the three-dimensional Point cloud data through a Point-net network sub-model in the traffic scene complexity classification model;
s202, fusing the first feature vector and the second feature vector to obtain a fused feature vector as the feature vector.
In this embodiment, it is considered that features extracted at the pixel level from two-dimensional image data are affected by factors such as acquisition conditions, illumination, pose, and angle; likewise, features extracted from three-dimensional point cloud data are affected by noise, uneven sampling, and data loss in the point cloud. Grading complexity from the feature vector of the two-dimensional image data alone, or from that of the three-dimensional point cloud data alone, may therefore be inaccurate. In this embodiment the traffic scene data consists of corresponding two-dimensional image data and three-dimensional point cloud data: feature vectors are extracted from both and fused, and the fused feature vector is used as the feature vector for complexity grading. Grading on the fused feature vector improves accuracy and robustness and avoids the shortcomings of any single feature.
In addition, conventional traffic scene computation and understanding methods merely classify traffic scenes at the semantic level or by topological structure elements: the semantic-level traffic scene is represented structurally, composed of pixel-level two-dimensional semantic labels and fine-grained two-dimensional lane line labels, from which geometric metrics of the traffic elements are formed and their topological structure elements extracted. The original image, however, contains texture information, and semantic segmentation reflects only the topological structure of the scene; rich as that information is, depth information is lost. In this embodiment, feature vectors are extracted from the two-dimensional image data and the three-dimensional point cloud data respectively and then fused, which addresses these problems: the extracted feature vector carries relatively comprehensive information without such loss.
In this embodiment, the residual network sub-model in the traffic scene complexity classification model extracts the first feature vector from the two-dimensional image data. By increasing network depth, a residual network can extract more features from the two-dimensional image data and has stronger expressive power; it is also easier to optimize and learns better.
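One plausible realization of such a residual network branch is a torchvision ResNet-50 with its classification head removed; the backbone choice and feature dimension are assumptions, since they are not fixed here:

```python
import torch.nn as nn
from torchvision import models

class ResidualBranch(nn.Module):
    """2D branch: ResNet backbone reduced to a feature extractor (sketch)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50(weights=None)
        # Drop the final fully connected layer; keep conv stages and avg pool.
        self.features = nn.Sequential(*list(backbone.children())[:-1])

    def forward(self, image):          # image: (B, 3, H, W)
        f = self.features(image)       # (B, 2048, 1, 1) after global avg pool
        return f.flatten(1)            # first feature vector, shape (B, 2048)
```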
A Point-net network sub-model in the traffic scene complexity classification model extracts the second feature vector from the three-dimensional point cloud data. Conventional network models for point cloud data usually project the point cloud onto a two-dimensional grid, such as a bird's-eye view or a front view, or voxelize the point cloud as network input, voxelization meaning dividing the point cloud into three-dimensional cubes in space. Although such a conventional model could also extract features from the three-dimensional point cloud data in this embodiment, both projection onto a two-dimensional grid and voxelization lose some information of the point cloud. This embodiment therefore processes the three-dimensional point cloud data with a Point-net network, which accepts the point cloud directly as input, avoiding information loss and obtaining more features from the data. Moreover, the input of a Point-net network is order-invariant: a point set is unordered and transformation-invariant in Euclidean space, unlike the pixels and signals of an image, so for a given point set the network's output does not change with the input order. Feature extraction is thus unaffected by the ordering of the input points, and the network can also extract local features of a point and its neighbors.
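A much-simplified Point-net style encoder makes the order-invariance argument concrete: a shared per-point MLP followed by symmetric max pooling, so permuting the input points leaves the output unchanged. The layer sizes below are assumptions:

```python
import torch.nn as nn

class PointNetBranch(nn.Module):
    """3D branch: shared per-point MLP + symmetric max pool (simplified sketch)."""
    def __init__(self, out_dim=1024):
        super().__init__()
        self.mlp = nn.Sequential(       # applied identically to every point
            nn.Linear(3, 64), nn.ReLU(),
            nn.Linear(64, 128), nn.ReLU(),
            nn.Linear(128, out_dim),
        )

    def forward(self, points):          # points: (B, N, 3), in any order
        per_point = self.mlp(points)    # (B, N, out_dim)
        # Max pooling is a symmetric function, so the result is order-invariant.
        return per_point.max(dim=1).values   # second feature vector, (B, out_dim)
```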
Further, the fusing the first feature vector and the second feature vector to obtain a fused feature vector in S202 may specifically include:
and fusing the first feature vector and the second feature vector through a feature fusion sub-model of the traffic scene complexity classification model to obtain the fusion feature vector.
In this embodiment, the feature fusion sub-model of the traffic scene complexity classification model may be an Embedding layer. It fuses the first feature vector and the second feature vector, specifically by connecting them in series and then reducing the dimensionality, yielding a single fused feature vector.
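A minimal sketch of this series connection followed by dimensionality reduction, with all dimensions assumed:

```python
import torch
import torch.nn as nn

class FusionEmbedding(nn.Module):
    """Concatenate the branch vectors, then reduce dimensionality (sketch)."""
    def __init__(self, img_dim=2048, pts_dim=1024, fused_dim=512):
        super().__init__()
        self.embed = nn.Linear(img_dim + pts_dim, fused_dim)

    def forward(self, f_img, f_pts):
        fused = torch.cat([f_img, f_pts], dim=1)  # series connection of vectors
        return self.embed(fused)                  # fused feature vector
```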
On the basis of any of the above embodiments, as shown in fig. 3, the classifying the traffic scene data according to the feature vector in S102 to obtain a target complexity level corresponding to the traffic scene data may specifically include:
s301, configuring corresponding weight for each dimension in the feature vector.
In this embodiment, when the traffic scene data is graded according to the feature vector, the feature elements of different dimensions differ in importance and in influence on the complexity level, so a corresponding weight is configured for each dimension of the feature vector. For example, a more important dimension receives a larger weight. For another example, if the two-dimensional image data is of good quality (low noise), the dimensions of the feature vector derived from it receive larger weights, and likewise the dimensions derived from good-quality three-dimensional point cloud data. For yet another example, if a 64-kernel convolution is applied to feature channels of the same dimension, a 64-channel matrix results, each channel representing the component of the input signal on a different convolution kernel; the feature is thus decomposed into components on 64 kernels, each contributing a small amount to the key information, so each component can be given a weight representing its contribution.
In an optional embodiment, each dimension in the feature vector may be given its corresponding weight by an attention sub-model in the traffic scene complexity classification model, where a dimension of higher importance corresponds to a larger weight, and/or a dimension derived from better-quality traffic scene data corresponds to a larger weight.
In this embodiment, the attention mechanism in the traffic scene complexity classification model dynamically configures a corresponding weight for each dimension of the feature vector, so that the model concentrates on the heavily weighted dimensions and the complexity grading becomes more scientific and accurate. How the attention mechanism configures the weights is learned through training.
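One common way to realize such a learned per-dimension weighting is a small gating network; the form below is an assumption about the attention sub-model, not a specification of it:

```python
import torch.nn as nn

class DimensionAttention(nn.Module):
    """Learn a weight per feature dimension and rescale the vector (sketch)."""
    def __init__(self, dim=512):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(dim, dim), nn.Sigmoid())

    def forward(self, feature):
        weights = self.gate(feature)   # per-dimension weights, learned in training
        return feature * weights       # model focuses on heavily weighted dims
```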
S302, determining a target complexity level corresponding to the traffic scene data through a classifier in the traffic scene complexity classification model according to the feature vector and the weight corresponding to each dimension in the feature vector.
In this embodiment, after the weight corresponding to each dimension in the feature vector is obtained, the traffic scene complexity classification model performs the final complexity grading with a classifier at the output layer and outputs the target complexity level corresponding to the traffic scene data; the classifier may be a SoftMax classifier.
Specifically, the probability that the traffic scene data respectively belongs to each of a plurality of preset complexity levels can be determined by the classifier according to the feature vector and the weight corresponding to each dimension in the feature vector, and the preset complexity level with the maximum probability is used as the target complexity level.
In this embodiment, the classifier computes the probability that the traffic scene data belongs to each of a plurality of preset complexity levels. For example, with 10 preset complexity levels, the classifier computes the probability that the traffic scene data belongs to each of the 10 levels, and the preset complexity level with the maximum probability is output as the target complexity level. The SoftMax classifier thus lets the target complexity level of the traffic scene data be obtained quickly.
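The output stage then reduces to a linear layer, a softmax over the preset levels, and an argmax; a sketch using the 10-level example above, with the feature dimension assumed:

```python
import torch
import torch.nn as nn

class ComplexityClassifier(nn.Module):
    """SoftMax output layer over the preset complexity levels (sketch)."""
    def __init__(self, dim=512, num_levels=10):
        super().__init__()
        self.logits = nn.Linear(dim, num_levels)

    def forward(self, feature):
        probs = torch.softmax(self.logits(feature), dim=1)  # one prob per level
        target_level = probs.argmax(dim=1)   # preset level with maximum probability
        return probs, target_level
```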
On the basis of the above embodiments, this embodiment provides a traffic scene complexity classification model which, as shown in fig. 4, includes a residual network sub-model, a Point-net network sub-model, a feature fusion sub-model (Embedding), an attention sub-model (Attention), and a classifier (e.g., a SoftMax classifier). Based on this traffic scene complexity classification model, the method for evaluating the environmental cognitive ability of the autonomous vehicle comprises the following steps:
The two-dimensional image data in the traffic scene data is input into the residual network sub-model to extract a first feature vector, and the three-dimensional point cloud data is input into the Point-net network sub-model to extract a second feature vector. The feature fusion sub-model fuses the first and second feature vectors into a fused feature vector, which is input into the attention sub-model; the attention sub-model configures a corresponding weight for each dimension of the feature vector, and the classifier then determines the target complexity level corresponding to the traffic scene data from the feature vector and the per-dimension weights.
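Chaining the illustrative sub-models sketched above reproduces the fig. 4 pipeline end to end; the connections follow the description, but the code remains an assumption rather than the disclosed implementation:

```python
import torch.nn as nn

class TrafficSceneComplexityModel(nn.Module):
    """Fig. 4 pipeline: ResNet + Point-net -> fusion -> attention -> SoftMax."""
    def __init__(self):
        super().__init__()
        self.resnet_branch = ResidualBranch()
        self.pointnet_branch = PointNetBranch()
        self.fuse = FusionEmbedding()
        self.attention = DimensionAttention()
        self.classifier = ComplexityClassifier()

    def forward(self, image, points):
        f_img = self.resnet_branch(image)     # first feature vector
        f_pts = self.pointnet_branch(points)  # second feature vector
        fused = self.fuse(f_img, f_pts)       # fused feature vector
        weighted = self.attention(fused)      # per-dimension weighting
        return self.classifier(weighted)      # (probabilities, target level)
```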
According to the method for evaluating the environment cognitive ability of an autonomous vehicle provided by this embodiment, the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data; it suits varied traffic scene data, grades complexity accurately and with good discrimination, and therefore allows the environment cognitive ability of the autonomous vehicle to be evaluated more accurately on the basis of the traffic scene data and its complexity level. Moreover, the traffic scene data uses corresponding two-dimensional image data and three-dimensional point cloud data: the model extracts feature vectors from each, fuses them, and grades complexity on the fused feature vector, which improves accuracy and robustness and avoids the shortcomings of any single feature. In addition, the attention mechanism dynamically configures a corresponding weight for each dimension of the feature vector, so that the traffic scene complexity classification model concentrates on the heavily weighted dimensions and the complexity grading is more scientific and accurate.
On the basis of any of the above embodiments, as shown in fig. 5, the traffic scene complexity classification model needs to be obtained through a training process performed in advance, and the specific training process may include:
s501, acquiring training data and an initial traffic scene complexity classification model, wherein the training data is traffic scene data with labeled complexity grades;
s502, training the initial traffic scene complexity classification model according to the training data to obtain the traffic scene complexity classification model.
In this embodiment, the traffic scene data of the training data may include two-dimensional image data and/or three-dimensional point cloud data of traffic scenes; the two-dimensional image data may be collected with a camera and the three-dimensional point cloud data with a lidar sensor, and the traffic scene data is annotated with complexity levels manually or by other means. Note that if the traffic scene data includes both two-dimensional image data and three-dimensional point cloud data, the two must correspond to each other.
An initial traffic scene complexity classification model is constructed according to the architecture shown in fig. 4 and iteratively trained on the training data; when the accuracy of the model meets the target requirement, the iterative training ends and the final traffic scene complexity classification model is obtained. Note that the training process in this embodiment may be executed by the same processor as the method flow of the above embodiments, or by a different processor.
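A conventional supervised loop suffices as a sketch of this training process, assuming the illustrative model above, a loader yielding (image, points, level) triples with annotated complexity levels, and an accuracy target; all hyperparameters are assumptions:

```python
import torch
import torch.nn as nn

def train(model, loader, epochs=10, lr=1e-4, target_accuracy=0.95):
    """Train until the accuracy target is met or epochs run out (sketch)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    nll = nn.NLLLoss()                   # model already outputs probabilities
    for _ in range(epochs):
        correct, total = 0, 0
        for image, points, level in loader:   # level: annotated complexity level
            probs, pred = model(image, points)
            loss = nll(torch.log(probs + 1e-9), level)
            opt.zero_grad()
            loss.backward()
            opt.step()
            correct += (pred == level).sum().item()
            total += level.numel()
        if correct / total >= target_accuracy:   # target requirement reached
            break
    return model
```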
Fig. 6 is a structural diagram of a device for evaluating the environment cognitive ability of an autonomous vehicle according to an embodiment of the present application. As shown in fig. 6, the device specifically includes: an obtaining module 601, a processing module 602, and an evaluation module 603.
An obtaining module 601, configured to obtain traffic scene data to be graded, where the traffic scene data includes two-dimensional image data and/or three-dimensional point cloud data of a traffic scene;
a processing module 602, configured to process the traffic scene data through a traffic scene complexity classification model, so as to extract a feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data, and classify the traffic scene data according to the feature vector, so as to obtain a target complexity grade corresponding to the traffic scene data;
and the evaluation module 603 is configured to evaluate the environment cognitive ability of the autonomous vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
On the basis of the embodiment, the traffic scene data comprises two-dimensional image data and three-dimensional point cloud data of a traffic scene, and the two-dimensional image data corresponds to the three-dimensional point cloud data;
the processing module 602, when processing the traffic scene data through a traffic scene complexity classification model to extract feature vectors from the two-dimensional image data and/or the three-dimensional point cloud data, is configured to:
extracting a first feature vector from the two-dimensional image data through a residual network sub-model in the traffic scene complexity classification model, and extracting a second feature vector from the three-dimensional point cloud data through a Point-net network sub-model in the traffic scene complexity classification model;
and fusing the first feature vector and the second feature vector to obtain a fused feature vector as the feature vector.
On the basis of any of the above embodiments, when the processing module 602 classifies the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data, the processing module is configured to:
configuring a respective weight for each dimension in the feature vector;
and determining a target complexity level corresponding to the traffic scene data through a classifier in the traffic scene complexity classification model according to the feature vector and the weight corresponding to each dimension in the feature vector.
On the basis of any of the above embodiments, when configuring a corresponding weight for each dimension in the feature vector, the processing module 602 is configured to:
and configuring corresponding weight for each dimension in the feature vector through an attention sub-model in the traffic scene complexity classification model, wherein the higher the importance degree is, the larger the weight is corresponding to the dimension, and/or the higher the quality of the sourced traffic scene data is, the larger the weight is corresponding to the dimension.
On the basis of any of the above embodiments, when determining, by a classifier in the traffic scene complexity classification model, a target complexity level corresponding to the traffic scene data according to the feature vector and the weight corresponding to each dimension in the feature vector, the processing module 602 is configured to:
and according to the feature vector and the weight corresponding to each dimension in the feature vector, determining the probability that the traffic scene data respectively belongs to each level in a plurality of preset complexity levels through the classifier, and taking the preset complexity level with the maximum probability as the target complexity level.
On the basis of any of the above embodiments, when the processing module 602 performs fusion on the first feature vector and the second feature vector to obtain a fused feature vector, the processing module is configured to:
and fusing the first feature vector and the second feature vector through a feature fusion sub-model of the traffic scene complexity classification model to obtain the fusion feature vector.
On the basis of any of the above embodiments, when acquiring the traffic scene data to be graded, the obtaining module 601 is configured to:
acquiring the two-dimensional image data by using a camera; and/or
And acquiring the three-dimensional point cloud data by adopting a laser radar sensor.
On the basis of any of the above embodiments, the apparatus further includes a training module configured to:
acquiring training data and an initial traffic scene complexity classification model, wherein the training data is traffic scene data with labeled complexity grades;
and training the initial traffic scene complexity classification model according to the training data to obtain the traffic scene complexity classification model.
The device for evaluating the environment cognitive ability of an autonomous vehicle provided by this embodiment may specifically be used to execute the method embodiments described above; the specific functions are not repeated here.
The device for evaluating the environment cognitive ability of an autonomous vehicle provided by this embodiment acquires traffic scene data to be graded, the traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data of a traffic scene; processes the traffic scene data through a traffic scene complexity classification model, which extracts a feature vector from the two-dimensional image data and/or the three-dimensional point cloud data and grades the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data; and evaluates the environment cognitive ability of the autonomous vehicle according to the traffic scene data and the corresponding target complexity level. Because the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data, the device suits varied traffic scene data and grades complexity accurately and with good discrimination, so the environment cognitive ability of the autonomous vehicle can be evaluated more accurately on the basis of the traffic scene data and its complexity level.
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
As shown in fig. 7, the method is a block diagram of an electronic device according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital processing, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the present application that are described and/or claimed herein.
As shown in fig. 7, the electronic device includes: one or more processors 701, a memory 702, and interfaces for connecting the various components, including a high-speed interface and a low-speed interface. The various components are interconnected by different buses and may be mounted on a common motherboard or in other ways as needed. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to the interface). In other embodiments, multiple processors and/or multiple buses may be used, along with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). In fig. 7, one processor 701 is taken as an example.
The memory 702 is a non-transitory computer-readable storage medium provided herein, storing instructions executable by the at least one processor to cause the at least one processor to perform the method for evaluating the environmental awareness of an autonomous vehicle provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to execute the method for evaluating the environmental awareness of an autonomous vehicle provided by the present application.
The memory 702, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the method for evaluating environmental awareness capabilities of an autonomous vehicle in the embodiment of the present application (for example, the obtaining module 601, the processing module 602, and the evaluation module 603 shown in fig. 6). The processor 701 executes various functional applications of the server and data processing by running non-transitory software programs, instructions and modules stored in the memory 702, so as to implement the method for evaluating the environment awareness ability of the autonomous driving vehicle in the above method embodiment.
The memory 702 may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store data created according to use of the electronic device of the autonomous vehicle environment recognition capability evaluation method, and the like. Further, the memory 702 may include high speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid state storage device. In some embodiments, the memory 702 may optionally include memory located remotely from the processor 701, and such remote memory may be connected to the electronics of the autonomous vehicle environmental awareness assessment method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device of the method for evaluating the environmental awareness ability of the autonomous vehicle may further include: an input device 703 and an output device 704. The processor 701, the memory 702, the input device 703 and the output device 704 may be connected by a bus or other means, and fig. 7 illustrates an example of a connection by a bus.
The input device 703 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device, and may be, for example, a touch screen, keypad, mouse, trackpad, touchpad, pointing stick, one or more mouse buttons, trackball, or joystick. The output device 704 may include a display device, auxiliary lighting devices (e.g., LEDs), tactile feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light-emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application specific ASICs (application specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
According to the technical scheme of the embodiments of the application, traffic scene data to be graded is acquired, the traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data of a traffic scene; the traffic scene data is processed through a traffic scene complexity classification model, which extracts a feature vector from the two-dimensional image data and/or the three-dimensional point cloud data and grades the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data; and the environment cognitive ability of the autonomous vehicle is evaluated according to the traffic scene data and the corresponding target complexity level. Because the traffic scene complexity classification model can grade the complexity of any traffic scene data comprising two-dimensional image data and/or three-dimensional point cloud data, the scheme suits varied traffic scene data and grades complexity accurately and with good discrimination, so the environment cognitive ability of the autonomous vehicle can be evaluated more accurately on the basis of the traffic scene data and its complexity level.
The application also provides a computer program comprising program code for executing the method for evaluating the environmental awareness capability of the autonomous vehicle according to the above embodiment when the computer program is run by a computer.
It should be understood that the various forms of flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in a different order; this is not limited here, as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (18)

1. A method for evaluating the environmental cognition ability of an automatic driving vehicle is characterized by comprising the following steps:
acquiring traffic scene data to be graded, wherein the traffic scene data comprises two-dimensional image data and/or three-dimensional point cloud data of a traffic scene;
processing the traffic scene data through a traffic scene complexity classification model to extract a feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data, and classifying the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data;
and evaluating the environment cognitive ability of the autonomous vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
2. The method of claim 1, wherein the traffic scene data comprises two-dimensional image data and three-dimensional point cloud data of a traffic scene, the two-dimensional image data corresponding to the three-dimensional point cloud data;
and wherein the processing the traffic scene data through the traffic scene complexity classification model to extract the feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data comprises:
extracting a first feature vector from the two-dimensional image data through a residual network sub-model in the traffic scene complexity classification model, and extracting a second feature vector from the three-dimensional point cloud data through a PointNet network sub-model in the traffic scene complexity classification model;
and fusing the first feature vector and the second feature vector to obtain a fused feature vector as the feature vector.
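As a purely illustrative sketch of claim 2 (the claim names the sub-models but prescribes no architecture), the two feature extractors and a concatenation-style fusion could be composed as follows; the use of torchvision's resnet18, the minimal PointNet-style shared MLP, and all layer sizes are assumptions:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class SceneFeatureExtractor(nn.Module):
    """Residual-network branch for the 2D image plus a PointNet-style
    branch for the 3D point cloud; returns the fused feature vector."""

    def __init__(self, point_feat_dim=256):
        super().__init__()
        backbone = resnet18(weights=None)
        # Drop the final classification layer; keep the 512-d pooled feature.
        self.resnet = nn.Sequential(*list(backbone.children())[:-1])
        # Minimal PointNet-style branch: shared per-point MLP, then max-pool
        # over points for a permutation-invariant global feature.
        self.point_mlp = nn.Sequential(
            nn.Conv1d(3, 64, 1), nn.ReLU(),
            nn.Conv1d(64, point_feat_dim, 1), nn.ReLU(),
        )

    def forward(self, image, points):
        # image: (B, 3, H, W) -> first feature vector, shape (B, 512)
        first = self.resnet(image).flatten(1)
        # points: (B, 3, N) -> second feature vector, shape (B, point_feat_dim)
        second = self.point_mlp(points).max(dim=2).values
        # Fuse the two feature vectors; concatenation is one simple choice.
        return torch.cat([first, second], dim=1)
```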
3. The method of claim 2, wherein the classifying the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data comprises:
configuring a respective weight for each dimension in the feature vector;
and determining a target complexity level corresponding to the traffic scene data through a classifier in the traffic scene complexity classification model according to the feature vector and the weight corresponding to each dimension in the feature vector.
4. The method of claim 3, wherein configuring a respective weight for each dimension in the feature vector comprises:
configuring a corresponding weight for each dimension in the feature vector through an attention sub-model in the traffic scene complexity classification model, wherein a dimension of higher importance corresponds to a larger weight, and/or a dimension derived from traffic scene data of higher quality corresponds to a larger weight.
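A minimal sketch of such a per-dimension weighting, under the assumption of a single learned sigmoid gate (the claim does not specify the attention sub-model's internal form):

```python
import torch.nn as nn

class DimensionAttention(nn.Module):
    """Configures a weight for each dimension of the (fused) feature
    vector; dimensions the gate learns to treat as more important, or as
    coming from higher-quality source data, receive larger weights."""

    def __init__(self, feat_dim):
        super().__init__()
        self.gate = nn.Sequential(nn.Linear(feat_dim, feat_dim), nn.Sigmoid())

    def forward(self, features):
        weights = self.gate(features)   # one weight in (0, 1) per dimension
        return features * weights       # re-weighted feature vector
```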
5. The method of claim 3 or 4, wherein the determining, by a classifier in the traffic scene complexity classification model, a target complexity level corresponding to the traffic scene data according to the feature vector and a weight corresponding to each dimension in the feature vector comprises:
determining, through the classifier and according to the feature vector and the weight corresponding to each dimension in the feature vector, a probability that the traffic scene data belongs to each of a plurality of preset complexity levels, and taking the preset complexity level with the maximum probability as the target complexity level.
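Illustratively, and assuming a softmax classifier (one possible classifier among many that would satisfy claim 5), the final step could look like:

```python
import torch
import torch.nn as nn

class ComplexityClassifier(nn.Module):
    """Maps the weighted feature vector to a probability for each preset
    complexity level; the level with the maximum probability is taken as
    the target complexity level."""

    def __init__(self, feat_dim, num_levels):
        super().__init__()
        self.fc = nn.Linear(feat_dim, num_levels)

    def forward(self, weighted_features):
        probs = torch.softmax(self.fc(weighted_features), dim=-1)
        return probs, probs.argmax(dim=-1)
```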
6. The method according to claim 2, wherein the fusing the first feature vector and the second feature vector to obtain a fused feature vector comprises:
fusing the first feature vector and the second feature vector through a feature fusion sub-model of the traffic scene complexity classification model to obtain the fused feature vector.
7. The method of claim 1, wherein the acquiring traffic scene data to be graded comprises:
acquiring the two-dimensional image data by using a camera; and/or
acquiring the three-dimensional point cloud data by using a laser radar sensor.
8. The method of claim 1, further comprising:
acquiring training data and an initial traffic scene complexity classification model, wherein the training data is traffic scene data annotated with complexity levels;
and training the initial traffic scene complexity classification model according to the training data to obtain the traffic scene complexity classification model.
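A conventional supervised training loop would realize claim 8; the sketch below assumes a cross-entropy objective and a data loader yielding (image, points, level) triples, none of which the claim prescribes:

```python
import torch
import torch.nn as nn

def train_complexity_model(model, loader, epochs=10, lr=1e-3):
    """Train an initial traffic scene complexity classification model on
    traffic scene data annotated with complexity levels."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for image, points, level in loader:
            logits = model(image, points)      # raw score per preset level
            loss = criterion(logits, level)    # annotated complexity level
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```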
9. An apparatus for evaluating the environment cognitive ability of an autonomous vehicle, comprising:
the system comprises an acquisition module, a classification module and a classification module, wherein the acquisition module is used for acquiring traffic scene data to be classified, and the traffic scene data comprises two-dimensional image data and/or three-dimensional point cloud data of a traffic scene;
the processing module is used for processing the traffic scene data through a traffic scene complexity classification model so as to extract a characteristic vector according to the two-dimensional image data and/or the three-dimensional point cloud data and classify the traffic scene data according to the characteristic vector to obtain a target complexity grade corresponding to the traffic scene data;
and the evaluation module is used for evaluating the environment cognitive ability of the automatic driving vehicle according to the traffic scene data and the target complexity level corresponding to the traffic scene data.
10. The apparatus of claim 9, wherein the traffic scene data comprises two-dimensional image data and three-dimensional point cloud data of a traffic scene, the two-dimensional image data corresponding to the three-dimensional point cloud data;
the processing module is used for processing the traffic scene data through a traffic scene complexity classification model so as to extract a feature vector according to the two-dimensional image data and/or the three-dimensional point cloud data, and is used for:
extract a first feature vector from the two-dimensional image data through a residual network sub-model in the traffic scene complexity classification model, and extract a second feature vector from the three-dimensional point cloud data through a PointNet network sub-model in the traffic scene complexity classification model;
and fusing the first feature vector and the second feature vector to obtain a fused feature vector as the feature vector.
11. The apparatus of claim 10, wherein the processing module, when classifying the traffic scene data according to the feature vector to obtain a target complexity level corresponding to the traffic scene data, is configured to:
configuring a respective weight for each dimension in the feature vector;
and determining a target complexity level corresponding to the traffic scene data through a classifier in the traffic scene complexity classification model according to the feature vector and the weight corresponding to each dimension in the feature vector.
12. The apparatus of claim 11, wherein the processing module, in configuring a respective weight for each dimension in the feature vector, is configured to:
configure a corresponding weight for each dimension in the feature vector through an attention sub-model in the traffic scene complexity classification model, wherein a dimension of higher importance corresponds to a larger weight, and/or a dimension derived from traffic scene data of higher quality corresponds to a larger weight.
13. The apparatus of claim 11 or 12, wherein the processing module, when determining the target complexity level corresponding to the traffic scene data according to the feature vector and the weight corresponding to each dimension in the feature vector through a classifier in the traffic scene complexity classification model, is configured to:
determine, through the classifier and according to the feature vector and the weight corresponding to each dimension in the feature vector, a probability that the traffic scene data belongs to each of a plurality of preset complexity levels, and take the preset complexity level with the maximum probability as the target complexity level.
14. The apparatus of claim 10, wherein the processing module, when fusing the first feature vector and the second feature vector to obtain a fused feature vector, is configured to:
fuse the first feature vector and the second feature vector through a feature fusion sub-model of the traffic scene complexity classification model to obtain the fused feature vector.
15. The apparatus of claim 9, wherein the acquisition module, when acquiring the traffic scene data to be graded, is configured to:
acquiring the two-dimensional image data by using a camera; and/or
acquire the three-dimensional point cloud data by using a laser radar sensor.
16. The apparatus of claim 9, further comprising a training module configured to:
acquire training data and an initial traffic scene complexity classification model, wherein the training data is traffic scene data annotated with complexity levels;
and training the initial traffic scene complexity classification model according to the training data to obtain the traffic scene complexity classification model.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-8.
18. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method of any one of claims 1-8.
CN202010400859.7A 2020-05-13 2020-05-13 Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium Pending CN111523515A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010400859.7A CN111523515A (en) 2020-05-13 2020-05-13 Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium

Publications (1)

Publication Number Publication Date
CN111523515A 2020-08-11

Family

ID=71905132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010400859.7A Pending CN111523515A (en) 2020-05-13 2020-05-13 Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium

Country Status (1)

Country Link
CN (1) CN111523515A (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108717439A (en) * 2018-05-16 2018-10-30 哈尔滨理工大学 A kind of Chinese Text Categorization merged based on attention mechanism and characteristic strengthening
CN109582993A (en) * 2018-06-20 2019-04-05 长安大学 Urban transportation scene image understands and multi-angle of view gunz optimization method
CN109345510A (en) * 2018-09-07 2019-02-15 百度在线网络技术(北京)有限公司 Object detecting method, device, equipment, storage medium and vehicle
CN111027401A (en) * 2019-11-15 2020-04-17 电子科技大学 End-to-end target detection method with integration of camera and laser radar
CN111060838A (en) * 2019-12-31 2020-04-24 中国人民解放军陆军军医大学第二附属医院 Medical electronic equipment switching power supply fault diagnosis method based on multi-dimensional feature fusion
CN111122175A (en) * 2020-01-02 2020-05-08 北京百度网讯科技有限公司 Method and device for testing automatic driving system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
WU, XINXIAO et al.: "Recurrent Network Features", in Analysis and Recognition of Human Actions in Video *
XU, CHENGJI et al.: "Attention-YOLO: YOLO Detection Algorithm Introducing an Attention Mechanism", Computer Engineering and Applications *
CHEN, JIANYU et al.: "An Action Recognition Method Based on a Spatio-Temporal Attention Mechanism", Chinese Journal of Stereology and Image Analysis *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112650220A (en) * 2020-12-04 2021-04-13 东风汽车集团有限公司 Automatic vehicle driving method, vehicle-mounted controller and system
WO2022228251A1 (en) * 2021-04-26 2022-11-03 华为技术有限公司 Vehicle driving method, apparatus, and system
CN113642644A (en) * 2021-08-13 2021-11-12 北京赛目科技有限公司 Method and device for determining vehicle environment grade, electronic equipment and storage medium
CN113642644B (en) * 2021-08-13 2024-05-10 北京赛目科技有限公司 Method and device for determining vehicle environment level, electronic equipment and storage medium
CN116340192A (en) * 2023-05-31 2023-06-27 小米汽车科技有限公司 Vehicle perception version effect evaluation method and device and vehicle

Similar Documents

Publication Publication Date Title
EP4116867A1 (en) Vehicle tracking method and apparatus, and electronic device
CN111523515A (en) Method and device for evaluating environment cognitive ability of automatic driving vehicle and storage medium
CN111598164B (en) Method, device, electronic equipment and storage medium for identifying attribute of target object
CN111768381A (en) Part defect detection method and device and electronic equipment
Heidecker et al. An application-driven conceptualization of corner cases for perception in highly automated driving
EP3443482B1 (en) Classifying entities in digital maps using discrete non-trace positioning data
CN110675635B (en) Method and device for acquiring external parameters of camera, electronic equipment and storage medium
CN111652153B (en) Scene automatic identification method and device, unmanned vehicle and storage medium
CN110703732B (en) Correlation detection method, device, equipment and computer readable storage medium
CN113591573A (en) Training and target detection method and device for multi-task learning deep network model
CN111709873A (en) Training method and device of image conversion model generator
CN111767853A (en) Lane line detection method and device
CN111597986B (en) Method, apparatus, device and storage medium for generating information
WO2023231991A1 (en) Traffic signal lamp sensing method and apparatus, and device and storage medium
CN113129423B (en) Method and device for acquiring three-dimensional model of vehicle, electronic equipment and storage medium
CN111597987A (en) Method, apparatus, device and storage medium for generating information
CN112507832A (en) Canine detection method and device in monitoring scene, electronic equipment and storage medium
CN111337898A (en) Laser point cloud processing method, device, equipment and storage medium
CN113011298A (en) Truncated object sample generation method, target detection method, road side equipment and cloud control platform
CN112749701A (en) Method for generating license plate contamination classification model and license plate contamination classification method
CN111932530A (en) Three-dimensional object detection method, device and equipment and readable storage medium
CN111597707A (en) Processing method, device and equipment of simulation scene and storage medium
CN113344121B (en) Method for training a sign classification model and sign classification
CN113345101B (en) Three-dimensional point cloud labeling method, device, equipment and storage medium
CN111753960B (en) Model training and image processing method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200811)