CN110163248B - Visualization method, visualization device, computer equipment and storage medium for model evaluation


Info

Publication number
CN110163248B
CN110163248B
Authority
CN
China
Prior art keywords
model evaluation
sample set
test sample
test
information
Prior art date
Legal status
Active
Application number
CN201910278714.1A
Other languages
Chinese (zh)
Other versions
CN110163248A (en)
Inventor
陈飞
彭绍东
黎伟杰
韩旭
Current Assignee
WeRide Corp
Original Assignee
WeRide Corp
Priority date
Filing date
Publication date
Application filed by WeRide Corp
Priority to CN201910278714.1A
Publication of CN110163248A
Application granted
Publication of CN110163248B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/23 Clustering techniques
    • G06F 18/232 Non-hierarchical techniques
    • G06F 18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application relates to a visualization method, apparatus, computer device and storage medium for model evaluation. A computer uses the labeling information of a test sample set to perform model evaluation on the prediction results produced by an unmanned vehicle deep learning model for that test sample set, obtaining at least one model evaluation result; the labeling information describes scene information of the test sample set. According to a triggering operation of the user on an interface, the computer displays the model evaluation result corresponding to the triggering operation and/or process data generated during model evaluation. With this method, the effect of improvements to the unmanned vehicle deep learning model is increased; further, since the computer device displays the corresponding model evaluation result and/or process data in response to the user's triggering operation on the interface, the working efficiency of model developers is improved.

Description

Visualization method, visualization device, computer equipment and storage medium for model evaluation
Technical Field
The present disclosure relates to the field of deep learning technologies, and in particular, to a method, an apparatus, a computer device, and a storage medium for visualizing model evaluation.
Background
In the development of deep learning models for unmanned vehicles, model evaluation plays an important part. By evaluating a deep learning model, a developer can extract more information from the evaluation data, which helps to further improve the unmanned vehicle deep learning model.
An existing model evaluation method obtains model evaluation data, such as a confusion matrix and a P-R curve, from the test results of the deep learning model on a test data set; it then determines the misjudged samples in the test data set according to the model evaluation data and evaluates and analyzes those samples; finally, the developer consults the evaluation analysis results to improve the deep learning model.
However, when the deep learning model is evaluated in this way, a model developer must search for the required evaluation analysis results across many folders; the operation is cumbersome, and the working efficiency is therefore low.
Disclosure of Invention
Based on the foregoing, it is necessary to provide a visualization method, apparatus, computer device and storage medium for model evaluation.
A visualization method for model evaluation, comprising:
performing model evaluation, using the labeling information of a test sample set, on the prediction results of an unmanned vehicle deep learning model for the test sample set, to obtain at least one model evaluation result; the labeling information is used for describing scene information of the test sample set;
and displaying, according to a triggering operation of the user on an interface, a model evaluation result corresponding to the triggering operation and/or process data generated during model evaluation.
In one embodiment, the performing model evaluation on the prediction result of the unmanned vehicle deep learning model for the test sample set by using the labeling information of the test sample set to obtain at least one model evaluation result includes:
obtaining a prediction result of the unmanned vehicle deep learning model on the test sample set, and calculating model evaluation data of the unmanned vehicle deep learning model according to the prediction result;
and carrying out statistical feature analysis on the model evaluation data based on the labeling information to obtain a model evaluation result.
In one embodiment, the process data at the time of model evaluation includes at least one of model evaluation data, labeling information, and a set of test samples.
In one embodiment, the displaying, according to the triggering operation of the user on the interface, of the model evaluation result corresponding to the triggering operation and/or the process data during model evaluation includes:
Acquiring triggering operation of a user in an evaluation task list; the evaluation tasks in the evaluation task list correspond to the model evaluation results obtained according to the execution of the model evaluation and/or process data during the model evaluation;
and displaying the model evaluation result and/or the process data during model evaluation corresponding to the triggered evaluation task.
In one embodiment, the model evaluation data includes a confusion matrix; each cell datum of the confusion matrix is associated with the test samples corresponding to that cell datum, and with the model evaluation result obtained based on that cell datum and/or the process data during model evaluation; the displaying, according to the triggering operation of the user on the interface, of the model evaluation result corresponding to the triggering operation and/or the process data during model evaluation includes:
acquiring triggering operation of a user on a cell of a currently displayed confusion matrix;
in response to the triggering operation, display model evaluation results associated with the cell data and/or process data at the time of model evaluation associated with the cell data.
In one embodiment, the process data during model evaluation associated with the cell data includes a statistical analysis table obtained when the cell data is subjected to statistical feature analysis based on the labeling information; the statistical analysis table comprises the corresponding sample number of each value of the labeling information; the displaying the process data at the time of model evaluation associated with the cell data further includes:
Acquiring the value of the labeling information selected by a user in the statistical analysis table;
and displaying, among the test samples associated with the cell data, a target test sample corresponding to the value of the selected labeling information.
In one embodiment, before displaying, among the test samples associated with the cell data, the target test samples corresponding to the values of the selected labeling information, the method further includes:
combining the target test samples corresponding to the values of the selected labeling information;
and if the same test sample appears more than once in the combined target test samples, retaining only one copy of it.
In one embodiment, the performing statistical feature analysis on the model evaluation data based on the labeling information to obtain a model evaluation result includes:
selecting a test error sample set from the test sample set according to the model evaluation data; the test error sample set comprises test samples corresponding to the error prediction result;
and respectively carrying out statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information, and determining a model evaluation result.
In one embodiment, the performing statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information includes:
calculating, based on the values of the labeling information, a first proportion: for each value, the share of the test sample set taken by the test samples corresponding to that value;
calculating, based on the values of the labeling information, a second proportion: for each value, the share of the test error sample set taken by the test samples corresponding to that value;
and calculating the significance of each value of the labeling information according to the first proportion and the second proportion, wherein the significance represents the degree to which the value of the labeling information influences the prediction results of the unmanned vehicle deep learning model.
In one embodiment, the labeling information includes at least one of a collection time of the test sample set, weather information when the test sample set is collected, and position information of the collected test sample set.
In one embodiment, when the labeling information includes the position information where the test sample set was collected, before performing statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information, the method further includes:
clustering the adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate;
and determining at least one cluster coordinate as a value of the labeling information, wherein one cluster coordinate corresponds to one value of the labeling information.
In one embodiment, when the labeling information includes the position information where the test sample set was collected, before performing statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information, the method further includes:
clustering the adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate;
determining at least one path corresponding to each cluster coordinate in the unmanned vehicle running map based on a preset unmanned vehicle running map;
determining at least one path as the value of the labeling information; one path corresponds to one value of the labeling information.
A visualization apparatus for model evaluation, comprising:
the evaluation module is used for performing model evaluation, using the labeling information of a test sample set, on the prediction results of the unmanned vehicle deep learning model for the test sample set, to obtain at least one model evaluation result; the labeling information is used for describing scene information of the test sample set;
and the display module is used for displaying, according to a triggering operation of the user on an interface, a model evaluation result corresponding to the triggering operation and/or process data generated during model evaluation.
A computer device, comprising a memory storing a computer program and a processor that implements the steps of the visualization method for model evaluation described above when executing the computer program.
A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the visualization method for model evaluation described above.
In the visualization method, apparatus, computer device and storage medium for model evaluation, the computer uses the labeling information of the test sample set to perform model evaluation on the prediction results of the unmanned vehicle deep learning model for the test sample set, obtaining at least one model evaluation result; the labeling information is used for describing scene information of the test sample set; and, according to a triggering operation of the user on an interface, a model evaluation result corresponding to the triggering operation and/or process data generated during model evaluation are displayed. Because the computer device uses the labeling information of the test sample set when evaluating the prediction results of the unmanned vehicle deep learning model, the influence of different labeling information on the model can be analyzed, so model developers can improve the model against the specific scenes of the test samples, which improves the effect of improvements to the unmanned vehicle deep learning model; further, because the computer device can display the corresponding model evaluation result and/or process data during model evaluation according to the triggering operation of the user on the interface, a model developer can select the model evaluation results and other information to consult directly on the display interface, which improves working efficiency.
Drawings
FIG. 1 is an application environment diagram of a visualization method of model evaluation in one embodiment;
FIG. 2 is a flow diagram of a method of visualizing model evaluation in one embodiment;
FIG. 2A is a schematic diagram of model evaluation data in one embodiment;
FIG. 3 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 4 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 5 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 5A is a schematic diagram of a statistical feature analysis process in one embodiment;
FIG. 6 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 7 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 7A is a schematic diagram of a statistical feature analysis process in one embodiment;
FIG. 8 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 9 is a flow chart of a visualization method of model evaluation in another embodiment;
FIG. 10 is a block diagram of a visualization device for model evaluation in one embodiment;
FIG. 11 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 12 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 13 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 14 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 15 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 16 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 17 is a block diagram of another embodiment of a visualization device for model evaluation;
FIG. 18 is an internal structure diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
The visualization method for model evaluation provided by the application can be applied to an application environment shown in fig. 1. After the unmanned vehicle 100 collects the training sample set 110, the unmanned vehicle deep learning model 120 analyzes and processes the training sample set 110 to obtain a prediction result 130; the computer device 140 may evaluate the unmanned vehicle deep learning model 120 according to the prediction result 130, and display information such as model evaluation results. The computer device 140 may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, etc.
In one embodiment, as shown in fig. 2, a visualization method for model evaluation is provided, which is described by taking an example that the method is applied to the computer device in fig. 1, and includes:
S101, performing model evaluation, using the labeling information of the test sample set, on the prediction results of the unmanned vehicle deep learning model for the test sample set, to obtain at least one model evaluation result; the labeling information is used for describing scene information of the test sample set.
The unmanned vehicle deep learning model may be a neural network model or a convolutional network model; the type of the deep learning model is not limited herein. The test sample set may be a set of two-dimensional pictures or a set of three-dimensional images, for example a set of point cloud data; the type of the test sample set is not limited herein. For example, the unmanned vehicle drives on the road and collects a large number of two-dimensional images to form a test sample set, which may include images collected while the unmanned vehicle drives automatically or images collected while a person is driving; the computer device inputs these images into the unmanned vehicle deep learning model, which analyzes the test sample set to obtain prediction results. A prediction result may identify the position of a pedestrian in the road, the color or direction of a traffic light in the image (such as a left-turn or right-turn arrow), or how long a traffic light remains lit, for example identifying that the red light in a picture lasts 20 seconds; the prediction result may be represented by a bounding box or by a confidence level, and the type of the prediction result is not limited herein.
The labeling information may be the camera parameters used when a test sample was collected, the travel speed of the unmanned vehicle when the test sample was collected, the quality of the pictures or point clouds in the test sample set, the resolution or distinguishability of the pictures or point clouds, and so on; the type of the labeling information is not limited herein. Optionally, the labeling information may further include at least one of the collection time of the test sample set, the weather information when the test sample set was collected, and the position information where the test sample set was collected.
Specifically, when performing model evaluation on the prediction results of the unmanned vehicle deep learning model for the test sample set using the labeling information of the test sample set, the computer device may analyze, through a data analysis algorithm, what influence different labeling information has on the prediction results output by the unmanned vehicle deep learning model. Optionally, the computer device may further obtain the prediction results of the unmanned vehicle deep learning model on the test sample set and calculate model evaluation data of the unmanned vehicle deep learning model according to those prediction results; it may then perform statistical feature analysis on the model evaluation data based on the labeling information to obtain a model evaluation result.
The model evaluation data refers to parameters that reflect the prediction capability of the unmanned vehicle deep learning model; it may be a confusion matrix, a P-R curve, or the like, and the type of the model evaluation data is not limited herein. The confusion matrix measures model accuracy by comparing the model's prediction results with the real information of the test samples: each row of the matrix represents a prediction result for the test sample set, each column represents the real information of the test sample set, and each cell of the matrix holds the number of test samples falling into that prediction/real combination. Taking the confusion matrix of traffic light prediction results shown in fig. 2A as an example, the cell in the third row and first column holds the value 10; the row containing the cell indicates that the prediction result is yellow light, and the column containing the cell indicates that the real information is red light, so the unmanned vehicle deep learning model predicted 10 red-light samples as yellow light. The P-R curve represents the relationship between the precision rate P and the recall rate R, where the precision rate P is the ratio of the number of correctly predicted positive samples to the number of samples predicted as positive, and the recall rate R is the ratio of the number of correctly predicted positive samples to the number of actual positive samples. For example, when the unmanned vehicle deep learning model predicts green lights in the test samples, all test samples containing a green light are positive samples and the other test samples are negative samples; when the total number of test samples is 100 and the number of positive samples containing a green light is 30, if 20 of the 25 samples the model predicts as green light are truly positive, the precision rate is 20/25 = 80% and the recall rate is 20/30 = 67%.
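As an illustrative aside (not part of the patented method itself), the following sketch shows how a confusion matrix and the precision/recall figures from the traffic-light example can be computed; the class names and sample data are hypothetical:

```python
from collections import Counter

# Hypothetical ground truth and model predictions for a traffic-light test sample set.
truth = ["red", "red", "yellow", "green", "green", "yellow", "red", "green"]
pred  = ["red", "yellow", "yellow", "green", "red", "yellow", "red", "green"]

classes = ["red", "yellow", "green"]
counts = Counter(zip(pred, truth))

# Confusion matrix: each row is a predicted class, each column a true class (as in Fig. 2A).
matrix = {p: {t: counts[(p, t)] for t in classes} for p in classes}

def precision_recall(cls):
    correct   = counts[(cls, cls)]                # predicted cls and truly cls
    predicted = sum(1 for p in pred if p == cls)  # all samples predicted as cls
    actual    = sum(1 for t in truth if t == cls) # all samples truly cls
    return correct / predicted, correct / actual

p, r = precision_recall("green")  # in the text's worked example: 20/25 = 80%, 20/30 = 67%
```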
Specifically, after obtaining the prediction results of the unmanned vehicle deep learning model, the computer device can compare the prediction results for the test sample set with the real information of the test samples to obtain the confusion matrix; the computer device can also further process the cell data in the confusion matrix, for example deriving the precision rate and the recall rate to obtain a P-R curve; the calculation of the model evaluation data depends on its type and is not limited herein.
Further, when the computer device performs statistical feature analysis on the model evaluation data based on the labeling information, it may analyze the data based on one type of labeling information or on several types of labeling information; the computer device may also analyze each type of labeling information separately and then continue statistical feature analysis on those intermediate results; this is not limited here. For example, the computer device may analyze the model evaluation data for the test samples collected during the day, or for the test samples collected during the day and in sunny weather. When performing statistical feature analysis, the computer device may analyze the model evaluation data with a data science tool, such as the pandas framework or the R framework; the specific method of statistical feature analysis is not limited herein.
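A minimal sketch of such a statistical feature analysis with pandas, which the text names as one possible tool; the column names and values here are illustrative assumptions:

```python
import pandas as pd

# Hypothetical per-sample table: one row per test sample, carrying its labeling
# information and whether the model's prediction for it was correct.
samples = pd.DataFrame({
    "collection_time": ["day", "night", "day", "night", "night", "day"],
    "weather":         ["sunny", "sunny", "rainy", "sunny", "rainy", "sunny"],
    "correct":         [True, False, True, False, False, True],
})

# Analysis based on a single type of labeling information ...
accuracy_by_time = samples.groupby("collection_time")["correct"].mean()

# ... or on several types combined (e.g. daytime and sunny weather together).
accuracy_by_time_weather = samples.groupby(["collection_time", "weather"])["correct"].mean()
```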
On the basis of the above steps, the computer device obtains a model evaluation result. The model evaluation result may be a piece of labeling information, used to prompt a model developer that this labeling information has a large influence on the accuracy of the unmanned vehicle deep learning model; it may also be an error phenomenon caused by labeling information that affects the accuracy of the unmanned vehicle deep learning model, for example that the model tends to predict yellow-light test samples collected at night as green light; the specific form of the model evaluation result is not limited herein.
S102, according to the triggering operation of the user on the interface, displaying a model evaluation result corresponding to the triggering operation and/or process data during model evaluation.
The computer device may interact with the user via the interface; when the user performs a triggering operation on the interface, the model evaluation result corresponding to the triggering operation and/or process data generated during the model evaluation are displayed. The interface may be a web page or a result display interface in a model evaluation tool; the type of the interface is not limited herein. The triggering operation may be a clicking operation or a dragging operation, and its type is not limited herein. The triggering operation on the interface may target the evaluation task corresponding to each model evaluation, or different training sample sets; this is not limited herein.
The triggering operation may correspond to all the model evaluation results and/or process data obtained in the above step, or it may correspond to different model evaluation results and/or process data depending on the position the user triggers.
The process data during model evaluation may include the analysis process the computer device applied to the prediction results, for example a data statistics process that can be presented as a table; optionally, the process data during model evaluation may include at least one of the model evaluation data, the labeling information, and the test sample set.
Specifically, when displaying the model evaluation result and/or the process data during model evaluation, the computer device can display all the results corresponding to the triggering operation, or continue interacting with the user through hierarchical displays so that the user selects the desired model evaluation result and/or process data; the manner of display is not limited herein. After the user performs the triggering operation, the computer device may display the corresponding model evaluation result and/or process data through a jump page, a pop-up window, or a floating window; this is not limited.
In the visualization method for model evaluation above, the computer uses the labeling information of the test sample set to perform model evaluation on the prediction results of the unmanned vehicle deep learning model for the test sample set, obtaining at least one model evaluation result; the labeling information is used for describing scene information of the test sample set; and, according to a triggering operation of the user on the interface, a model evaluation result corresponding to the triggering operation and/or process data generated during model evaluation are displayed. Because the computer device uses the labeling information of the test sample set when evaluating the prediction results of the unmanned vehicle deep learning model, the influence of different labeling information on the model can be analyzed, so model developers can improve the model against the specific scenes of the test samples, which improves the effect of improvements to the unmanned vehicle deep learning model; further, because the computer device can display the corresponding model evaluation result and/or process data during model evaluation according to the triggering operation of the user on the interface, a model developer can select the model evaluation results and other information to consult directly on the display interface, which improves working efficiency.
FIG. 3 is a flow chart of a visualization method of model evaluation in another embodiment; the present embodiment relates to a specific manner in which the computer device displays the model evaluation result corresponding to the trigger operation and/or the process data during the model evaluation, and on the basis of the foregoing embodiment, as shown in fig. 3, the foregoing S102 includes:
S201, acquiring a triggering operation of a user in an evaluation task list; the evaluation tasks in the evaluation task list correspond to the model evaluation results obtained by executing the model evaluation and/or the process data generated during the model evaluation.
The computer device may store the model evaluation result obtained by the current model evaluation and/or process data obtained by performing the current model evaluation according to the evaluation tasks after performing the model evaluation; in addition, the evaluation task may further include data such as parameter settings when the computer device performs model evaluation, and the type of data corresponding to the evaluation task is not limited herein.
Specifically, the evaluation tasks may be displayed on the interface in the form of a table or of icons; the display mode of the evaluation task list is not limited here, and information such as the model evaluation time of each task, the person responsible for the task, and the version number of the corresponding unmanned vehicle deep learning model may be displayed alongside.
When a user selects a desired evaluation task from the evaluation task list, the selection may be made by double-clicking the table row of the task or by ticking a checkbox carried in the list; the triggering manner of the evaluation task is not limited herein.
S202, displaying a model evaluation result corresponding to the triggered evaluation task and/or process data during model evaluation.
After the computer equipment acquires the triggering operation, the computer equipment can determine an evaluation task required by a user according to the triggering operation, and then display a model evaluation result corresponding to the evaluation task and/or process data during model evaluation. The display manner is similar to that described in S102, and will not be described again here.
In the visualization method for model evaluation above, the user selects the required evaluation task from the evaluation task list, and the computer device displays the model evaluation result and/or the process data corresponding to that task; the user therefore does not need to search through the computer's folders, which improves the user's working efficiency.
FIG. 4 is a flow chart of a visualization method of model evaluation in another embodiment; the present embodiment relates to another specific manner in which the computer device displays the model evaluation result corresponding to the trigger operation and/or the process data during the model evaluation, and on the basis of the foregoing embodiment, as shown in fig. 4, the foregoing S102 includes:
S301, acquiring triggering operation of a user on a cell of a currently displayed confusion matrix.
The model evaluation data acquired by the computer device includes a confusion matrix, in which each cell datum is associated with the test samples corresponding to it, and with the model evaluation result obtained based on it and/or the process data during model evaluation. Specifically, after the computer device performs statistical feature analysis on the cell data of the confusion matrix based on the labeling information, it can associate the obtained model evaluation result with the cell data, establishing the association relationship between them; in addition, the computer device may associate the process data of the model evaluation with each cell, establishing the association relationship between the process data and the cell data, for example associating each cell datum with its corresponding test samples.
Continuing with the confusion matrix of traffic light prediction results shown in fig. 2A: after the computer device performs statistical feature analysis on the three cell data of the first row, the resulting model evaluation result A can be associated with each of those three cells; likewise, the computer device may perform statistical feature analysis on the three cell data of the third column and associate the resulting model evaluation result B with each of those three cells. The same cell data may be associated with different model evaluation results, and different cell data may be associated with the same model evaluation result; this is not limited herein. After associating the model evaluation results with the cells of the confusion matrix, the computer device may display the confusion matrix on the interface and make each cell of the confusion matrix responsive, for example by setting the cell as a virtual control; after the user triggers a cell, the computer device acquires the triggering operation and determines which cell's associated model evaluation result and other information the user wants to view.
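One conceivable realization (an assumption, the text does not prescribe a data structure) of the association between confusion-matrix cells, their test samples, and their model evaluation results is a lookup keyed by the (predicted class, real class) pair:

```python
# Hypothetical association store: each cell of the confusion matrix, keyed by its
# (predicted class, real class) pair, maps to its test samples and the model
# evaluation results derived from its row/column.
cell_index = {
    ("yellow", "red"): {
        "sample_ids": [101, 245, 360],
        "evaluation_results": ["model evaluation result A"],
    },
}

def display(results, sample_ids):
    # Placeholder for the interface-side rendering described in S302.
    print("evaluation results:", results)
    print("associated samples:", sample_ids)

def on_cell_triggered(predicted, real):
    """Respond to a user's triggering operation on a confusion-matrix cell."""
    entry = cell_index.get((predicted, real))
    if entry is not None:
        display(entry["evaluation_results"], entry["sample_ids"])

on_cell_triggered("yellow", "red")
```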
S302, in response to the triggering operation, displaying the model evaluation result associated with the cell data and/or the process data during model evaluation associated with the cell data.
Further, the computer device may determine, in response to the triggering operation, cell data selected by the user according to the triggering operation, and then display, based on an association between the cell data and model evaluation data and/or an association between the cell data and process data, a model evaluation result associated with the cell data selected by the user and/or process data associated with the model evaluation associated with the cell data.
In the visualization method for model evaluation above, the computer device interacts with the user through the cells of the confusion matrix: it obtains the cell selected by the user and displays the model evaluation result and/or the process data associated with that cell, so the user can directly select, by cell, the model evaluation results and other information to view, which further improves working efficiency.
FIG. 5 is a flow chart of a visualization method of model evaluation in another embodiment; the present embodiment relates to another specific manner in which the computer device displays the model evaluation result corresponding to the trigger operation and/or the process data during the model evaluation, and on the basis of the foregoing embodiment, as shown in fig. 5, the foregoing S302 includes:
S401, obtaining the value of the labeling information selected by the user in the statistical analysis table.
After the user triggers a cell of the confusion matrix, the computer device can display the process data associated with that cell; the process data during model evaluation associated with the cell data may include a statistical analysis table obtained when statistical feature analysis was performed on the cell data based on the labeling information, and the statistical analysis table includes the number of samples corresponding to each value of the labeling information. Continuing with the confusion matrix of traffic light prediction results shown in fig. 2A, the statistical analysis table may include, as shown in fig. 5A, each value of the labeling information, the number of samples corresponding to each value, and data such as the percentages obtained during the statistical analysis.
The user may select information such as a model evaluation result corresponding to the value of the labeling information to be checked through the statistical analysis table, may select the value of one labeling information, or may simultaneously select the values of a plurality of labeling information, which is not limited herein. The computer equipment can acquire the value of each piece of labeling information selected by the user according to the triggering operation of the user.
S402, displaying a target test sample corresponding to the value of the selected marking information in the test samples related to the cell data.
Specifically, after obtaining the values of the labeling information selected by the user, the computer device may extract, from the test samples associated with the cell, the target test samples corresponding to those values. For example, the user selects the cell whose real information is yellow light and whose prediction result is green light, and the computer device displays the statistical analysis table of fig. 5A; when the user selects the test error sample set in that table and the value of the labeling information is night, the computer device can determine, out of the 30 test samples associated with the cell, the 20 test samples collected at night as the target test samples. Among the test samples associated with the cell data, the computer device may take all test samples corresponding to the selected value as target test samples, or only one or a part of them; this is not limited herein.
Further, the user may select the values of several pieces of labeling information at the same time; the test samples corresponding to those values may then be numerous, and some test samples may be selected repeatedly. Before displaying the target test samples, the computer device can therefore process them: the target test samples corresponding to each selected value are combined, and if the same test sample appears more than once in the combined target test samples, only one copy is retained. For example, the user may select two values of the labeling information, one being daytime and the other being sunny; the computer device obtains a group of target test samples for each of the two values and then merges the two groups; since the merged set may contain two copies of the same target test sample, the computer device may delete one of them.
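A sketch of this select-merge-deduplicate step, assuming each test sample carries its labeling information as attributes (the attribute names are hypothetical):

```python
# Hypothetical test samples associated with one confusion-matrix cell.
cell_samples = [
    {"id": 1, "collection_time": "night", "weather": "sunny"},
    {"id": 2, "collection_time": "day",   "weather": "sunny"},
    {"id": 3, "collection_time": "night", "weather": "rainy"},
]

def targets_for(key, value):
    """Target test samples whose labeling information matches the selected value."""
    return [s for s in cell_samples if s[key] == value]

# The user selects two values: collection time "day" and weather "sunny".
merged = targets_for("collection_time", "day") + targets_for("weather", "sunny")

# If the same test sample appears twice after merging, keep only one copy.
seen, deduplicated = set(), []
for sample in merged:
    if sample["id"] not in seen:
        seen.add(sample["id"])
        deduplicated.append(sample)  # sample 2 matches both values but is kept once
```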
Having acquired the target test samples, the computer device can display them near the confusion matrix, for example as pictures floating at the right side of the display interface; the target test samples may also be extracted into another display window for centralized display. The display method is not limited herein.
In the visualization method for model evaluation above, by displaying the target test samples the computer device lets a model developer intuitively see the characteristics of the mispredicted test samples and compare them with the correctly predicted ones, which helps the developer improve the model.
FIG. 6 is a flow chart of a visualization method of model evaluation in another embodiment; the embodiment relates to a specific manner in which the computer device performs statistical feature analysis on model evaluation data based on the labeling information, and on the basis of the embodiment, as shown in fig. 6, the performing statistical feature analysis on the model evaluation data based on the labeling information to obtain a model evaluation result includes:
s501, selecting a test error sample set from the test sample set according to model evaluation data; the test error sample set comprises test samples corresponding to the error prediction result.
Specifically, the computer device can analyze, according to the model evaluation data, the test samples the unmanned vehicle deep learning model mispredicted and select a test error sample set. The test error sample set may contain the samples corresponding to all misprediction results in the test sample set, or a part of the test samples corresponding to misprediction results, selected according to the model evaluation data; the selection method of the test error sample set is not limited herein.
The test error sample set may be one sample set selected according to the model evaluation data, or may be a plurality of sample sets selected according to a plurality of data in the model evaluation data, and the type of the test error sample set is not limited herein.
Taking the confusion matrix of traffic light prediction results shown in fig. 2A as an example, 4 cells in the confusion matrix correspond to misprediction results, and the computer device may select a test error sample set from any of these cells. For example, it may take the test samples of the cell with the largest value among the 4 cells as the test error sample set, i.e. the 40 test samples whose prediction result is green light and whose real information is red light. Alternatively, for real information of yellow light, the computer device may take the test samples of the two cells whose prediction results are red light and green light (20 and 30 samples respectively) as test error sample sets: it may treat these 50 test samples together as one test error sample set, or treat the 20 samples (real information yellow, predicted red) and the 30 samples (real information yellow, predicted green) as two separate test error sample sets.
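A sketch of S501 on the figures of this example, assuming the confusion matrix stores the sample IDs of each misprediction cell (the IDs are made up):

```python
# Hypothetical error cells of the confusion matrix, keyed by (predicted, real),
# each holding the IDs of the test samples counted in that cell.
error_cells = {
    ("green", "red"):    list(range(0, 40)),    # 40 samples
    ("red", "yellow"):   list(range(40, 60)),   # 20 samples
    ("green", "yellow"): list(range(60, 90)),   # 30 samples
    ("yellow", "red"):   list(range(90, 100)),  # 10 samples
}

# Option 1: take the single largest error cell as the test error sample set.
largest_cell, largest_ids = max(error_cells.items(), key=lambda kv: len(kv[1]))

# Option 2: merge all cells whose real information is yellow into one error sample set.
yellow_error_ids = [sid for (pred, real), ids in error_cells.items()
                    if real == "yellow" for sid in ids]
```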
S502, carrying out statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information respectively, and determining a model evaluation result.
The values of the labeling information are the different pieces of scene information contained under a type of labeling information. For example, when the labeling information is the collection time, its value can be a specific collection time, such as eight o'clock sharp, or daytime or evening; when the labeling information is weather information, its values can include sunny day, rainy day, or a haze index; when the labeling information is position information, its value can be the coordinate position where the test sample was collected, or information obtained by processing that coordinate position, for example a city or a road name.
In addition, the values of the labeling information may come from a single type of labeling information, for example the values daytime and evening; or they may combine several types of labeling information, for example, when the labeling information types include collection time and weather information, the values may be daytime-sunny, daytime-cloudy, evening-sunny and evening-cloudy; the form of the values of the labeling information is not limited herein. The computer device may, for example, compare the model evaluation data of the test samples collected during the day with that of the test samples collected in the evening to determine whether the difference in collection time affects the accuracy of the unmanned vehicle deep learning model; the specific statistical analysis method is similar to that in S102 and is not limited herein.
In this way, by performing statistical feature analysis on the test sample set and the test error sample set respectively, based on the values of the labeling information, the computer device obtains the statistical features of the test error sample set under the different values of the labeling information and determines the model evaluation result, which helps further improve the deep learning model.
FIG. 7 is a flow chart of a visualization method of model evaluation in another embodiment; the embodiment relates to a specific manner in which the computer device performs statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information, where, based on the foregoing embodiment, as shown in fig. 7, S502 includes:
S601, calculating, based on the values of the labeling information, a first proportion of the test samples corresponding to each value within the test sample set.
S602, calculating, based on the values of the labeling information, a second proportion of the test samples corresponding to each value within the test error sample set.
S603, calculating the significance of each value of the labeling information according to the first proportion and the second proportion.
Specifically, the computer device may calculate, based on the values of the labeling information, a first proportion of the test samples corresponding to the values in the test sample set, and a second proportion of the test samples corresponding to the values in the test error sample set.
Continuing with the confusion matrix of traffic light prediction results shown in fig. 2A, take the test samples whose real information is yellow light as the test sample set: its size is 1000, of which 720 test samples were collected during the day and 280 in the evening. The computer device selects the 30 test samples whose prediction result is green light as the test error sample set; in this set, 10 test samples were collected during the day and 20 in the evening. Performing statistical feature analysis on the model evaluation data, the computer device obtains the first proportion: within the test sample set, daytime test samples account for 72% and evening test samples for 28%; and the second proportion: within the test error sample set, daytime test samples account for 33% and evening test samples for 67%. The analysis process is shown in fig. 7A.
On the basis of the first proportion and the second proportion, the computer device may calculate the significance of each value of the labeling information. The significance represents the degree to which the value of the labeling information influences the prediction results of the unmanned vehicle deep learning model; it may, for example, be the probability that the difference between the first proportion and the second proportion is caused by the difference in the values of the labeling information. The computer device may obtain the significance through the data science tools mentioned in S102.
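A sketch of S601 through S603 on the numbers of this example. The text leaves the significance computation open; the two-by-two chi-square test used below (via scipy) is only one plausible choice and is an assumption of this sketch:

```python
from scipy.stats import chi2_contingency

total_samples, total_errors = 1000, 30

# Sample counts per value of the labeling information "collection time".
in_sample_set = {"day": 720, "night": 280}  # within the full test sample set
in_error_set  = {"day": 10,  "night": 20}   # within the test error sample set

first_proportion  = {v: n / total_samples for v, n in in_sample_set.items()}  # 72% / 28%
second_proportion = {v: n / total_errors for v, n in in_error_set.items()}    # 33% / 67%

# Contingency table: rows = error / non-error samples, columns = day / night.
table = [
    [in_error_set["day"], in_error_set["night"]],
    [in_sample_set["day"] - in_error_set["day"],
     in_sample_set["night"] - in_error_set["night"]],
]
chi2, p_value, dof, expected = chi2_contingency(table)
# A small p_value indicates that the value "night" significantly influences
# the model's prediction results, matching the text's conclusion.
```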
Continuing with the confusion matrix of traffic light prediction results shown in fig. 2A: by analyzing the first proportion and the second proportion, the computer device finds that only 28% of the test sample set was collected at night, while 67% of the test error sample set was collected at night; therefore, when the value of the labeling information is night, the prediction accuracy of the unmanned vehicle deep learning model is significantly affected, and the model tends to predict yellow-light test samples collected at night as green light.
Further, after obtaining the significance of each value, the computer device can determine the value with the highest significance as a factor affecting the evaluation result of the unmanned vehicle deep learning model; for example, the value night of the labeling information can be determined as such a factor.
In the visualization method for model evaluation above, the computer device calculates, based on the values of the labeling information, the first proportion and the second proportion of the test samples corresponding to each value within the test sample set and the test error sample set respectively; according to the first proportion and the second proportion, it determines more accurately which values influence the prediction results of the unmanned vehicle deep learning model, so that a model developer can improve the model according to those values of the labeling information, improving the effect of improvements to the unmanned vehicle deep learning model.
FIG. 8 is a flow chart of a visualization method of model evaluation in another embodiment; the present embodiment relates to the case where the labeling information includes the position information where the test sample set was collected, and, based on the above embodiment, as shown in fig. 8, the method further includes, before S502:
s701, clustering adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate.
Specifically, when the unmanned vehicle collects the test sample set, it may collect several times at the same position, for example collecting image information of the position from different angles, so several images in the test sample set may have been collected at the same position; the computer device may therefore cluster adjacent position information to obtain at least one cluster coordinate. When clustering the position information, the computer device can set a distance threshold between different pieces of position information; when the distance between the position information of two test samples is smaller than the distance threshold, the computer device can regard the positions of the two test samples as the same and cluster their position information into one cluster coordinate.
When clustering the position information of several test samples, the computer device can take one of those positions as the cluster coordinate, or average the coordinates of the positions and take the averaged coordinate as the cluster coordinate; the manner of obtaining the cluster coordinates is not limited herein.
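A minimal sketch of such threshold-based clustering of collection positions; the greedy grouping and the averaging of member coordinates below are assumptions consistent with, but not mandated by, the text:

```python
import math

def cluster_positions(positions, threshold):
    """Greedily group positions closer than `threshold`, then average each group."""
    clusters = []  # each cluster is a list of (x, y) positions
    for p in positions:
        for cluster in clusters:
            if math.dist(p, cluster[0]) < threshold:  # compare with the cluster's seed
                cluster.append(p)
                break
        else:
            clusters.append([p])
    # One cluster coordinate per cluster: the average of its member positions.
    return [(sum(x for x, _ in c) / len(c), sum(y for _, y in c) / len(c))
            for c in clusters]

coords = cluster_positions([(0.0, 0.0), (0.1, 0.0), (5.0, 5.0)], threshold=1.0)
# -> two cluster coordinates: (0.05, 0.0) and (5.0, 5.0)
```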
S702, determining at least one cluster coordinate as a value of the labeling information, wherein one cluster coordinate corresponds to one value of the labeling information.
Specifically, the computer device may determine the above cluster coordinates as values of the labeling information, so that it can perform statistical feature analysis on the model evaluation data based on each cluster coordinate. For example, the computer device may determine at which cluster coordinates the test samples in the test sample set are prone to being mispredicted by the unmanned vehicle deep learning model.
Further, the computer device may display the above cluster coordinates through a map.
In the visualization method for model evaluation above, the computer device determines the cluster coordinates as values of the labeling information, so it can analyze, from the cluster coordinates, at which positions test samples are prone to being mispredicted by the unmanned vehicle deep learning model; a model developer can then collect test samples at those positions again to train the model, improving the accuracy of model prediction.
FIG. 9 is a flow chart of a visualization method of model evaluation in another embodiment; the present embodiment relates to another method when the labeling information includes collecting position information of the test sample set, and on the basis of the above embodiment, as shown in fig. 9, the method further includes, before S502:
s801, clustering adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate.
Specifically, the manner of obtaining the cluster coordinates is similar to that described in S701 and will not be repeated here.
S802, determining at least one path corresponding to each cluster coordinate in the unmanned vehicle running map based on the preset unmanned vehicle running map.
After obtaining the cluster coordinates, the computer device can map them onto the corresponding coordinate points of the unmanned vehicle running map, based on a preset unmanned vehicle running map, and determine which path in the map each cluster coordinate belongs to; the path determined by the computer device may be represented by the path's number, its coordinate range, or its name, which is not limited herein.
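A sketch of mapping cluster coordinates to paths; the running map is modeled here, purely as an assumption, as named paths with rectangular coordinate ranges:

```python
# Hypothetical unmanned vehicle running map: named paths with coordinate ranges.
running_map = [
    {"name": "Zhongshan road", "x": (0.0, 10.0), "y": (0.0, 2.0)},
    {"name": "Beijing road",   "x": (0.0, 2.0),  "y": (0.0, 10.0)},
]

def path_for(coord):
    """The path whose coordinate range contains the cluster coordinate, if any."""
    x, y = coord
    for path in running_map:
        if path["x"][0] <= x <= path["x"][1] and path["y"][0] <= y <= path["y"][1]:
            return path["name"]
    return None

# Each matched path becomes one value of the labeling information (S803).
values = {coord: path_for(coord) for coord in [(5.0, 1.0), (1.0, 8.0)]}
```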
S803, determining at least one path as the value of the labeling information; one path corresponds to one value of the labeling information.
The computer device may determine these paths as values of the labeling information, so that it can perform statistical feature analysis on the model evaluation data based on each path. For example, the computer device may determine on which paths the test samples in the test sample set are prone to being mispredicted by the unmanned vehicle deep learning model.
Further, the computer device may determine, after performing statistical feature analysis on the model evaluation data based on each path, a path having the highest degree of significance as a target path affecting the evaluation result of the unmanned vehicle deep learning model, and determine, based on the unmanned vehicle traveling map, a route including the target path as a route to be optimized. Further, the computer device may display the above-mentioned paths on a map, or may display a route including the target path, and distinguish different routes by setting different colors.
Continuing with the confusion matrix of the traffic light prediction results shown in FIG. 2A as an example: the computer device clusters the position information of the test sample set and, based on the preset unmanned vehicle running map, determines the path corresponding to each cluster coordinate, so that the values of the labeling information are determined as "Family Charm Road", "Zhongshan Road", "Wushan Road", and "Beijing Road". Statistical feature analysis is then performed on the cell data in the confusion matrix based on each path, yielding a first proportion (the test samples corresponding to each path as a share of the test sample set) and a second proportion (the test samples corresponding to each path as a share of the test error sample set). In this example, the test samples corresponding to "Family Charm Road" account for 10% of the test sample set but 40% of the test error sample set; that is, the unmanned vehicle deep learning model is prone to mispredicting test samples collected on "Family Charm Road", so this path can be determined as a target path affecting the model evaluation result. Further, the computer device may determine a route containing "Family Charm Road" as a route to be optimized.
According to this visualization method for model evaluation, the computer device determines the paths as values of the labeling information, so that it can analyze from the paths which test samples are prone to being mispredicted by the unmanned vehicle deep learning model. A model developer can then re-collect test samples on those paths to retrain the model, improving prediction accuracy. Meanwhile, by determining routes containing the target path as routes to be optimized, the computer device can steer the unmanned vehicle away from the target path during path planning.
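For illustration, the proportion and significance computation behind this example can be sketched as follows. Treating significance as the ratio of the second proportion to the first is an assumption (the embodiments only state that significance is calculated from the two proportions), and path_significance is a hypothetical name.

```python
from collections import Counter

def path_significance(sample_paths, error_sample_paths):
    """sample_paths: path label per test sample; error_sample_paths: the same,
    restricted to the mispredicted samples (the test error sample set)."""
    n_all, n_err = len(sample_paths), len(error_sample_paths)
    first = {p: c / n_all for p, c in Counter(sample_paths).items()}
    second = {p: c / n_err for p, c in Counter(error_sample_paths).items()}
    # significance as the ratio of error share to overall share (assumption)
    return {p: second.get(p, 0.0) / first[p] for p in first}

# Worked example from the text: "Family Charm Road" holds 10% of the test
# samples but 40% of the mispredicted ones, so its ratio is 4.0 -- the
# highest, making it the target path whose routes are to be optimized.
sig = path_significance(
    ["Family Charm Road"] * 10 + ["Zhongshan Road"] * 90,
    ["Family Charm Road"] * 4 + ["Zhongshan Road"] * 6,
)
print(max(sig, key=sig.get))  # -> Family Charm Road
```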
It should be understood that, although the steps in the flowcharts of FIGS. 2-9 are shown in the order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2-9 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments; these sub-steps or stages are not necessarily performed sequentially, and may be performed in turn or alternately with at least a portion of the sub-steps or stages of other steps.
In one embodiment, as shown in FIG. 10, a visualization apparatus for model evaluation is provided, comprising: an evaluation module 10, and a display module 20, wherein:
the evaluation module 10 is configured to perform model evaluation on the prediction result of the unmanned vehicle deep learning model for the test sample set by using the labeling information of the test sample set, so as to obtain at least one model evaluation result; the annotation information is used for describing scene information of the test sample set.
The display module 20 is configured to display, according to the triggering operation of the user on the interface, a model evaluation result corresponding to the triggering operation and/or process data at the time of model evaluation.
The visualization device for model evaluation provided in the embodiment of the present application may implement the above method embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
In one embodiment, as shown in fig. 11, the evaluation module 10 includes:
the calculation sub-module 101 is configured to obtain a prediction result of the unmanned vehicle deep learning model on the test sample set, and calculate model evaluation data of the unmanned vehicle deep learning model according to the prediction result.
The statistics sub-module 102 is configured to perform statistical feature analysis on the model evaluation data based on the labeling information, so as to obtain a model evaluation result.
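For illustration, the calculation sub-module's derivation of model evaluation data from the prediction results can be sketched as follows, taking a confusion matrix of counts as the evaluation data; confusion_counts and the traffic light class names are illustrative assumptions.

```python
from collections import Counter

def confusion_counts(true_labels, predicted_labels):
    """Model evaluation data as {(true class, predicted class): count}."""
    return Counter(zip(true_labels, predicted_labels))

counts = confusion_counts(
    ["red", "red", "green", "yellow"],
    ["red", "green", "green", "yellow"],
)
print(counts[("red", "green")])  # -> 1: one red light mispredicted as green
```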
In one embodiment, the process data at the time of model evaluation includes at least one of model evaluation data, annotation information, and a set of test samples.
In one embodiment, as shown in fig. 12, on the basis of the above embodiment, the display module 20 includes:
a triggering sub-module 201, configured to acquire a triggering operation of the user in an evaluation task list; each evaluation task in the evaluation task list corresponds to a model evaluation result obtained from executing model evaluation and/or process data at the time of model evaluation.
The display sub-module 202 is configured to display a model evaluation result and/or process data during model evaluation corresponding to the triggered evaluation task.
In one embodiment, as shown in FIG. 12, on the basis of the above embodiments, the model evaluation data includes a confusion matrix. Each cell data of the confusion matrix is associated with the test samples corresponding to that cell data, and with a model evaluation result obtained based on the cell data and/or process data at the time of model evaluation;
the triggering sub-module 201 is further configured to obtain a triggering operation of a user on a cell of the confusion matrix currently displayed.
The display sub-module 202 is further configured to display a model evaluation result associated with the cell data and/or process data at the time of model evaluation associated with the cell data in response to the triggering operation.
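For illustration, the cell association and trigger handling can be sketched as follows; the data layout and the names build_cell_index and on_cell_triggered are assumptions, since the embodiments do not prescribe a storage format.

```python
from collections import defaultdict

def build_cell_index(true_labels, predicted_labels):
    """Associates each confusion-matrix cell with its test sample indices."""
    cells = defaultdict(list)            # (true, pred) -> sample indices
    for i, (y, p) in enumerate(zip(true_labels, predicted_labels)):
        cells[(y, p)].append(i)
    return cells

def on_cell_triggered(cells, true_cls, pred_cls, annotations):
    """Handles a trigger on one cell: returns its samples and the per-value
    sample counts, i.e. the statistical analysis table for that cell."""
    idxs = cells.get((true_cls, pred_cls), [])
    table = defaultdict(int)
    for i in idxs:
        table[annotations[i]] += 1       # sample count per labeling value
    return idxs, dict(table)
```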
In one embodiment, on the basis of the above embodiment, as shown in fig. 13, the process data at the time of model evaluation associated with the cell data includes a statistical analysis table obtained when statistical feature analysis is performed on the cell data based on the labeling information; the statistical analysis table includes the number of samples corresponding to each value of the labeling information. The display sub-module 202 further includes:
an obtaining unit 2021 is configured to obtain a value of the labeling information selected by the user in the statistical analysis table.
The display unit 2022 is configured to display, among the test samples associated with the cell data, the target test samples corresponding to the selected value of the labeling information.
In one embodiment, on the basis of the above embodiment, the display unit 2022 is further configured to: combine the target test samples corresponding to the selected values of the labeling information; and if the same test sample appears more than once among the combined target test samples, retain only one copy.
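For illustration, this merge-and-deduplicate behavior can be sketched as follows; merge_target_samples is a hypothetical name, and using sample ids as the identity key is an assumption.

```python
def merge_target_samples(*sample_groups):
    """Combines target samples selected for several labeling values and
    keeps only one copy of any sample that appears more than once."""
    seen, merged = set(), []
    for group in sample_groups:
        for sample_id in group:
            if sample_id not in seen:
                seen.add(sample_id)
                merged.append(sample_id)
    return merged

# Samples matching "rainy" and samples matching "Zhongshan Road" may overlap;
# the overlapping sample 7 is shown only once.
print(merge_target_samples([3, 7, 9], [7, 12]))  # -> [3, 7, 9, 12]
```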
In one embodiment, as shown in fig. 14, based on the above embodiment, the statistics sub-module 102 includes:
a selecting unit 1021 for selecting a test error sample set from the test sample sets according to the model evaluation data; the test error sample set comprises test samples corresponding to the error prediction result.
The statistics unit 1022 is configured to perform statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information, respectively, and determine a model evaluation result.
In one embodiment, as shown in fig. 15, on the basis of the above embodiment, the statistics unit 1022 includes:
the first calculating subunit 10221 is configured to calculate, based on the values of the labeling information, a first proportion of the test samples corresponding to each value in the test sample set to the test sample set.
The second calculating subunit 10222 is configured to calculate, based on the values of the labeling information, a second proportion of the test samples corresponding to the values in the test error sample set to the test error sample set.
A third calculating subunit 10223 is configured to calculate, according to the first proportion and the second proportion, the significance of each value of the labeling information, where the significance characterizes the degree of influence of that value of the labeling information on the prediction results of the unmanned vehicle deep learning model.
In one embodiment, on the basis of the above embodiment, the labeling information includes at least one of the collection time of the test sample set, the weather information when the test sample set was collected, and the position information at which the test sample set was collected.
In one embodiment, as shown in fig. 16, when the labeling information includes location information of the collected test sample set, the statistics sub-module 102 further includes:
the clustering unit 1023 is used for clustering adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate;
and a determining unit 1024, configured to determine at least one cluster coordinate as a value of the labeling information, where one cluster coordinate corresponds to one value of the labeling information.
In one embodiment, as shown in fig. 16, when the labeling information includes the position information of the collected test sample set, on the basis of the above embodiment, the determining unit 1024 is further configured to determine at least one path corresponding to each cluster coordinate in the unmanned vehicle running map based on the preset unmanned vehicle running map; determining at least one path as the value of the labeling information; one path corresponds to one value of the labeling information.
In one embodiment, as shown in fig. 17, on the basis of the foregoing embodiment, the statistics unit 1022 further includes a determining subunit 10224, configured to determine, according to the significance of each value of the labeling information, the value with the highest significance as a factor affecting the evaluation result of the unmanned vehicle deep learning model.
In one embodiment, based on the above embodiment, the determining subunit 10224 is specifically configured to: determining a path with highest significance as a target path affecting the evaluation result of the unmanned vehicle deep learning model; and determining the route including the target path as the route to be optimized based on the unmanned vehicle running map.
The visualization device for model evaluation provided in the embodiment of the present application may implement the above method embodiment, and its implementation principle and technical effects are similar, and are not described herein again.
For specific limitations on the visualization means of the model evaluation, reference may be made to the above limitations on the visualization method of the model evaluation, which are not described in detail here. The various modules in the visualization device for model evaluation described above may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, whose internal structure may be as shown in fig. 18. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer program, when executed by the processor, implements a visualization method of model evaluation. The display screen of the computer device may be a liquid crystal display or an electronic ink display; the input device of the computer device may be a touch layer covering the display screen, keys, a trackball, or a touchpad provided on the housing of the computer device, or an external keyboard, touchpad, or mouse.
It will be appreciated by those skilled in the art that the structure shown in fig. 18 is merely a block diagram of a portion of the structure relevant to the present application and does not limit the computer device to which the present application is applied; a particular computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided comprising a memory and a processor, the memory having stored therein a computer program, the processor when executing the computer program performing the steps of:
carrying out model evaluation on the prediction results of the unmanned vehicle deep learning model for the test sample set by using the labeling information of the test sample set, to obtain at least one model evaluation result; the labeling information is used for describing scene information of the test sample set;
and displaying a model evaluation result corresponding to the triggering operation and/or process data during model evaluation according to the triggering operation of the user on the interface.
The computer device provided in this embodiment has similar implementation principles and technical effects to those of the above method embodiment, and will not be described herein.
In one embodiment, a computer readable storage medium is provided having a computer program stored thereon, which when executed by a processor, performs the steps of:
carrying out model evaluation on the prediction results of the unmanned vehicle deep learning model for the test sample set by using the labeling information of the test sample set, to obtain at least one model evaluation result; the labeling information is used for describing scene information of the test sample set;
and displaying a model evaluation result corresponding to the triggering operation and/or process data during model evaluation according to the triggering operation of the user on the interface.
The computer readable storage medium provided in this embodiment has similar principles and technical effects to those of the above method embodiment, and will not be described herein.
Those skilled in the art will appreciate that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing related hardware; the computer program may be stored in a non-volatile computer-readable storage medium, and when executed, may include the flows of the embodiments of the methods described above. Any reference to memory, storage, database, or other media used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction in a combination of these technical features, it should be considered within the scope of this description.
The above embodiments merely represent several implementations of the present application; their description is relatively specific and detailed, but they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art could make various modifications and improvements without departing from the concept of the present application, and these fall within the protection scope of the present application. Accordingly, the protection scope of the present application is subject to the appended claims.

Claims (13)

1. A method of visualizing model evaluation, the method comprising:
obtaining a prediction result of the unmanned vehicle deep learning model on a test sample set, and calculating model evaluation data of the unmanned vehicle deep learning model according to the prediction result; selecting a test error sample set from the test sample set according to the model evaluation data; the test error sample set comprises test samples corresponding to error prediction results; respectively carrying out statistical feature analysis on the test sample set and the test error sample set based on the value of the labeling information of the test sample set, and determining the model evaluation result; the annotation information is used for describing scene information of the test sample set;
And displaying a model evaluation result corresponding to the triggering operation and/or process data during model evaluation according to the triggering operation of the user on the interface.
2. The method of claim 1, wherein the process data at the time of model evaluation comprises at least one of the model evaluation data, the annotation information, and the set of test samples.
3. The method according to any one of claims 1-2, wherein the displaying, according to the triggering operation of the user on the interface, a model evaluation result corresponding to the triggering operation and/or process data during model evaluation comprises:
acquiring triggering operation of a user in an evaluation task list; the evaluation tasks in the evaluation task list correspond to the model evaluation results obtained according to the execution of the model evaluation and/or process data during the model evaluation;
and displaying the model evaluation result and/or the process data during model evaluation corresponding to the triggered evaluation task.
4. The method of claim 1, wherein the model evaluation data comprises a confusion matrix; each cell data of the confusion matrix is associated with the test samples corresponding to the cell data, and with a model evaluation result and/or process data at the time of model evaluation obtained based on the cell data; and the displaying, according to the triggering operation of the user on the interface, a model evaluation result corresponding to the triggering operation and/or process data during model evaluation comprises:
acquiring a triggering operation of the user on a cell of the currently displayed confusion matrix;
and displaying a model evaluation result associated with the cell data and/or process data during model evaluation associated with the cell data in response to the triggering operation.
5. The method according to claim 4, wherein the process data at the time of model evaluation associated with the cell data includes a statistical analysis table obtained at the time of statistical feature analysis of the cell data based on the labeling information; the statistical analysis table comprises the corresponding sample number of each value of the labeling information; the displaying the process data at the time of model evaluation associated with the cell data further comprises:
acquiring the value of the labeling information selected by the user in the statistical analysis table;
and displaying, among the test samples associated with the cell data, the target test sample corresponding to the selected value of the labeling information.
6. The method according to claim 5, wherein, before the displaying, among the test samples associated with the cell data, the target test sample corresponding to the selected value of the labeling information, the method further comprises:
combining the target test samples corresponding to the selected values of the labeling information;
and if the same test sample exists among the combined target test samples, retaining only one copy.
7. The method according to claim 1, wherein the performing statistical feature analysis on the test sample set and the test error sample set based on the values of the labeling information of the test sample set, respectively, includes:
calculating a first proportion of the test sample corresponding to each value in the test sample set to the test sample set based on the value of the labeling information;
calculating a second proportion of the test samples corresponding to each value in the test error sample set to the test error sample set based on the value of the labeling information;
and calculating the significance of each value of the labeling information according to the first proportion and the second proportion, wherein the significance is used for representing the influence degree of the value of the labeling information on the prediction result of the unmanned vehicle deep learning model.
8. The method of claim 7, wherein the annotation information comprises at least one of a time of collection of the test sample set, weather information at the time of collection of the test sample set, and location information at which the test sample set was collected.
9. The method of claim 8, wherein, when the labeling information includes the location information at which the test sample set was collected, before the statistical feature analysis is performed on the test sample set and the test error sample set based on the values of the labeling information, the method further comprises:
clustering the adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate;
and determining the at least one cluster coordinate as the value of the labeling information, wherein one cluster coordinate corresponds to one value of the labeling information.
10. The method of claim 8, wherein, when the labeling information includes the location information at which the test sample set was collected, before the statistical feature analysis is performed on the test sample set and the test error sample set based on the values of the labeling information, the method further comprises:
clustering the adjacent position information according to each position information in the test sample set to obtain at least one clustering coordinate;
determining at least one path corresponding to each cluster coordinate in the unmanned vehicle running map based on a preset unmanned vehicle running map;
determining the at least one path as the value of the labeling information; one path corresponds to one value of the labeling information.
11. A visualization apparatus for model evaluation, the apparatus comprising:
the evaluation module is used for acquiring a prediction result of the unmanned vehicle deep learning model on the test sample set and calculating model evaluation data of the unmanned vehicle deep learning model according to the prediction result; selecting a test error sample set from the test sample set according to the model evaluation data; the test error sample set comprises test samples corresponding to error prediction results; respectively carrying out statistical feature analysis on the test sample set and the test error sample set based on the value of the labeling information of the test sample set, and determining the model evaluation result; the annotation information is used for describing scene information of the test sample set;
and the display module is used for displaying a model evaluation result corresponding to the triggering operation and/or process data during model evaluation according to the triggering operation of the user on the interface.
12. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 10 when the computer program is executed.
13. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 10.
GR01 Patent grant