WO2022037387A1 - Visual perception algorithm evaluation method and device

Visual perception algorithm evaluation method and device

Info

Publication number: WO2022037387A1
Application number: PCT/CN2021/109529
Authority: WIPO (PCT)
Prior art keywords: information, detection, frame, point cloud, evaluation
Other languages: French (fr), Chinese (zh)
Inventors: 何杰, 王旭, 申艺华, 李俊, 董维山
Original assignee: 魔门塔(苏州)科技有限公司
Application filed by 魔门塔(苏州)科技有限公司
Publication of WO2022037387A1 (en)

Classifications

    • G06F18/00 Pattern recognition
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • The present invention relates to the technical field of algorithm evaluation, and in particular to a method and device for evaluating a visual perception algorithm.
  • Visual perception algorithms are core components of autonomous driving systems, face recognition systems, and identity verification systems.
  • The accuracy of the perception results of a visual perception algorithm affects, to a certain extent, the accuracy of the output results of the above systems.
  • Therefore, the performance of a visual perception algorithm needs to be evaluated before it is actually applied.
  • The present invention provides an evaluation method and device for a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
  • Each point cloud data frame in the evaluation data set is labeled based on a pre-trained 3D data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, from which the labeled position information and labeled attitude information of each object are determined; combined with the timing information between the point cloud data frames in the evaluation data set, the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame are then determined.
  • In this way, three-dimensional information including the annotation frame information, labeled position information, labeled attitude information, labeled speed information and labeled acceleration information of each object is obtained, realizing automatic labeling of the 3D information of the labeled objects and saving labor costs.
  • FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention
  • FIG. 2 is an exemplary diagram of an error curve corresponding to a target error value provided by an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of an apparatus for evaluating a visual perception algorithm according to an embodiment of the present invention.
  • The present invention provides an evaluation method and device for a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
  • The embodiments of the present invention are described in detail below.
  • FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention. The method may include the following steps:
  • S101 Obtain ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set.
  • The ground truth information includes at least the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship.
  • The point cloud data frame may be a data frame collected by a lidar sensor, and the image frame may be an image frame collected by an image acquisition device.
  • The evaluation method for a visual perception algorithm provided by this embodiment of the present invention can be applied to any electronic device with computing capability; the electronic device may be a terminal or a server.
  • The labeled pose information and object motion information included in the ground truth information may be based on three-dimensional space, for example pose and motion information in the device coordinate system of the device that acquired the point cloud data frame, or pose and motion information in a preset spatial rectangular coordinate system; both are possible. The preset spatial rectangular coordinate system may be a world coordinate system or the image acquisition device coordinate system.
  • The labeled pose information may include labeled position information and labeled attitude information.
  • The object motion information may include, but is not limited to, the speed information and acceleration information of the labeled object. For clarity, the speed information in the object motion information determined from the point cloud data frames in the evaluation data set is called the labeled speed information, and the acceleration information is called the labeled acceleration information.
  • The visual perception algorithm may be a visual perception algorithm applied in an automatic driving system.
  • Each piece of evaluation data in the evaluation data set may be evaluation data collected by a target vehicle during driving.
  • One piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship; the corresponding relationship may mean that the point cloud data frame and the image frame were collected in the same collection period.
  • The above-mentioned lidar sensor and image acquisition device may both be installed in the target vehicle.
  • Where the visual perception algorithm is applied in an automatic driving system, the above objects may include, but are not limited to, vehicles and pedestrians.
  • Where the object is a vehicle, the labeled position information in the labeled pose information included in the ground truth information may refer to the position information of the center point of the vehicle, of the center point of the rear of the vehicle, or of the center point of the front of the vehicle; all of these are possible.
  • The labeled attitude information in the labeled pose information included in the ground truth information may refer to the angle information of the vehicle about the coordinate axes of the coordinate system in which it is located during driving, including pitch angle information, roll angle information and yaw angle information.
  • In one case, the pitch angle information and roll angle information generated while the vehicle runs on the ground may be ignored, that is, they are taken to be zero.
  • The evaluation data in the evaluation data set may include evaluation data collected for normal driving scenarios, for large-vehicle or special-shaped-vehicle scenarios, or for pedestrians, complex intersections, and specific weather conditions; all of these are possible.
  • In one implementation, the electronic device can directly obtain the ground truth information, determined based on each point cloud data frame in the evaluation data set, that other devices send to it.
  • In another implementation, S101 may include the following steps 011-013:
  • 011 Obtain an evaluation data set, where the evaluation data set includes multiple pieces of evaluation data;
  • 012 Label each point cloud data frame in the evaluation data set based on the pre-trained 3D data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, and from it determine the labeled position information and labeled attitude information of each object, obtaining the labeled pose information of each object corresponding to each point cloud data frame;
  • 013 Based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the timing information between the point cloud data frames in the evaluation data set, determine the labeled speed information and labeled acceleration information of each object, obtaining the object motion information of each object corresponding to each point cloud data frame and, with it, the ground truth information corresponding to each object.
  • Specifically, the electronic device can directly obtain the evaluation data set, which includes multiple pieces of evaluation data. The electronic device inputs the point cloud data frame included in each piece of evaluation data into the pre-trained 3D data perception model; the model detects each object in each point cloud data frame and marks it with an annotation frame, yielding the annotation frame information of each object corresponding to each point cloud data frame. The annotation frame may be a cuboid.
  • The annotation frame information of each object includes information that can represent the length, width and height of the object, and information that can represent the pose of the object.
  • The electronic device converts the annotation frame information of each object corresponding to each point cloud data frame output by the pre-trained 3D data perception model to obtain the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame.
  • The pre-trained 3D data perception model may be a neural network model trained on sample point cloud data frames and their corresponding calibration information, including the calibration frame information of each object in the sample point cloud data frame. For the specific training process, refer to model training processes in the related art, which will not be repeated here.
  • The evaluation data in the evaluation data set are generally obtained by continuous collection, that is, the point cloud data frames in the evaluation data set are consecutive frames, as are the image frames.
  • Accordingly, the electronic device can determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the timing information between the point cloud data frames in the evaluation data set.
  • The labeled position information of each object may include lateral position information, longitudinal position information and radial position information. Based on the lateral position information of each object corresponding to each point cloud data frame and the timing information between the point cloud data frames, the labeled lateral velocity information and labeled lateral acceleration information of each object can be determined; based on the longitudinal position information and the timing information, the labeled longitudinal velocity information and labeled longitudinal acceleration information can be determined; and based on the radial position information and the timing information, the labeled radial velocity information and labeled radial acceleration information can be determined, as sketched below.
  • S102 Obtain detection information corresponding to each detected object detected based on a preset visual detection algorithm and each image frame in the evaluation data set.
  • The detection information includes at least the detected pose information and detected motion information of the corresponding detected object.
  • The detection information corresponding to each object in an image can be detected from the image frame using the preset visual detection algorithm.
  • An object detected from an image frame by the preset visual detection algorithm may be called a detected object.
  • The detection information may include two-dimensional information and three-dimensional information corresponding to the object. The two-dimensional information may include the two-dimensional position information and two-dimensional speed information of the object in the image frame; the three-dimensional information may include, but is not limited to, the pose information of the object in the specified spatial rectangular coordinate system, i.e. the detected pose information, and the detected motion information, which includes, but is not limited to, the detected speed information and detected acceleration information of the corresponding object.
  • In one implementation, the electronic device can directly obtain the detection information, detected from each image frame in the evaluation data set based on the preset visual detection algorithm, that other devices send to it.
  • In another implementation, S102 may include the following steps 021-022:
  • 021 The preset visual perception algorithm is pre-stored locally on the electronic device or in a connected storage device. After obtaining the evaluation data set, the electronic device can detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information corresponding to each detected object in each image frame. The detection frame information includes information that can represent the length, width and height of the corresponding detected object and information that can represent its pose, and may also include information representing the two-dimensional position of the corresponding detected object in the corresponding image frame.
  • 022 The electronic device determines the detected position information and detected attitude information of each detected object corresponding to each image frame based on the preset visual perception algorithm and the detection frame information corresponding to each detected object, obtaining the detected pose information of each detected object. It then determines, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, the detected speed information and detected acceleration information of each detected object, obtaining the detected motion information of each detected object corresponding to each image frame.
  • The detected position information of each detected object may include lateral, longitudinal and radial position information. Based on the lateral position information of each detected object and the timing information between the image frames, the detected lateral velocity information and detected lateral acceleration information can be determined; based on the longitudinal position information and the timing information, the detected longitudinal velocity information and detected longitudinal acceleration information; and based on the radial position information and the timing information, the detected radial velocity information and detected radial acceleration information, analogously to the derivation of the labeled motion information above.
  • S103 Based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, determine the evaluation information corresponding to the preset visual perception algorithm.
  • The evaluation information includes first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information on the stability of the algorithm.
  • The electronic device may, based on the preset result accuracy evaluation rules, process the corresponding ground truth information and detection information to obtain first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm; and, based on the preset algorithm stability evaluation rules, process the corresponding ground truth information and detection information to obtain second evaluation information representing the algorithm stability of the preset visual perception algorithm.
  • The preset result accuracy evaluation rules may include, but are not limited to, specific detection result accuracy evaluation indicators and instructions for determining the results corresponding to those indicators based on the ground truth information and the detection information; the preset algorithm stability evaluation rules may likewise include, but are not limited to, specific algorithm stability evaluation indicators and instructions for determining the results corresponding to those indicators based on the ground truth information and the detection information.
  • By applying this embodiment of the present invention, first evaluation information on the accuracy of the detection results of the preset visual perception algorithm in multiple respects and second evaluation information on the stability of the algorithm can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple respects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
  • In an implementation of the present invention, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, and the detection information includes the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame.
  • S103 may include the following steps 031-034:
  • The matched projection frame position information and two-dimensional position information are projection frame position information and two-dimensional position information for which the intersection-over-union (IoU) of the corresponding frames exceeds a preset IoU threshold.
  • Based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information, the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm is determined.
  • Before determining the evaluation information corresponding to the preset visual perception algorithm based on the ground truth information and the detection information, the electronic device needs to match the ground truth information against the detection information, so that the evaluation information is determined from the matched ground truth information and detection information.
  • For each object corresponding to each point cloud data frame, the electronic device may, based on the object's annotation frame information and the positional conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, convert the annotation frame corresponding to the object from the coordinate system of the point cloud data frame acquisition device to the coordinate system of the image frame acquisition device, obtaining the position information of the annotation frame in the image frame acquisition device's coordinate system. Then, based on that position information and the internal parameter (intrinsic) information of the image frame acquisition device, the annotation frame is projected into the image frame corresponding to the point cloud data frame, and the projection position information of the resulting projection frame is determined as the projection frame position information corresponding to the object, as sketched below.
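  • A minimal sketch of this projection step follows; the 4x4 lidar-to-camera extrinsic matrix, the 3x3 intrinsic matrix, and the convention of taking the axis-aligned rectangle around the projected corners as the projection frame are assumptions for illustration.

```python
import numpy as np

def project_annotation_frame(corners_lidar: np.ndarray,
                             T_cam_lidar: np.ndarray,
                             K: np.ndarray):
    """Project the 8 corners of a 3D annotation frame from the point cloud
    acquisition device (lidar) coordinate system into the image, and return
    the enclosing 2D rectangle as the projection frame position information."""
    homo = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous corners (8, 4)
    cam = (T_cam_lidar @ homo.T)[:3]                    # corners in camera frame (3, 8)
    uv = (K @ cam)[:2] / cam[2]                         # perspective projection (2, 8)
    x_min, y_min = uv.min(axis=1)
    x_max, y_max = uv.max(axis=1)
    return x_min, y_min, x_max, y_max
```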
  • Then, using the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, the IoU between the projection frame corresponding to each object and the two-dimensional detection frame of each detected object is calculated; projection frame position information and two-dimensional position information whose IoU exceeds the preset IoU threshold are determined to be matched projection frame position information and two-dimensional position information.
  • For example, suppose the objects corresponding to point cloud data frame A include object 1, object 2 and object 3, and the detected objects corresponding to image frame a, which corresponds to point cloud data frame A, include detected object 1, detected object 2, detected object 3 and detected object 4.
  • For object 1 corresponding to point cloud data frame A: based on the projection frame position information corresponding to object 1 and the two-dimensional position information of each detected object, calculate the IoU between the projection frame corresponding to object 1 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4.
  • Similarly, for object 2, calculate the IoU between the projection frame corresponding to object 2 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4; and for object 3, calculate the IoU between the projection frame corresponding to object 3 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4.
  • Each IoU is then compared against the preset IoU threshold. If, for example, the IoU between the projection frame corresponding to object 1 and the two-dimensional detection frame corresponding to detected object 3 exceeds the preset IoU threshold, the projection frame position information corresponding to object 1 and the two-dimensional position information corresponding to detected object 3 are determined to be matched projection frame position information and two-dimensional position information; correspondingly, the ground truth information corresponding to object 1 and the detection information corresponding to detected object 3 are matched ground truth information and detection information.
  • If the IoUs between the projection frame corresponding to object 3 and the two-dimensional detection frames corresponding to detected objects 1-4 all fail to exceed the preset IoU threshold, it is determined that none of detected objects 1-4 is the same physical object as object 3; that is, the ground truth information corresponding to object 3 is ground truth information that does not match any detection information, and object 3 is a missed object.
  • Likewise, if the IoUs between the two-dimensional detection frame corresponding to detected object 4 and the projection frames corresponding to objects 1, 2 and 3 all fail to exceed the preset IoU threshold, it is determined that none of objects 1-3 is the same physical object as detected object 4; that is, the detection information corresponding to detected object 4 is detection information that does not match any ground truth information, and detected object 4 may be called a falsely detected object.
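  • A minimal matching sketch consistent with the example above is given below; the greedy best-match strategy and the 0.5 threshold are illustrative assumptions, since the patent only requires that matched pairs exceed the preset IoU threshold.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_frames(projection_frames, detection_frames, iou_threshold=0.5):
    """Pair each projected annotation frame with its best-IoU 2D detection
    frame; unmatched projection frames are missed objects, unmatched
    detection frames are falsely detected objects."""
    matches, used = [], set()
    for i, p in enumerate(projection_frames):
        best_j, best_iou = None, iou_threshold
        for j, d in enumerate(detection_frames):
            score = iou(p, d)
            if j not in used and score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    matched_objects = {i for i, _ in matches}
    missed = [i for i in range(len(projection_frames)) if i not in matched_objects]
    false_detections = [j for j in range(len(detection_frames)) if j not in used]
    return matches, missed, false_detections
```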
  • Subsequently, the electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information; and may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rules and the matched ground truth information and detection information.
  • In an implementation of the present invention, the detected pose information includes the detected position information and detected attitude information of each detected object corresponding to each image frame, determined from its detection frame information; the detected motion information includes the detected speed information and detected acceleration information of each detected object corresponding to each image frame; the labeled pose information includes the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, determined from its annotation frame information; and the object motion information includes the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame.
  • Step 033 may include the following steps 0331-0339:
  • 0333 Determine the detected attitude error value between the matched ground truth information and detection information based on the labeled attitude information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detected attitude information included in the detection information.
  • 0336 Determine the detection frame length and width error value between the matched ground truth information and detection information based on the annotation frame information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection frame information included in the detection information.
  • The horizontal axis of the error curve is the preset error threshold, and the vertical axis is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, where the target error value is the detected position error value, detected attitude error value, detected speed error value, detected acceleration error value, or detection frame length and width error value.
  • The preset visual perception algorithm can detect the 2D information and 3D information of each detected object from the image frame, including the two-dimensional position information of each detected object in the image frame and the detected position information, detected attitude information, detected speed information and detected acceleration information of the detected object in the specified spatial rectangular coordinate system. The ground truth information includes labeled parameters of the corresponding object in the same dimensions, which may include, but are not limited to, the labeled position information, labeled attitude information, labeled speed information and labeled acceleration information of the corresponding object.
  • The preset result accuracy evaluation rules may include rules indicating how to determine the precision information and recall information of the detection results.
  • The electronic device may, according to a preset precision determination method and a preset recall determination method, determine the precision information and recall information of the detection results corresponding to the preset visual perception algorithm based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information, and use them as evaluation indicators of the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • The preset precision determination method and the preset recall determination method may refer to the precision and recall determination methods in the related art, which will not be repeated here.
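  • A conventional sketch, assuming the usual definitions in which matched pairs are true positives, unmatched detections are false positives, and unmatched ground truth objects are false negatives:

```python
def precision_recall(num_matched: int, num_false_detections: int, num_missed: int):
    """Precision and recall over the whole evaluation data set."""
    tp, fp, fn = num_matched, num_false_detections, num_missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```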
  • In this embodiment, an evaluation indicator of the accuracy of the detection results corresponding to the preset visual perception algorithm is added: a special form of error curve is drawn, and the error curve is then used to evaluate the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • Another new evaluation indicator of that accuracy is obtained by counting, among the error values of the same dimension, the number of error values corresponding to different proportions, and then using these counts to evaluate the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • The error values for the different dimensions are calculated from the matched ground truth information and detection information as follows:
  • Based on the labeled position information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame, and the detected position information included in the detection information, determine the detected position error value between the matched ground truth information and detection information. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled position information and the detected position information, and then calculate the detected position error value between them; the detected position error value may be an absolute error value and/or a relative error value between the labeled position information and the detected position information.
  • Based on the labeled attitude information included in the matched ground truth information and the detected attitude information included in the detection information, determine the detected attitude error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled attitude information and the detected attitude information, and then calculate the detected attitude error value between them; the detected attitude error value may be an absolute error value and/or a relative error value between the labeled attitude information and the detected attitude information.
  • Based on the labeled speed information included in the matched ground truth information and the detected speed information included in the detection information, determine the detected speed error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled speed information and the detected speed information, and then calculate the detected speed error value between them; the detected speed error value may be an absolute error value and/or a relative error value between the labeled speed information and the detected speed information.
  • Based on the labeled acceleration information included in the matched ground truth information and the detected acceleration information included in the detection information, determine the detected acceleration error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled acceleration information and the detected acceleration information, and then calculate the detected acceleration error value between them; the detected acceleration error value may be an absolute error value and/or a relative error value between the labeled acceleration information and the detected acceleration information.
  • Based on the annotation frame information included in the matched ground truth information and the detection frame information included in the detection information, determine the detection frame length and width error value. That is, for each pair of matched ground truth information and detection information, unify the scales of the annotation frame information and the detection frame information, and calculate the error between the length and width of the annotation frame and those of the detection frame as the detection frame length and width error value; it may be an absolute error value and/or a relative error value of the length and width between the annotation frame and the detection frame. In one case, the error between the heights of the annotation frame and the detection frame can also be calculated.
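  • A generic sketch of the absolute/relative error computation, applied identically to each dimension (position, attitude, speed, acceleration, frame length and width) once coordinate systems or scales have been unified; the small epsilon guard against zero-valued ground truth is an added assumption.

```python
import numpy as np

def error_values(labeled: np.ndarray, detected: np.ndarray, eps: float = 1e-9):
    """Absolute and relative error values between labeled (ground truth)
    and detected quantities expressed in a unified coordinate system."""
    absolute = np.abs(detected - labeled)
    relative = absolute / np.maximum(np.abs(labeled), eps)
    return absolute, relative
```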
  • The electronic device may in turn take each of the above-determined detected position error value, detected attitude error value, detected speed error value, detected acceleration error value, and detection frame length and width error value as the target error value, and, based on the target error values between the matched ground truth information and detection information and the preset error thresholds corresponding to the target error value, draw the error curve corresponding to the target error value. Specifically, for each preset error threshold, count the number of target error values smaller than that threshold, take the ratio of that number to the total amount of data in the evaluation data set as the vertical-axis value corresponding to the threshold, and draw the error curve corresponding to the target error value.
  • There are multiple preset error thresholds, which may be set starting from 0 and increasing sequentially.
  • FIG. 2 is an example diagram of the error curve drawn for a target error value; a minimal sketch follows.
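```python
import numpy as np

def error_curve(target_errors: np.ndarray, thresholds: np.ndarray, total_count: int):
    """For each preset error threshold (horizontal axis), return the ratio of
    the number of target error values below it to the total amount of data
    in the evaluation data set (vertical axis)."""
    return np.array([(target_errors < t).sum() / total_count for t in thresholds])

# Hypothetical usage with thresholds increasing from 0:
# curve = error_curve(position_errors, np.linspace(0.0, 2.0, 41), total_count)
```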
  • In addition, the electronic device sorts the target error values between the matched ground truth information and detection information by numerical value to obtain a sorted sequence corresponding to the target error value, and may determine the target error value at the top first percentage of the sorted sequence as the first target error value and the target error value at the top second percentage as the second target error value.
  • The first target error value may be referred to as 1sigma, and the second target error value as 2sigma; the top first percentage may be the top 68.26% of the sorted sequence, and the top second percentage may be the top 95.44%, mirroring the 1-sigma and 2-sigma coverage of a Gaussian distribution. A sketch follows.
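```python
import numpy as np

def sigma_indicators(target_errors: np.ndarray):
    """Sort the target error values and take the values at the top 68.26%
    and 95.44% of the sorted sequence as 1sigma and 2sigma; the index
    rounding here is an illustrative choice."""
    s = np.sort(target_errors)
    one_sigma = s[int(np.ceil(0.6826 * len(s))) - 1]
    two_sigma = s[int(np.ceil(0.9544 * len(s))) - 1]
    return one_sigma, two_sigma
```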
  • The electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, and/or the first target error value and second target error value of the target error value.
  • The area under the curve (AUC) of an error curve can serve as an indicator. When the target error value is the detected position error value, the error curve measures the position judgment ability of the preset visual perception algorithm, and the larger the area under the curve, the better the position judgment ability; when the target error value is the detected attitude error value, the error curve measures the attitude judgment ability, and the larger the area under the curve in the error curve corresponding to the detected attitude error value, the better the attitude judgment ability of the preset visual perception algorithm.
  • Likewise, the smaller the determined first target error value and second target error value, the better the performance of the preset visual perception algorithm and the higher the accuracy of its detection results. A trapezoidal AUC sketch follows.
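```python
import numpy as np

def curve_auc(thresholds: np.ndarray, curve: np.ndarray) -> float:
    """Area under the error curve via the trapezoidal rule (the integration
    rule is an assumption; the patent does not specify one). A larger area
    means a larger share of small error values, i.e. better judgment ability
    on that dimension."""
    return float(np.trapz(curve, thresholds))
```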
  • The evaluation indicators of the accuracy of the detection results corresponding to the preset visual perception algorithm may also include, but are not limited to, the PR curve and the proportion of falsely detected and missed objects among all detection results, where the PR curve is the precision-recall curve, drawn with the recall information on the horizontal axis and the precision information on the vertical axis.
  • The embodiment of the present invention not only evaluates the accuracy of the detection results corresponding to the preset visual perception algorithm, but also provides an evaluation of the algorithm stability of the preset visual perception algorithm.
  • Step 034 may include the following steps 0341-0343:
  • 0341 Based on the timing information between the point cloud data frames or image frames in the evaluation data set, determine the target error values corresponding to the same object from among the target error values;
  • 0342 For the target error values corresponding to each object, fit the fitting error curve corresponding to those target error values, where the fitting error curve includes the fitted errors of the object at the acquisition moments corresponding to the point cloud data frames or image frames;
  • 0343 Based on the target error values corresponding to each object and the fitted errors of the object at the acquisition moments corresponding to the point cloud data frames or image frames, determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
  • The continuity between the point cloud data frames in the evaluation data set means that they are continuous in time series, and the continuity between the image frames likewise means that they are continuous in time series.
  • The electronic device can determine the target error values corresponding to the same object from among the target error values based on the timing information between the point cloud data frames or image frames in the evaluation data set, where the target error values corresponding to the same object are arranged according to the timing information between their corresponding point cloud data frames or image frames.
  • For the target error values corresponding to each object, the electronic device fits, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the object's target error values and a preset curve fitting algorithm, the fitting error curve corresponding to the object's target error values; the fitting error curve may include the fitted error of the object at the acquisition moment corresponding to each point cloud data frame or image frame.
  • The preset curve fitting algorithm may be any type of curve fitting algorithm in the related art, which is not limited in this embodiment of the present invention.
  • The electronic device then determines, from the target error values corresponding to each object and the fitted errors of the object at the acquisition moments contained in the fitting error curve, the second evaluation information representing the algorithm stability of the preset visual perception algorithm, as sketched below.
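  • A sketch of the per-object fit and the per-moment differences; the degree-3 polynomial is an illustrative stand-in, since the patent allows any curve fitting algorithm in the related art.

```python
import numpy as np

def fit_residuals(times: np.ndarray, target_errors: np.ndarray, degree: int = 3):
    """Fit a smooth curve through one object's target error values over its
    acquisition moments, then take the difference between each target error
    and the fitted error at the same moment."""
    coeffs = np.polyfit(times, target_errors, degree)
    fitted = np.polyval(coeffs, times)            # fitting error curve samples
    differences = np.abs(target_errors - fitted)  # per-moment difference values
    return fitted, differences
```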
  • Step 0343 may include the following. For the target error values corresponding to each object, a difference error curve may be drawn, where the horizontal axis of the difference error curve corresponding to the object's target error values is the preset difference threshold, and the vertical axis is the ratio of the number of difference values corresponding to the object's target error values that are smaller than each preset difference threshold to the total number of target error values corresponding to the object.
  • The target error values corresponding to an object correspond to different point cloud data frames or image frames, and each point cloud data frame or image frame corresponds to an acquisition moment, so the target error values corresponding to the object correspond to different acquisition moments.
  • For the target error values corresponding to each object, the electronic device can, based on the object's target error values and the fitted errors contained in the fitting error curve at the acquisition moments corresponding to the point cloud data frames or image frames, calculate the difference between the target error and the fitted error corresponding to the same acquisition moment.
  • The electronic device can sort the difference values corresponding to each object's target error values by numerical value, and determine the difference value at the top third percentage of the sorted sequence as the first difference value and the difference value at the top fourth percentage as the second difference value, where the third percentage may be the same as or different from the first percentage, and the fourth percentage may be the same as or different from the second percentage.
  • The electronic device may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the first difference value and second difference value corresponding to each object's target error values, and/or the difference error curve corresponding to each object's target error values. The smaller the first difference value and second difference value corresponding to an object's target error values, the smaller the difference between the target error and the fitted error at each acquisition moment for that object, and the better the algorithm stability of the preset visual perception algorithm in the dimension corresponding to that target error value. A sketch follows.
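```python
import numpy as np

def stability_indicators(differences: np.ndarray, diff_thresholds: np.ndarray):
    """Difference error curve (fraction of difference values below each preset
    difference threshold) plus the first and second difference values read off
    the sorted sequence; reusing 68.26% / 95.44% for the third and fourth
    percentages is an assumption, which the patent allows to be the same as
    or different from the first and second percentages."""
    curve = np.array([(differences < t).sum() / len(differences)
                      for t in diff_thresholds])
    s = np.sort(differences)
    first_difference = s[int(np.ceil(0.6826 * len(s))) - 1]
    second_difference = s[int(np.ceil(0.9544 * len(s))) - 1]
    return curve, first_difference, second_difference
```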
  • The embodiment of the present invention can also visually display the evaluation information and the intermediate information generated during the evaluation process.
  • In an implementation, the method further includes: visually displaying the evaluation information and the intermediate information generated during the evaluation process.
  • The electronic device can visually display the evaluation information and the intermediate information in a web-based visualization manner and a 3D-rendering-based visualization manner, and can provide functions for interacting with the user.
  • Corresponding to the above method embodiment, an embodiment of the present invention provides an evaluation device for a visual perception algorithm, as shown in FIG. 3.
  • The device may include:
  • A first obtaining module 310, configured to obtain the ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set, where the ground truth information at least includes the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship;
  • A second obtaining module 320, configured to obtain the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set, where the detection information at least includes the detected pose information and detected motion information of the corresponding detected object;
  • A determination module 330, configured to determine the evaluation information corresponding to the preset visual perception algorithm based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, where the evaluation information includes first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability.
  • By applying this embodiment of the present invention, first evaluation information on the accuracy of the detection results of the preset visual perception algorithm in multiple respects and second evaluation information on the stability of the algorithm can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple respects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
  • In an implementation, the first obtaining module 310 is specifically configured to: obtain an evaluation data set; label each point cloud data frame in the evaluation data set based on the pre-trained 3D data perception model to obtain the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame; and, based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the timing information between the point cloud data frames, determine the labeled speed information and labeled acceleration information of each object, obtaining the object motion information of each object corresponding to each point cloud data frame and, with it, the ground truth information corresponding to each object.
  • In an implementation, the second obtaining module 320 is specifically configured to: detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information corresponding to each detected object in each image frame; determine from it the detected position information and detected attitude information of each detected object corresponding to each image frame, obtaining the detected pose information of each detected object; determine the detected speed information and detected acceleration information of each detected object corresponding to each image frame, obtaining the detected motion information of each detected object; and thereby obtain the detection information corresponding to each detected object corresponding to each image frame.
  • In an implementation, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, and the detection information includes the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame;
  • The determination module 330 includes:
  • A first determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine, based on the object's annotation frame information, the positional conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, the projection position information of the projection frame obtained by projecting the object's annotation frame into the image frame corresponding to the point cloud data frame, as the projection frame position information corresponding to the object;
  • A second determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine the matched projection frame position information and two-dimensional position information based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, where the matched projection frame position information and two-dimensional position information are those whose corresponding frames' IoU exceeds the preset IoU threshold;
  • A third determination unit (not shown in the figure), configured to determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information;
  • A fourth determination unit (not shown in the figure), configured to determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rules and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
  • the detection posture information includes: detection position information and detection posture information determined by the detection frame information of each detected object corresponding to each image frame;
  • the detected motion information includes: the detected speed information and detected acceleration information of each detected object corresponding to each image frame;
  • the labeled pose information includes: the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, as determined from its annotation frame information;
  • the object motion information includes: the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame;
  • the third determination unit is specifically configured to determine, based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm;
  • the detection position error value, detection attitude error value, detection speed error value, detection acceleration error value, and detection frame length-width error value between the matched ground truth information and the detection information are determined;
  • an error curve corresponding to the target error value is drawn, wherein the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is: the detection position error value, detection attitude error value, detection speed error value, detection acceleration error value, or the length-width error value of the detection frame;
  • based on the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, and/or the target error value, the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm is determined.
  • the fourth determination unit is specifically configured to determine, from the target error values and based on the time series information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to the same object;
  • a fitting error curve corresponding to the target error values of each object is obtained by fitting, wherein the fitting error curve includes: the fitting error of the object at the acquisition moment corresponding to each point cloud data frame or image frame;
  • based on the target error values corresponding to each object and the fitting errors of the object at the acquisition moments corresponding to each point cloud data frame or image frame, the second evaluation information representing the algorithm stability of the preset visual perception algorithm is determined.
  • the fourth determination unit is specifically configured to, for the target error values corresponding to different objects, calculate, based on the target error values corresponding to each object and the fitting errors of the object included in the fitting error curve at the acquisition moments corresponding to each point cloud data frame or image frame, the difference between the target error and the fitting error corresponding to the same acquisition moment;
  • a difference error curve corresponding to the target errors of each object is drawn based on the differences between the target errors and the fitting errors at each acquisition moment corresponding to the object and the preset difference thresholds corresponding to the target error values of the object, wherein the horizontal axis of the difference error curve corresponding to the target errors of the object is the preset difference threshold corresponding to the target error values of the object, and the vertical axis of the difference error curve is the ratio of the number of differences, among the differences corresponding to the target error values of the object, that are smaller than each preset difference threshold to the total number of target error values corresponding to the object;
  • the device further includes:
  • a display module (not shown in the figure), configured to display the first evaluation information, the second evaluation information, the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, the target error value, the determined target error values corresponding to the same object, and/or the fitting error curve corresponding to the target error values of each object obtained by fitting.

Abstract

Disclosed in embodiments of the present invention are a visual perception algorithm evaluation method and device. The method comprises: obtaining truth value information, corresponding to each object, determined on the basis of each point cloud data frame in an evaluation data set; obtaining detected information, corresponding to each detected object, detected on the basis of a preset visual detection algorithm and each image frame in the evaluation data set; and determining evaluation information corresponding to a preset visual perception algorithm on the basis of a preset result accuracy evaluation rule, a preset algorithm stability evaluation rule, marked pose information and object motion information in the truth value information corresponding to each object, and detected pose information and detected motion information in the detected information corresponding to each detected object, wherein the evaluation information comprises first evaluation information representing the detection result accuracy of the preset visual perception algorithm and second evaluation information representing the algorithm stability of the preset visual perception algorithm, thus implementing the comprehensive evaluation of the performance of a visual perception algorithm.

Description

Method and device for evaluating a visual perception algorithm

Technical Field
The present invention relates to the technical field of algorithm evaluation, and in particular to a method and device for evaluating a visual perception algorithm.
Background Art
Visual perception algorithms are core components of autonomous driving systems, face recognition systems, identity verification systems, and similar fields; the accuracy of a visual perception algorithm's perception results affects, to a certain extent, the accuracy of the output results of these systems. Accordingly, to guarantee the performance of such systems, the performance of a visual perception algorithm needs to be evaluated before it is actually applied.
How to provide a method for evaluating the performance of a visual perception algorithm has therefore become an urgent problem to be solved.
Summary of the Invention
The present invention provides a method and device for evaluating a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
The innovative points of the embodiments of the present invention include:
1. Based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, first evaluation information characterizing the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information characterizing its algorithm stability can be determined. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
2. Each point cloud data frame in the evaluation data set is automatically annotated based on a pre-trained three-dimensional data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, from which the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame are derived. Combined with the time series information between the point cloud data frames in the evaluation data, the speed information and acceleration information of each object corresponding to each point cloud data frame are determined, yielding, for each point cloud data frame, three-dimensional information that includes the annotation frame information, labeled position information, labeled attitude information, labeled speed information, and labeled acceleration information of each object. Automatic annotation of the three-dimensional information of the labeled objects is thereby realized, saving labor costs.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention;
FIG. 2 is an example diagram of an error curve corresponding to a target error value according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for evaluating a visual perception algorithm according to an embodiment of the present invention.
Detailed Description
The present invention provides a method and device for evaluating a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm. The embodiments of the present invention are described in detail below.
FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention. The method may include the following steps:
S101: Obtain the ground truth information corresponding to each object, determined based on each point cloud data frame in the evaluation data set.
The ground truth information includes at least the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship. The point cloud data frame may be a data frame collected by a lidar sensor, and the image frame may be an image frame collected by an image acquisition device.
The evaluation method for a visual perception algorithm provided by the embodiments of the present invention can be applied to any electronic device with computing capability, and the electronic device may be a terminal or a server.
The labeled pose information and object motion information of the corresponding object included in the ground truth information may be information based on three-dimensional space. For example, it may be pose information and motion information in the device coordinate system of the device that collected the point cloud data frame, or pose information and motion information in a preset spatial rectangular coordinate system; either is possible, where the preset spatial rectangular coordinate system may be the world coordinate system or the coordinate system of the image acquisition device. The labeled pose information may include labeled position information and labeled attitude information; the object motion information may include, but is not limited to, the speed information and acceleration information of the labeled object. For clarity of description, the speed information included in the object motion information determined based on each point cloud data frame in the evaluation data set is called labeled speed information, and the acceleration information is called labeled acceleration information.
In one case, the visual perception algorithm may be a visual perception algorithm applied in an automatic driving system. Correspondingly, each piece of evaluation data included in the evaluation data set may be evaluation data collected by a target vehicle during driving, each piece of evaluation data including a point cloud data frame and an image frame that have a corresponding relationship, where the corresponding relationship may refer to the point cloud data frame and the image frame being collected within the same acquisition cycle. Correspondingly, the above-mentioned lidar sensor and image acquisition device may both be installed in the target vehicle.
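Purely as an illustration of the "same acquisition cycle" correspondence (the patent does not prescribe how frames are paired), one could pair lidar and camera frames by nearest acquisition timestamp; lidar_stamps, camera_stamps, and max_gap below are hypothetical names, not taken from the patent:

```python
def pair_frames(lidar_stamps, camera_stamps, max_gap=0.05):
    """Pair each lidar frame with the camera frame closest in time.

    lidar_stamps / camera_stamps: lists of acquisition times in seconds.
    Returns (lidar_idx, camera_idx) pairs whose time gap is within max_gap,
    approximating "collected within the same acquisition cycle".
    """
    pairs = []
    for i, t in enumerate(lidar_stamps):
        j = min(range(len(camera_stamps)), key=lambda k: abs(camera_stamps[k] - t))
        if abs(camera_stamps[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs
```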
When the visual perception algorithm is a visual perception algorithm applied in an automatic driving system, the above-mentioned objects may include, but are not limited to, vehicles and pedestrians. In one case, when the object is a vehicle, the labeled position information in the labeled pose information of the corresponding object included in the ground truth information may refer to the position information of the center point of the vehicle, of the center point of the rear of the vehicle, or of the center point of the front of the vehicle; any of these is possible. The labeled attitude information in the labeled pose information of the corresponding object included in the ground truth information may refer to the angle information of the vehicle during driving relative to the coordinate axes of the coordinate system in which it is located, including pitch angle information, roll angle information, and yaw angle information. In one case, the pitch angle information and roll angle information generated while the vehicle drives on the ground surface may be disregarded, that is, both may be considered to be zero.
In one implementation, the evaluation data in the evaluation data set may include evaluation data collected for normal driving scenarios, evaluation data collected for scenarios involving large or special-shaped vehicles, or evaluation data collected for pedestrians, complex intersections, and specific weather conditions; any of these is possible.
In one implementation, the electronic device may directly obtain, from other devices, the ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set.
In one implementation of the present invention, S101 may include the following steps 011-013:
011: Obtain the evaluation data set;
012: Annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking the annotation frame information of each object corresponding to each point cloud data frame, so as to determine the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame and obtain the labeled pose information of each object corresponding to each point cloud data frame;
013: Based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the time series information between the point cloud data frames in the evaluation data set, determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame, so as to obtain the object motion information of each object corresponding to each point cloud data frame, and thereby the ground truth information corresponding to each object of each point cloud data frame.
In this implementation, the electronic device may directly obtain the evaluation data set, where the evaluation data set includes multiple pieces of evaluation data. The electronic device inputs the point cloud data frame included in each piece of evaluation data into the pre-trained three-dimensional data perception model, detects each object in each point cloud data frame through the model, and marks it with an annotation frame to obtain the annotation frame information of each object corresponding to each point cloud data frame, where the annotation frame may be a cuboid. The annotation frame information of each object includes information that can represent the length, width, and height of the object, and information that can represent the pose of the object.
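As a hypothetical illustration of what such annotation frame information might hold (the field names below are assumptions for exposition, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class AnnotationBox3D:
    # Center position in the acquisition device's coordinate system (m).
    x: float
    y: float
    z: float
    # Object dimensions (m).
    length: float
    width: float
    height: float
    # Heading angle about the vertical axis (rad); pitch and roll are
    # assumed zero for ground vehicles, as the description permits.
    yaw: float
```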
Subsequently, the electronic device converts the annotation frame information of each object output by the pre-trained three-dimensional data perception model for each point cloud data frame into the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame. The pre-trained three-dimensional data perception model may be a neural network model trained on sample point cloud data frames and their corresponding calibration information, which includes the calibration frame information of each object in the sample point cloud data frame; for the specific model training process, reference may be made to the model training processes in the related art, which are not repeated here.
The evaluation data in the evaluation data set is generally data obtained by continuous collection, that is, the point cloud data frames in the evaluation data set are consecutive frames, and the image frames are consecutive frames. Correspondingly, the electronic device may determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the time series information between the point cloud data frames in the evaluation data set.
In one implementation, the labeled position information of each object may include lateral position information, longitudinal position information, and radial position information. Then, based on the lateral position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled lateral speed information and labeled lateral acceleration information of each object can be determined; based on the longitudinal position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled longitudinal speed information and labeled longitudinal acceleration information of each object can be determined; and based on the radial position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled radial speed information and labeled radial acceleration information of each object can be determined.
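A minimal sketch of one way to derive per-axis labeled speed and acceleration from consecutive labeled positions and the time series information, assuming a simple first-order finite difference (the patent does not fix the numerical scheme):

```python
def derive_motion(positions, stamps):
    """positions: per-frame labeled positions of one object along one axis (m);
    stamps: matching acquisition times (s), strictly increasing.
    Returns (speeds, accelerations) via first-order finite differences;
    each list is one element shorter per differentiation step."""
    speeds = [
        (positions[i + 1] - positions[i]) / (stamps[i + 1] - stamps[i])
        for i in range(len(positions) - 1)
    ]
    accelerations = [
        (speeds[i + 1] - speeds[i]) / (stamps[i + 1] - stamps[i])
        for i in range(len(speeds) - 1)
    ]
    return speeds, accelerations
```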
S102: Obtain the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set.
The detection information includes at least the detected pose information and detected motion information of the corresponding detected object.
Using the preset visual detection algorithm, the detection information corresponding to each object in an image frame can be detected from the image frame. For clarity of description, an object detected from an image frame using the preset visual detection algorithm is called a detected object.
The detection information may include two-dimensional information and three-dimensional information corresponding to the object. The two-dimensional information corresponding to the object may include the two-dimensional position information and two-dimensional speed information of the object in the image frame; the three-dimensional information corresponding to the object may include, but is not limited to, the pose information of the object in a specified spatial rectangular coordinate system, that is, the detected pose information, as well as the detected motion information, where the detected motion information includes, but is not limited to, the detected speed information and detected acceleration information of the corresponding object.
In one implementation, the electronic device may directly obtain, from other devices, the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set.
In one implementation of the present invention, S102 may include the following steps 021-022:
021: Based on the preset visual perception algorithm, detect each image frame in the evaluation data set to obtain the detection frame information of each detected object corresponding to each image frame, so as to determine the detection position information and detection attitude information of each detected object corresponding to each image frame and obtain the detected pose information of each detected object corresponding to each image frame.
022: Based on the preset visual perception algorithm and the detection position information and detection attitude information of each detected object corresponding to each image frame, determine the detection speed information and detection acceleration information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, and thereby the detection information corresponding to each detected object of each image frame.
In this implementation, the preset visual perception algorithm is pre-stored locally on the electronic device or in a connected storage device. After obtaining the evaluation data set, the electronic device may detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes information that can represent the length, width, and height of the corresponding detected object and information that can represent its pose, and may also include information representing the two-dimensional position information of the corresponding detected object in the corresponding image frame.
The electronic device determines the detection position information and detection attitude information of each detected object corresponding to each image frame based on the preset visual perception algorithm and the detection frame information of each detected object corresponding to each image frame, and then, based on the preset visual perception algorithm and the detection position information and detection attitude information of each detected object corresponding to each image frame, determines the detection speed information and detection acceleration information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame.
In one implementation, the detection position information of each detected object may include lateral position information, longitudinal position information, and radial position information. Then, based on the lateral position information of each detected object and the time series information between the image frames, the detected lateral speed information and detected lateral acceleration information of each detected object can be determined; based on the longitudinal position information of each detected object and the time series information between the image frames, the detected longitudinal speed information and detected longitudinal acceleration information of each detected object can be determined; and based on the radial position information of each detected object and the time series information between the image frames, the detected radial speed information and detected radial acceleration information of each detected object can be determined.
S103: Based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, determine the evaluation information corresponding to the preset visual perception algorithm.
The evaluation information includes: first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm, and second evaluation information representing its algorithm stability.
In this step, the electronic device may process the ground truth information and detection information that have a corresponding relationship based on the preset result accuracy evaluation rule to obtain the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm, and may process the ground truth information and detection information that have a corresponding relationship based on the preset algorithm stability evaluation rule to obtain the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
The preset result accuracy evaluation rule may include, but is not limited to, specific detection result accuracy evaluation indicators and a process for determining, based on the ground truth information and detection information, the results corresponding to those indicators; the preset algorithm stability evaluation rule may include, but is not limited to, specific algorithm stability evaluation indicators and a process for determining, based on the ground truth information and detection information, the results corresponding to those indicators.
By applying the embodiments of the present invention, first evaluation information on the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability can be determined based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thereby evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
In another embodiment of the present invention, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, the detection information includes the detection frame information of each detected object corresponding to each image frame, and the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame;
S103 may include the following steps 031-034:
031: For each object corresponding to each point cloud data frame, based on the annotation frame information corresponding to the object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the internal parameter information of the image frame acquisition device, determine the projection position information of the projection frame obtained by projecting the annotation frame corresponding to the object into the image frame corresponding to the point cloud data frame, as the projection frame position information corresponding to the object.
032: For each object corresponding to each point cloud data frame, based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, determine the matching projection frame position information and two-dimensional position information, so as to determine the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
The matching projection frame position information and two-dimensional position information are: projection frame position information and two-dimensional position information whose corresponding frames have an intersection-over-union value exceeding a preset intersection-over-union threshold.
033: Based on the preset result accuracy evaluation rule, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm.
034: Based on the preset algorithm stability evaluation rule and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
In this implementation, before determining the evaluation information corresponding to the preset visual perception algorithm based on the ground truth information and detection information, the electronic device needs to match the ground truth information and detection information, so that the evaluation information is determined from mutually matched ground truth information and detection information. Correspondingly, for the object corresponding to each point cloud data frame, the electronic device may, based on the annotation frame information corresponding to the object and the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, convert the annotation frame corresponding to the object's annotation frame information from the coordinate system of the point cloud data frame acquisition device in which it is located to the coordinate system of the image frame acquisition device, obtaining the position information of that annotation frame in the coordinate system of the image frame acquisition device. Then, based on this position information and the internal parameter information of the image frame acquisition device, the annotation frame corresponding to the object is projected into the image frame corresponding to the point cloud data frame, and the projection position information of the resulting projection frame in that image frame is determined as the projection frame position information corresponding to the object.
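A minimal sketch of such a projection under common assumptions: a pinhole camera model, a 4x4 lidar-to-camera transform T_cam_lidar standing in for the position conversion relationship, and a 3x3 intrinsic matrix K standing in for the internal parameter information. These symbols are assumptions for exposition, not taken from the patent:

```python
import numpy as np

def project_box_corners(corners_lidar, T_cam_lidar, K):
    """corners_lidar: (8, 3) cuboid corners in the lidar frame.
    T_cam_lidar: (4, 4) lidar-to-camera rigid transform.
    K: (3, 3) camera intrinsic matrix.
    Returns the axis-aligned 2D bounding box (x_min, y_min, x_max, y_max)
    of the projected corners in pixel coordinates. Assumes all corners
    lie in front of the camera (positive depth)."""
    corners_h = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous coords
    corners_cam = (T_cam_lidar @ corners_h.T)[:3]            # into camera frame
    pixels = K @ corners_cam
    pixels = pixels[:2] / pixels[2]                          # perspective divide
    x_min, y_min = pixels.min(axis=1)
    x_max, y_max = pixels.max(axis=1)
    return x_min, y_min, x_max, y_max
```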
For each point cloud data frame, after the annotation frames corresponding to its objects have all been projected into the image frame corresponding to the point cloud data frame, the intersection-over-union between the annotation frame corresponding to each object and the two-dimensional detection frame of each detected object in that image frame can be calculated based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame; that is, the ratio between the intersection area and the union area of the annotation frame corresponding to each object and the two-dimensional detection frame of each detected object in the image frame corresponding to the point cloud data frame is calculated. Each such ratio is then compared with the preset intersection-over-union threshold; if the ratio exceeds the preset intersection-over-union threshold, it is determined that the object corresponding to the projection frame position information of this ratio and the detected object corresponding to the two-dimensional position information of this ratio are the same object, and, correspondingly, that projection frame position information and two-dimensional position information are matching projection frame position information and two-dimensional position information.
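An illustrative intersection-over-union computation and threshold test for two axis-aligned boxes given as (x_min, y_min, x_max, y_max); the 0.5 default threshold is a placeholder, not a value from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(projection_box, detection_box, iou_threshold=0.5):
    # Matched when the IoU exceeds the preset intersection-over-union threshold.
    return iou(projection_box, detection_box) > iou_threshold
```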
For example, suppose point cloud data frame A corresponds to object 1, object 2, and object 3, and the image frame a corresponding to point cloud data frame A corresponds to detected object 1, detected object 2, detected object 3, and detected object 4. For object 1 of point cloud data frame A, based on the projection frame position information of object 1 and the two-dimensional position information of each detected object, the intersection-over-union values between the projection frame corresponding to object 1 and the two-dimensional detection frames corresponding to detected object 1, detected object 2, detected object 3, and detected object 4 are calculated in turn.
By analogy, for object 2 of point cloud data frame A, the intersection-over-union values between the projection frame of object 2 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 are calculated; and for object 3 of point cloud data frame A, the intersection-over-union values between the projection frame of object 3 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 are calculated.
Each intersection-over-union ratio is then compared with the preset intersection-over-union threshold. For example, if the intersection-over-union between the projection frame of object 1 and the two-dimensional detection frame of detected object 3 exceeds the preset threshold, the projection frame position information of object 1 and the two-dimensional position information of detected object 3 are determined to be matching projection frame position information and two-dimensional position information; correspondingly, the ground truth information of object 1 and the detection information of detected object 3 are matched ground truth information and detection information.
If none of the intersection-over-union values between the projection frame of object 3 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 exceeds the preset threshold, it is determined that none of detected objects 1-4 is the same physical object as object 3, that is, the ground truth information of object 3 is ground truth information not matched to detection information; correspondingly, object 3 may be called a missed object.
If none of the intersection-over-union values between the projection frames of objects 1, 2, and 3 and the two-dimensional detection frame of detected object 4 exceeds the preset threshold, it is determined that none of objects 1-3 is the same physical object as detected object 4, that is, the detection information of detected object 4 is detection information not matched to ground truth information; correspondingly, detected object 4 may be called a falsely detected object.
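Tying the example together, a sketch of how matched, missed, and falsely detected objects could be tallied for one frame pair; the simple greedy pass below is an assumption, since the patent only specifies the intersection-over-union threshold test (it reuses the iou() helper from the previous sketch):

```python
def classify_matches(projection_boxes, detection_boxes, iou_threshold=0.5):
    """Returns (matches, missed, false_detections) as index lists."""
    matches, used = [], set()
    for i, pbox in enumerate(projection_boxes):
        best_j, best_iou = None, iou_threshold
        for j, dbox in enumerate(detection_boxes):
            if j in used:
                continue
            value = iou(pbox, dbox)
            if value > best_iou:           # exceeds the preset threshold
                best_j, best_iou = j, value
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    matched_objects = {m[0] for m in matches}
    missed = [i for i in range(len(projection_boxes))
              if i not in matched_objects]          # like object 3 above
    false_detections = [j for j in range(len(detection_boxes))
                        if j not in used]           # like detected object 4
    return matches, missed, false_detections
```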
Subsequently, the electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rule, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information; and may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rule and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
In another embodiment of the present invention, the detected pose information includes: the detection position information and detection attitude information of each detected object corresponding to each image frame, as determined from its detection frame information; the detected motion information includes: the detected speed information and detected acceleration information of each detected object corresponding to each image frame;
The labeled pose information includes: the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, as determined from its annotation frame information; the object motion information includes: the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame;
Step 033 may include the following steps 0331-0339:
0331: Based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, determine the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm.
0332: Based on the labeled position information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection position information included in the matched detection information, determine the detection position error value between the matched ground truth information and detection information.
0333: Based on the labeled attitude information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection attitude information included in the matched detection information, determine the detection attitude error value between the matched ground truth information and detection information.
0334: Based on the labeled speed information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection speed information included in the matched detection information, determine the detection speed error value between the matched ground truth information and detection information.
0335: Based on the labeled acceleration information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection acceleration information included in the matched detection information, determine the detection acceleration error value between the matched ground truth information and detection information.
0336: Based on the annotation frame information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection frame information included in the matched detection information, determine the length-width error value of the detection frame between the matched ground truth information and detection information.
0337: Based on the target error values between the matched ground truth information and detection information and the preset error thresholds corresponding to the target error values, draw the error curve corresponding to the target error value.
The horizontal axis of the error curve is the preset error threshold; the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set; and the target error value is: the detection position error value, the detection attitude error value, the detection speed error value, the detection acceleration error value, or the length-width error value of the detection frame.
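A sketch of how such an error curve could be computed; the threshold grid and data count are caller-supplied placeholders, since the patent only fixes the meaning of the two axes:

```python
def error_curve(target_errors, thresholds, total_data_count):
    """For each preset error threshold, compute the fraction of target
    error values smaller than it, relative to the total amount of data
    in the evaluation data set. Returns (thresholds, ratios) for plotting:
    thresholds on the horizontal axis, ratios on the vertical axis."""
    ratios = [
        sum(1 for e in target_errors if e < t) / total_data_count
        for t in thresholds
    ]
    return thresholds, ratios
```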
0338: Sort the target error values between the matched ground truth information and detection information by magnitude to obtain a sorted sequence of the target error values; determine, within the sorted sequence, the first target error value, namely the largest value among the first first-percentage of the target error values, and the second target error value, namely the largest value among the first second-percentage of the target error values.
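Illustratively, with assumed percentages of 50% and 90% (the patent leaves the first and second percentages unspecified):

```python
def percentile_error(sorted_errors, percentage):
    """Largest value among the first `percentage` of an ascending
    sorted sequence of target error values."""
    count = max(1, int(len(sorted_errors) * percentage))
    return sorted_errors[count - 1]

errors = sorted([0.12, 0.05, 0.40, 0.22, 0.08, 0.31, 0.17, 0.09, 0.26, 0.51])
first_target_error = percentile_error(errors, 0.5)   # assumed first percentage
second_target_error = percentile_error(errors, 0.9)  # assumed second percentage
```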
0339: Based on the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, and/or the target error value, determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm.
In this implementation, the preset visual perception algorithm can be used to detect from an image frame the 2D information and 3D information of each detected object therein, including the two-dimensional position information of each detected object in the image frame and the detection position information, detection attitude information, detection speed information, and detection acceleration information of each detected object in the specified spatial rectangular coordinate system.
In order to realize a multi-dimensional evaluation of the detection result accuracy and algorithm stability of the preset visual perception algorithm, the ground truth information includes labeled parameters of the corresponding object in multiple dimensions, which may include, but are not limited to, the labeled position information, labeled attitude information, labeled speed information, and labeled acceleration information of the corresponding object.
The preset result accuracy evaluation rule may include rules indicating how to determine the precision information and recall information of the detection results. Correspondingly, the electronic device may determine the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm according to a preset precision determination method and a preset recall determination method, based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information. The precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm serve as one evaluation indicator of the accuracy of those detection results. For the preset precision determination method and preset recall determination method, reference may be made to the precision and recall determination methods in the related art, which are not repeated here.
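As a reminder of the standard definitions from the related art (matched pairs counted as true positives, unmatched detection information as false positives, unmatched ground truth information as false negatives; the sketch assumes at least one detection and one ground truth object exist):

```python
def precision_recall(num_matched, num_unmatched_detections, num_unmatched_truths):
    """Standard precision/recall over the whole evaluation data set.
    num_matched: detections matched to ground truth (true positives);
    num_unmatched_detections: falsely detected objects (false positives);
    num_unmatched_truths: missed objects (false negatives)."""
    precision = num_matched / (num_matched + num_unmatched_detections)
    recall = num_matched / (num_matched + num_unmatched_truths)
    return precision, recall
```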
In this implementation, a new evaluation index for the accuracy of the detection results corresponding to the preset visual perception algorithm is introduced: a special form of error curve is drawn, and the accuracy of the detection results of the preset visual perception algorithm is then evaluated by means of this error curve. Another newly introduced evaluation index for the accuracy of the detection results is: counting, among the error values of the same dimension, the numbers of error values corresponding to different proportions, and then evaluating the accuracy of the detection results of the preset visual perception algorithm based on these statistics.
Specifically, the error values corresponding to the different dimensions are first computed from the matched ground-truth information and detection information:
Based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected position error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated position information and the detected position information, and then compute the detected position error value between the annotated position information and the detected position information after coordinate unification; the detected position error value may be an absolute error value and/or a relative error value between the annotated position information and the detected position information.
Based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected attitude error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated attitude information and the detected attitude information, and then compute the detected attitude error value between the two after coordinate unification; the detected attitude error value may be an absolute error value and/or a relative error value between the annotated attitude information and the detected attitude information.
Based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected velocity error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated velocity information and the detected velocity information, and then compute the detected velocity error value between the two after coordinate unification; the detected velocity error value may be an absolute error value and/or a relative error value between the annotated velocity information and the detected velocity information.
Based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected acceleration error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated acceleration information and the detected acceleration information, and then compute the detected acceleration error value between the two after coordinate unification; the detected acceleration error value may be an absolute error value and/or a relative error value between the annotated acceleration information and the detected acceleration information.
Based on the annotation box information included in the matched ground-truth information and the detection box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the box length-width error values between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the scales of the annotation box information and the detection box information, and then, based on the annotation box information and detection box information after scale unification, compute the errors between the length and the width of the annotation box and those of the detection box as the box length-width error values; these may be absolute and/or relative error values of the length and width between the annotation box and the detection box. In one case, the error between the heights of the annotation box and the detection box may also be computed.
In one case, the electronic device may take, in turn, each of the detected position error value, detected attitude error value, detected velocity error value, detected acceleration error value, and box length-width error values determined above as the target error value, and draw the error curve corresponding to the target error value based on the target error values between the matched ground-truth information and detection information and the preset error thresholds corresponding to that target error value. Specifically, for each preset error threshold, the number of target error values smaller than that preset error threshold is counted, and the ratio of this number to the total amount of data in the evaluation data set is computed; then the error curve corresponding to the target error value is drawn, with the preset error thresholds as the horizontal axis and, as the vertical axis, the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set. There are multiple preset error thresholds, which may start from 0 and increase successively.
FIG. 2 shows an example of the error curve drawn for a target error value.
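A minimal sketch of how the points of such an error curve could be computed, assuming one target error value per matched ground-truth/detection pair and thresholds increasing from 0; the names are illustrative only.

```python
import numpy as np

def error_curve_points(target_errors, thresholds):
    """For each preset error threshold, return the fraction of target error
    values smaller than that threshold; plotting thresholds (horizontal axis)
    against these fractions (vertical axis) yields the error curve."""
    errors = np.asarray(target_errors, dtype=float)
    total = len(errors)  # total amount of data in the evaluation data set
    return [float((errors < t).sum()) / total for t in thresholds]

# Example usage with thresholds starting from 0 and increasing successively:
# fractions = error_curve_points(position_errors, np.linspace(0.0, 2.0, 41))
```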
In another case, the electronic device sorts the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence of the target error values, and determines the first target error value, i.e., the largest value among the top first percentage of target error values in the sorted sequence, and the second target error value, i.e., the largest value among the top second percentage of target error values. The first target error value may be referred to as 1sigma, and the second target error value may be referred to as 2sigma.
For example, the top first percentage may be the top 68.26% of the sorted sequence and the top second percentage may be the top 95.44% of the sorted sequence, matching the one- and two-standard-deviation coverage of a normal distribution.
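Under the assumption that the sorted sequence is ascending, the 1sigma and 2sigma values described above could be obtained as sketched below; the percentage defaults are the examples just given, and all names are illustrative.

```python
import numpy as np

def sigma_bounds(target_errors, first_pct=0.6826, second_pct=0.9544):
    """Largest value within the top first_pct and top second_pct of the
    ascending sorted sequence of target error values (1sigma and 2sigma)."""
    s = np.sort(np.asarray(target_errors, dtype=float))
    n = len(s)
    one_sigma = s[max(int(np.ceil(first_pct * n)) - 1, 0)]
    two_sigma = s[max(int(np.ceil(second_pct * n)) - 1, 0)]
    return float(one_sigma), float(two_sigma)
```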
Subsequently, the electronic device may determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
It can be understood that the higher the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the higher the accuracy of those detection results.
For the error curve corresponding to a target error value, as shown in FIG. 2, the larger the area under the curve (AUC), i.e., the closer the area value is to 1, the smaller the overall error between the matched ground-truth information and detection information in the corresponding dimension, i.e., the better the performance of the preset visual perception algorithm. For example, if the target error value is the detected position error value, the curve measures the position estimation capability of the preset visual perception algorithm, and the larger the area under the corresponding error curve, the better that position estimation capability. As another example, if the target error value is the detected attitude error value, the curve measures the attitude estimation capability of the preset visual perception algorithm, and the larger the area under the corresponding error curve, the better that attitude estimation capability.
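One way to quantify the area under such an error curve is trapezoidal integration; the normalization by the threshold range below is an assumption made so that a curve whose fraction equals 1 everywhere yields an area of exactly 1, consistent with the reading above.

```python
import numpy as np

def error_curve_auc(thresholds, fractions):
    """Normalized trapezoidal area under the error curve; fractions are the
    per-threshold proportions in [0, 1] produced when drawing the curve."""
    x = np.asarray(thresholds, dtype=float)
    y = np.asarray(fractions, dtype=float)
    return float(np.trapz(y, x) / (x[-1] - x[0]))
```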
As for the evaluation index of counting, among the error values of the same dimension, the numbers of error values corresponding to different proportions: the smaller the determined first target error value and second target error value, the better the performance of the preset visual perception algorithm, i.e., the higher the accuracy of the detection results.
In one implementation, the evaluation indexes for the accuracy of the detection results corresponding to the preset visual perception algorithm may further include, but are not limited to, the P-R curve and the proportions of false detections and missed detections among all detection results, where the P-R curve is the precision-recall curve, drawn with the recall information as the horizontal axis and the precision information as the vertical axis.
To improve the comprehensiveness of the evaluation of the preset visual perception algorithm, embodiments of the present invention evaluate not only the accuracy of the detection results corresponding to the preset visual perception algorithm but also the algorithm stability of the preset visual perception algorithm. In another embodiment of the present invention, step 034 may include the following steps:
0341: Based on the timing information between the point cloud data frames or the image frames in the evaluation data set, determine, from the target error values, the target error values corresponding to the same object.
0342: For the target error values corresponding to each object, fit a fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm,
where the fitted error curve includes: the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
0343: For the target error values corresponding to each object, determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
In this implementation, the point cloud data frames in the evaluation data set are continuous, i.e., consecutive in time, and so are the image frames. In view of this, the electronic device may, based on the timing information between the point cloud data frames or image frames in the evaluation data set, determine from the target error values the target error values corresponding to the same object, where the target error values corresponding to the same object are arranged according to the timing information between their corresponding point cloud data frames or image frames.
Correspondingly, for the target error values corresponding to each object, the electronic device fits the fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm; this fitted error curve may include the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames. The preset curve fitting algorithm may be any type of curve fitting algorithm in the related art, which is not limited in the embodiments of the present invention.
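Since the embodiments leave the preset curve fitting algorithm open, the sketch below uses a polynomial fit purely as one possible instance; the degree and all names are assumptions for illustration.

```python
import numpy as np

def fit_object_error_curve(acquisition_times, target_errors, degree=3):
    """Fit one object's time-ordered target error values and return the
    fitted errors at the acquisition moments of the corresponding frames."""
    t = np.asarray(acquisition_times, dtype=float)
    e = np.asarray(target_errors, dtype=float)
    coefficients = np.polyfit(t, e, deg=degree)  # one possible curve fitting algorithm
    return np.polyval(coefficients, t)           # fitted error per acquisition moment
```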
Subsequently, for the target error values corresponding to each object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames, the electronic device determines second evaluation information characterizing the algorithm stability of the preset visual perception algorithm. In another embodiment of the present invention, step 0343 includes:
03431: For the target error values corresponding to each object, compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
03432: For the target error values corresponding to each object, draw the difference error curve corresponding to that object's target errors, based on the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object and the preset difference thresholds corresponding to that object's target error values.
Here, the horizontal axis of the difference error curve corresponding to an object's target errors is the preset difference threshold corresponding to that object's target error values, and the vertical axis is the ratio of the number of differences, among those corresponding to that object's target error values, that are smaller than each preset difference threshold, to the total number of target error values corresponding to that object.
03433: For the target error values corresponding to each object, sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences.
03434: Determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target error values of each object.
In this implementation, the target error values corresponding to an object correspond to different point cloud data frames or image frames, and each point cloud data frame or image frame corresponds to an acquisition moment. Correspondingly, in one implementation, the target error values corresponding to an object correspond to different acquisition moments. In view of this, for the target error values corresponding to each object, the electronic device may compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
Then, for each preset difference threshold corresponding to that object's target error values, the electronic device counts, among the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object, the number of differences smaller than that preset difference threshold, and computes the ratio of this number to the total number of target error values corresponding to that object; taking the ratio of differences smaller than each preset difference threshold to the total number of that object's target error values as the vertical axis and the preset difference thresholds as the horizontal axis, the difference error curve corresponding to that object's target errors is drawn.
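A sketch of the difference error curve computation follows, assuming the differences between target errors and fitted errors are taken as absolute values (the embodiments do not fix a sign convention); all names are illustrative.

```python
import numpy as np

def difference_error_curve_points(target_errors, fitted_errors, diff_thresholds):
    """Differences between an object's target errors and fitted errors at the
    same acquisition moments, turned into curve points: for each preset
    difference threshold, the fraction of differences smaller than it."""
    diffs = np.abs(np.asarray(target_errors, dtype=float)
                   - np.asarray(fitted_errors, dtype=float))
    total = len(diffs)  # total number of target error values for this object
    return [float((diffs < t).sum()) / total for t in diff_thresholds]
```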
In another implementation, for the target error values corresponding to each object, the electronic device may sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences. The third percentage may be the same as or different from the first percentage, and the fourth percentage may be the same as or different from the second percentage.
Subsequently, the electronic device may determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target errors of each object. The smaller the first difference and the second difference corresponding to an object's target error values, the smaller the differences between that object's target errors and fitted errors at the respective acquisition moments, and correspondingly, the better the algorithm stability of the preset visual perception algorithm in the dimension corresponding to that target error value.
The larger the area under the difference error curve corresponding to an object's target error values, i.e., the closer the area value is to 1, the smaller the overall deviation between that object's target errors and fitted errors at the respective acquisition moments, i.e., the better the performance of the preset visual perception algorithm and the higher its detection stability in the dimension corresponding to that target error value.
To improve the user experience and help users understand the evaluation information, embodiments of the present invention may visually display the evaluation information and the intermediate information generated during the evaluation process. In another embodiment of the present invention, the method further includes:
displaying the first evaluation information, the second evaluation information, the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitted error curve corresponding to the target error values of each object obtained by fitting.
The electronic device may intuitively display the evaluation information and the intermediate information generated during the evaluation process in a web-page-based visualization manner or a 3D-rendering-based visualization manner, and may provide functions for interacting with the user.
Corresponding to the above method embodiments, an embodiment of the present invention provides an evaluation device for a visual perception algorithm. As shown in FIG. 3, the device may include:
a first obtaining module 310, configured to obtain ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, where the ground-truth information includes at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that correspond to each other;
a second obtaining module 320, configured to obtain detection information corresponding to each detected object detected based on a preset visual perception algorithm and each image frame in the evaluation data set, where the detection information includes at least detected pose information and detected motion information of the corresponding detected object;
a determination module 330, configured to determine evaluation information corresponding to the preset visual perception algorithm, based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, where the evaluation information includes: first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information characterizing its algorithm stability.
By applying the embodiments of the present invention, first evaluation information on the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those detection results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
In another embodiment of the present invention, the first obtaining module 310 is specifically configured to obtain the evaluation data set;
annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking the annotation box information of each object corresponding to each point cloud data frame, so as to determine the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, thereby obtaining the annotated pose information of each object corresponding to each point cloud data frame; and
determine the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame, based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the timing information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object for each point cloud data frame.
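One plausible reading of deriving the annotated velocity and acceleration from the annotated positions and the inter-frame timing information is finite differencing over the acquisition timestamps, sketched below; the differencing scheme is an assumption for illustration and is not prescribed by the embodiments.

```python
import numpy as np

def annotate_motion(acquisition_times, annotated_positions):
    """Per-object annotated velocity and acceleration from time-ordered
    annotated positions (array of shape (N, 3)) via finite differences."""
    t = np.asarray(acquisition_times, dtype=float)
    p = np.asarray(annotated_positions, dtype=float)
    v = np.gradient(p, t, axis=0)  # annotated velocity information
    a = np.gradient(v, t, axis=0)  # annotated acceleration information
    return v, a
```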
In another embodiment of the present invention, the second obtaining module 320 is specifically configured to detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection box information of each detected object corresponding to each image frame, so as to determine the detected position information and detected attitude information of each detected object corresponding to each image frame, thereby obtaining the detected pose information of each detected object corresponding to each image frame; and
determine the detected velocity information and detected acceleration information of each detected object corresponding to each image frame, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, thereby obtaining the detection information corresponding to each detected object for each image frame.
In another embodiment of the present invention, the ground-truth information includes the annotation box information of each object corresponding to each point cloud data frame, the detection information includes the detection box information of each detected object corresponding to each image frame, and the detection box information includes the two-dimensional position information of the corresponding detected object in the image frame;
the determination module 330 includes:
a first determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine the projection position information of the projection box obtained by projecting the annotation box corresponding to that object into the image frame corresponding to the point cloud data frame, based on the annotation box information corresponding to that object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, as the projection box position information corresponding to that object;
a second determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine matched projection box position information and two-dimensional position information, based on the projection box position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, where matched projection box position information and two-dimensional position information are: projection box position information and two-dimensional position information whose corresponding boxes have an intersection-over-union (IoU) value exceeding a preset IoU threshold (see the sketch after this list);
a third determination unit (not shown in the figure), configured to determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the preset result accuracy evaluation rules, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
a fourth determination unit (not shown in the figure), configured to determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the preset algorithm stability evaluation rules and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
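The sketch below illustrates the matching performed by the second determination unit: axis-aligned IoU between projected annotation boxes and detected 2D boxes, with a greedy one-to-one pairing above a preset IoU threshold. The greedy strategy and the 0.5 default are illustrative assumptions; the embodiments only require the IoU to exceed a preset threshold.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_boxes(projection_boxes, detection_boxes, iou_threshold=0.5):
    """Pair each projection box with the best unused detection box whose IoU
    exceeds the preset threshold; unmatched entries on either side feed the
    precision/recall computation of the third determination unit."""
    matches, used = [], set()
    for i, pb in enumerate(projection_boxes):
        best_j, best_iou = -1, iou_threshold
        for j, db in enumerate(detection_boxes):
            if j in used:
                continue
            v = iou(pb, db)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```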
In another embodiment of the present invention, the detected pose information includes: the detected position information and detected attitude information, determined from the detection box information, of each detected object corresponding to each image frame; the detected motion information includes: the detected velocity information and detected acceleration information of each detected object corresponding to each image frame; the annotated pose information includes: the annotated position information and annotated attitude information, determined from the annotation box information, of each object corresponding to each point cloud data frame; and the object motion information includes: the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame;
the third determination unit is specifically configured to determine the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, based on the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
determine the detected position error value between the matched ground-truth information and detection information, based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected attitude error value between the matched ground-truth information and detection information, based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected velocity error value between the matched ground-truth information and detection information, based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected acceleration error value between the matched ground-truth information and detection information, based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the box length-width error values between the matched ground-truth information and detection information, based on the annotation box information included in the matched ground-truth information and the detection box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
draw the error curve corresponding to the target error value, based on the target error values between the matched ground-truth information and detection information and the preset error thresholds corresponding to that target error value, where the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is: the detected position error value, the detected attitude error value, the detected velocity error value, the detected acceleration error value, or the box length-width error value;
sort the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence of the target error values, and determine the first target error value, i.e., the largest value among the top first percentage of target error values in the sorted sequence, and the second target error value, i.e., the largest value among the top second percentage of target error values; and
determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
In another embodiment of the present invention, the fourth determination unit is specifically configured to determine, from the target error values, the target error values corresponding to the same object, based on the timing information between the point cloud data frames or image frames in the evaluation data set;
for the target error values corresponding to each object, fit the fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm, where the fitted error curve includes: the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames; and
for the target error values corresponding to each object, determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
In another embodiment of the present invention, the fourth determination unit is specifically configured to, for the target error values corresponding to each object, compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames;
for the target error values corresponding to each object, draw the difference error curve corresponding to that object's target errors, based on the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object and the preset difference thresholds corresponding to that object's target error values, where the horizontal axis of the difference error curve is the preset difference threshold corresponding to that object's target error values, and the vertical axis is the ratio of the number of differences, among those corresponding to that object's target error values, that are smaller than each preset difference threshold, to the total number of target error values corresponding to that object;
for the target error values corresponding to each object, sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences; and
determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target error values of each object.
In another embodiment of the present invention, the device further includes:
a display module (not shown in the figure), configured to display the first evaluation information, the second evaluation information, the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitted error curve corresponding to the target error values of each object obtained by fitting.

Claims (10)

  1. A method for evaluating a visual perception algorithm, wherein the method comprises:
    obtaining ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, wherein the ground-truth information comprises at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data comprises a point cloud data frame and an image frame that correspond to each other;
    obtaining detection information corresponding to each detected object detected based on a preset visual perception algorithm and each image frame in the evaluation data set, wherein the detection information comprises at least detected pose information and detected motion information of the corresponding detected object;
    determining evaluation information corresponding to the preset visual perception algorithm, based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, wherein the evaluation information comprises: first evaluation information characterizing accuracy of detection results of the preset visual perception algorithm and second evaluation information characterizing algorithm stability of the preset visual perception algorithm.
  2. The method according to claim 1, wherein the process of obtaining the ground-truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set comprises:
    obtaining the evaluation data set;
    annotating each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking annotation box information of each object corresponding to each point cloud data frame, so as to determine annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, thereby obtaining the annotated pose information of each object corresponding to each point cloud data frame;
    determining annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame, based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and timing information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object for each point cloud data frame.
  3. The method according to claim 1, wherein the step of obtaining the detection information corresponding to each detected object detected based on the preset visual perception algorithm and each image frame in the evaluation data set comprises:
    detecting each image frame in the evaluation data set based on the preset visual perception algorithm to obtain detection box information of each detected object corresponding to each image frame, so as to determine detected position information and detected attitude information of each detected object corresponding to each image frame, thereby obtaining the detected pose information of each detected object corresponding to each image frame;
    determining detected velocity information and detected acceleration information of each detected object corresponding to each image frame, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, thereby obtaining the detection information corresponding to each detected object for each image frame.
  4. The method according to claim 1, wherein the ground-truth information includes annotation-box information of each object corresponding to each point cloud data frame, the detection information includes detection-box information of each detected object detected in each image frame, and the detection-box information includes two-dimensional position information of the corresponding detected object in the image frame;
    the step of determining the evaluation information corresponding to the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the preset algorithm-stability evaluation rule, the annotated pose information and the object motion information in the ground-truth information corresponding to each object, and the detected pose information and the detected motion information in the detection information corresponding to each detected object comprises:
    for each object corresponding to each point cloud data frame, determining, based on the annotation-box information corresponding to the object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, projection position information of a projection box obtained by projecting the annotation box corresponding to the object into the image frame corresponding to the point cloud data frame, as projection-box position information corresponding to the object;
    for each object corresponding to each point cloud data frame, determining matching projection-box position information and two-dimensional position information based on the projection-box position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, wherein matching projection-box position information and two-dimensional position information are projection-box position information and two-dimensional position information whose corresponding boxes have an intersection-over-union value exceeding a preset intersection-over-union threshold;
    determining first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
    determining second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm-stability evaluation rule and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
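As an illustrative sketch (not part of the claims), the two core operations of claim 4 — projecting a 3D annotation box into the image via the device-to-device extrinsics and the camera intrinsics, then matching projection boxes to detection boxes by intersection-over-union — could read as follows. A pinhole camera with all box corners in front of it is assumed, and every name here is hypothetical.

    import numpy as np

    def project_box(corners_lidar, lidar_to_cam, K):
        """Project the 8 corners of a 3D annotation box (point cloud device
        frame) into the image plane and take the enclosing axis-aligned 2D box
        as the projection-box position information. lidar_to_cam is the (4, 4)
        position conversion relationship; K is the (3, 3) intrinsic matrix."""
        corners_h = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous coords
        cam = (lidar_to_cam @ corners_h.T)[:3]                   # camera frame, (3, 8)
        px = K @ cam
        px = px[:2] / px[2]                                      # perspective divide
        return px[0].min(), px[1].min(), px[0].max(), px[1].max()

    def iou(a, b):
        """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union else 0.0

    def match(projection_boxes, detection_boxes, iou_threshold=0.5):
        """Greedy one-to-one matching: pairs whose IoU exceeds the preset
        threshold are matched; everything left over feeds the unmatched
        ground-truth and unmatched detection counts used by claim 5."""
        matches, used = [], set()
        for i, pb in enumerate(projection_boxes):
            best, best_iou = None, iou_threshold
            for j, db in enumerate(detection_boxes):
                v = iou(pb, db) if j not in used else 0.0
                if v > best_iou:
                    best, best_iou = j, v
            if best is not None:
                used.add(best)
                matches.append((i, best))
        return matches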
  5. The method according to claim 4, wherein the detected pose information comprises detected position information and detected attitude information of each detected object corresponding to each image frame, determined from its detection-box information; the detected motion information comprises detected velocity information and detected acceleration information of each detected object corresponding to each image frame; the annotated pose information comprises annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, determined from its annotation-box information; and the object motion information comprises annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame;
    the step of determining the first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information comprises:
    determining precision information and recall information of the detection results corresponding to the preset visual perception algorithm based on the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
    determining a detected-position error value between the matched ground-truth information and detection information based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-attitude error value between the matched ground-truth information and detection information based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-velocity error value between the matched ground-truth information and detection information based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-acceleration error value between the matched ground-truth information and detection information based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detection-box length-width error value between the matched ground-truth information and detection information based on the annotation-box information included in the matched ground-truth information and the detection-box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    drawing an error curve corresponding to a target error value based on the target error value between the matched ground-truth information and detection information and the preset error thresholds corresponding to the target error value, wherein the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is the detected-position error value, the detected-attitude error value, the detected-velocity error value, the detected-acceleration error value, or the detection-box length-width error value;
    sorting the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence corresponding to the target error values, and determining a first target error value that is the largest among the leading first percentage of target error values in the sorted sequence and a second target error value that is the largest among the leading second percentage of target error values;
    determining the first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
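As an illustrative sketch (not part of the claims), claim 5's three accuracy ingredients — precision/recall from the match counts, a cumulative error curve over preset thresholds, and two percentile-style error values read off the sorted errors — are small computations. The 50% and 90% figures below are assumed, since the claim only names a first and a second percentage.

    import numpy as np

    def precision_recall(n_matched, n_unmatched_det, n_unmatched_truth):
        """TP = matched pairs, FP = detections without a ground truth,
        FN = ground truths without a detection."""
        tp, fp, fn = n_matched, n_unmatched_det, n_unmatched_truth
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    def error_curve(errors, thresholds, total_data):
        """(threshold, ratio) points of the claimed error curve: share of
        target error values below each preset threshold, normalized by the
        total amount of data in the evaluation set."""
        errors = np.asarray(errors)
        return [(t, float((errors < t).sum()) / total_data) for t in thresholds]

    def percentile_values(errors, first_pct=0.50, second_pct=0.90):
        """Largest error within the leading first/second percentage of the
        ascending-sorted sequence (assumed percentages)."""
        s = np.sort(np.asarray(errors))
        first = s[max(0, int(np.ceil(len(s) * first_pct)) - 1)]
        second = s[max(0, int(np.ceil(len(s) * second_pct)) - 1)]
        return float(first), float(second)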
  6. The method according to claim 5, wherein the step of determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm-stability evaluation rule and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame comprises:
    determining, from the target error values, the target error values corresponding to the same object based on the time-sequence information between the point cloud data frames or image frames in the evaluation data set;
    for the target error values corresponding to each object, fitting a fitting-error curve corresponding to the object's target error values based on the time-sequence information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to the object, and a preset curve-fitting algorithm, wherein the fitting-error curve includes the fitted errors of the object at the acquisition moments corresponding to the respective point cloud data frames or image frames;
    for the target error values corresponding to each object, determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the target error values corresponding to the object and the fitted errors of the object, included in the fitting-error curve corresponding to the object's target error values, at the acquisition moments corresponding to the respective point cloud data frames or image frames.
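As an illustrative sketch (not part of the claims): the preset curve-fitting algorithm is left open, so a low-order polynomial fit over the acquisition timestamps is used here purely as an assumed stand-in; the fitted value at each moment is the fitting error that claim 7 compares against. The intuition is that the fitted trend captures systematic error, while frame-to-frame deviation from it reflects jitter, i.e. instability.

    import numpy as np

    def fitting_error_curve(timestamps, target_errors, degree=2):
        """Fit a smooth trend to one object's per-frame target errors and
        evaluate it at each acquisition moment (polynomial fit assumed)."""
        coeffs = np.polyfit(timestamps, target_errors, degree)
        return np.polyval(coeffs, timestamps)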
  7. The method according to claim 6, wherein the step of determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the target error values corresponding to each object and the fitted errors of the object at the acquisition moments corresponding to the respective point cloud data frames or image frames comprises:
    for the target error values corresponding to each object, calculating the differences between the target error and the fitted error corresponding to the same acquisition moment based on the target error values corresponding to the object and the fitted errors of the object, included in the fitting-error curve corresponding to the object's target error values, at the acquisition moments corresponding to the respective point cloud data frames or image frames;
    for the target error values corresponding to each object, drawing a difference error curve corresponding to the object's target errors based on the differences between the target errors and the fitted errors at the acquisition moments corresponding to the object and the preset difference thresholds corresponding to the object's target error values, wherein the horizontal axis of the difference error curve is the preset difference threshold corresponding to the object's target error values, and the vertical axis is the ratio of the number of differences corresponding to the object's target error values that are smaller than each preset difference threshold to the total number of target error values corresponding to the object;
    for the target error values corresponding to each object, sorting the differences corresponding to the object's target error values by magnitude, and determining a first difference that is the largest among the leading third percentage of differences in the sorted sequence and a second difference that is the largest among the leading fourth percentage of differences;
    determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the first difference and the second difference corresponding to the target error values of each object and/or the difference error curve corresponding to the target error values of each object.
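As an illustrative sketch (not part of the claims), claim 7 mirrors the accuracy-side error curve on the residuals between each object's target errors and its fitted trend; absolute residuals and the 50%/90% percentages below are assumptions.

    import numpy as np

    def difference_error_curve(target_errors, fitted_errors, thresholds):
        """Per-object stability curve: share of |target error - fitting error|
        residuals below each preset difference threshold, out of that object's
        total number of target error values."""
        residuals = np.abs(np.asarray(target_errors) - np.asarray(fitted_errors))
        n = len(residuals)
        return [(t, float((residuals < t).sum()) / n) for t in thresholds]

    def difference_percentiles(target_errors, fitted_errors,
                               third_pct=0.50, fourth_pct=0.90):
        """First/second difference: largest residual within the leading
        third/fourth percentage of the ascending-sorted residuals."""
        r = np.sort(np.abs(np.asarray(target_errors) - np.asarray(fitted_errors)))
        first = r[max(0, int(np.ceil(len(r) * third_pct)) - 1)]
        second = r[max(0, int(np.ceil(len(r) * fourth_pct)) - 1)]
        return float(first), float(second)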
  8. The method according to any one of claims 1-7, wherein the method further comprises:
    displaying the first evaluation information, the second evaluation information, the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitting-error curve corresponding to each object's target error values obtained by fitting.
  9. An evaluation device for a visual perception algorithm, wherein the device comprises:
    a first obtaining module configured to obtain ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, wherein the ground-truth information includes at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame having a corresponding relationship;
    a second obtaining module configured to obtain detection information corresponding to each detected object detected from each image frame in the evaluation data set based on a preset visual perception algorithm, wherein the detection information includes at least detected pose information and detected motion information of the corresponding detected object;
    a determining module configured to determine evaluation information corresponding to the preset visual perception algorithm based on a preset result-accuracy evaluation rule, a preset algorithm-stability evaluation rule, the annotated pose information and the object motion information in the ground-truth information corresponding to each object, and the detected pose information and the detected motion information in the detection information corresponding to each detected object, wherein the evaluation information includes first evaluation information representing the detection-result accuracy of the preset visual perception algorithm and second evaluation information representing its algorithm stability.
  10. The device according to claim 9, wherein the first obtaining module is specifically configured to:
    obtain the evaluation data set;
    annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, annotating the annotation-box information of each object corresponding to each point cloud data frame, so as to determine the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame and obtain the annotated pose information of each object corresponding to each point cloud data frame;
    determine the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the time-sequence information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object corresponding to each point cloud data frame.
PCT/CN2021/109529 2020-08-20 2021-07-30 Visual perception algorithm evaluation method and device WO2022037387A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010841528.7A CN114170448A (en) 2020-08-20 2020-08-20 Evaluation method and device for visual perception algorithm
CN202010841528.7 2020-08-20

Publications (1)

Publication Number Publication Date
WO2022037387A1 true WO2022037387A1 (en) 2022-02-24

Family

ID=80323391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109529 WO2022037387A1 (en) 2020-08-20 2021-07-30 Visual perception algorithm evaluation method and device

Country Status (2)

Country Link
CN (1) CN114170448A (en)
WO (1) WO2022037387A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866762A (en) * 2022-03-15 2022-08-05 中国第一汽车股份有限公司 Visual detection method, device and system of camera sensor
CN114882550A (en) * 2022-04-14 2022-08-09 支付宝(杭州)信息技术有限公司 Method, device and equipment for registering and leaving human face
CN116614621B (en) * 2023-07-17 2023-10-10 中汽智联技术有限公司 Method, device and storage medium for testing in-camera perception algorithm
CN117611788B (en) * 2024-01-19 2024-04-19 福思(杭州)智能科技有限公司 Dynamic truth value data correction method and device and storage medium
CN117784162A (en) * 2024-02-26 2024-03-29 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, target tracking method, intelligent device and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005338A1 (en) * 2016-02-23 2019-01-03 Genki WATANABE Image processing apparatus, imaging apparatus, mobile device control system, and recording medium
CN108765563A (en) * 2018-05-31 2018-11-06 北京百度网讯科技有限公司 Processing method, device and the equipment of SLAM algorithms based on AR
CN110287832A (en) * 2019-06-13 2019-09-27 北京百度网讯科技有限公司 High-Speed Automatic Driving Scene barrier perception evaluating method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147796A (en) * 2022-07-14 2022-10-04 小米汽车科技有限公司 Method and device for evaluating target recognition algorithm, storage medium and vehicle
CN115311885A (en) * 2022-07-29 2022-11-08 上海商汤临港智能科技有限公司 Evaluation method, evaluation system, electronic device and storage medium
CN115311885B (en) * 2022-07-29 2024-04-12 上海商汤临港智能科技有限公司 Evaluation method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114170448A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
WO2022037387A1 (en) Visual perception algorithm evaluation method and device
CN113450408B (en) Irregular object pose estimation method and device based on depth camera
CN107066953A (en) It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN110232379A (en) A kind of vehicle attitude detection method and system
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN110490099B (en) Subway public place pedestrian flow analysis method based on machine vision
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN110751012B (en) Target detection evaluation method and device, electronic equipment and storage medium
CN112215154B (en) Mask-based model evaluation method applied to face detection system
WO2020237939A1 (en) Method and apparatus for constructing eyelid curve of human eye
CN103679214A (en) Vehicle detection method based on online area estimation and multi-feature decision fusion
CN113822247A (en) Method and system for identifying illegal building based on aerial image
CN114758504A (en) Online vehicle overspeed early warning method and system based on filtering correction
CN112699748B (en) Human-vehicle distance estimation method based on YOLO and RGB image
CN114359865A (en) Obstacle detection method and related device
CN112991769A (en) Traffic volume investigation method and device based on video
CN113643544B (en) Intelligent detection method and system for illegal parking in parking lot based on Internet of things
CN113362287B (en) Man-machine cooperative remote sensing image intelligent interpretation method
CN115457130A (en) Electric vehicle charging port detection and positioning method based on depth key point regression
Philipp et al. Automated 3d object reference generation for the evaluation of autonomous vehicle perception
Kim et al. Evaluation of feature-based vehicle trajectory extraction algorithms
CN114842285A (en) Roadside berth number identification method and device
CN110060339B (en) Three-dimensional modeling method based on cloud computing graphic image
Yang et al. Research on Target Detection Algorithm for Complex Scenes
TWI828368B (en) Method and system for detecting aircraft behavior on the tarmac

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857482

Country of ref document: EP

Kind code of ref document: A1
