WO2022037387A1 - Visual perception algorithm evaluation method and device

Visual perception algorithm evaluation method and device

Info

Publication number: WO2022037387A1
Application number: PCT/CN2021/109529
Authority: WIPO (PCT)
Prior art keywords: information, detection, frame, point cloud, evaluation
Other languages: French (fr), Chinese (zh)
Inventors: 何杰, 王旭, 申艺华, 李俊, 董维山
Original assignee: 魔门塔(苏州)科技有限公司
Application filed by 魔门塔(苏州)科技有限公司
Publication of WO2022037387A1 (en)

Classifications

    • G06F18/00 Pattern recognition
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V40/20 Movements or behaviour, e.g. gesture recognition

Definitions

  • The present invention relates to the technical field of algorithm evaluation, and in particular to a method and device for evaluating a visual perception algorithm.
  • Visual perception algorithms are core components of autonomous driving systems, face recognition systems, and identity verification systems.
  • The accuracy of the perception results of a visual perception algorithm affects, to a certain extent, the accuracy of the output results of the above systems.
  • Therefore, the performance of a visual perception algorithm needs to be evaluated before it is actually applied.
  • The present invention provides an evaluation method and device for a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
  • Each point cloud data frame in the evaluation data set is labeled based on a pre-trained 3D data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, from which the labeled position information and labeled attitude information of each object are determined; combined with the timing information between the point cloud data frames in the evaluation data set, the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame are then determined.
  • In this way, three-dimensional information including the annotation frame information, labeled position information, labeled attitude information, labeled speed information and labeled acceleration information of each object is obtained, realizing automatic labeling of the 3D information of the labeled objects and saving labor costs.
  • FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention
  • FIG. 2 is an exemplary diagram of an error curve corresponding to a target error value provided by an embodiment of the present invention
  • FIG. 3 is a schematic structural diagram of an apparatus for evaluating a visual perception algorithm according to an embodiment of the present invention.
  • The present invention provides an evaluation method and device for a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
  • The embodiments of the present invention are described in detail below.
  • FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention. The method may include the following steps:
  • S101 Obtain ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set.
  • The ground truth information includes at least the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship.
  • The point cloud data frame may be a data frame collected by a lidar sensor, and the image frame may be an image frame collected by an image acquisition device.
  • The evaluation method for a visual perception algorithm provided by this embodiment of the present invention can be applied to any electronic device with computing capability; the electronic device may be a terminal or a server.
  • The labeled pose information and object motion information included in the ground truth information may be based on three-dimensional space, for example pose and motion information in the device coordinate system of the device that acquired the point cloud data frame, or pose and motion information in a preset spatial rectangular coordinate system; both are possible. The preset spatial rectangular coordinate system may be a world coordinate system or the image acquisition device coordinate system.
  • The labeled pose information may include labeled position information and labeled attitude information.
  • The object motion information may include, but is not limited to, the speed information and acceleration information of the labeled object. For clarity, the speed information in the object motion information determined from the point cloud data frames in the evaluation data set is called the labeled speed information, and the acceleration information is called the labeled acceleration information.
  • The visual perception algorithm may be a visual perception algorithm applied in an automatic driving system.
  • Each piece of evaluation data in the evaluation data set may be evaluation data collected by a target vehicle during driving.
  • One piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship; the corresponding relationship may mean that the point cloud data frame and the image frame were collected in the same collection period.
  • The above-mentioned lidar sensor and image acquisition device may both be installed in the target vehicle.
  • Where the visual perception algorithm is applied in an automatic driving system, the above objects may include, but are not limited to, vehicles and pedestrians.
  • Where the object is a vehicle, the labeled position information in the labeled pose information included in the ground truth information may refer to the position information of the center point of the vehicle, of the center point of the rear of the vehicle, or of the center point of the front of the vehicle; all of these are possible.
  • The labeled attitude information in the labeled pose information included in the ground truth information may refer to the angle information of the vehicle about the coordinate axes of the coordinate system in which it is located during driving, including pitch angle information, roll angle information and yaw angle information.
  • In one case, the pitch angle information and roll angle information generated while the vehicle runs on the ground may be ignored, that is, they are taken to be zero.
  • The evaluation data in the evaluation data set may include evaluation data collected for normal driving scenarios, for large-vehicle or special-shaped-vehicle scenarios, or for pedestrians, complex intersections, and specific weather conditions; all of these are possible.
  • In one implementation, the electronic device can directly obtain the ground truth information, determined based on each point cloud data frame in the evaluation data set, that other devices send to it.
  • In another implementation, S101 may include the following steps 011-013:
  • 011 Obtain an evaluation data set, where the evaluation data set includes multiple pieces of evaluation data;
  • 012 Label each point cloud data frame in the evaluation data set based on the pre-trained 3D data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, and from it determine the labeled position information and labeled attitude information of each object, obtaining the labeled pose information of each object corresponding to each point cloud data frame;
  • 013 Based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the timing information between the point cloud data frames in the evaluation data set, determine the labeled speed information and labeled acceleration information of each object, obtaining the object motion information of each object corresponding to each point cloud data frame and, with it, the ground truth information corresponding to each object.
  • Specifically, the electronic device can directly obtain the evaluation data set, which includes multiple pieces of evaluation data. The electronic device inputs the point cloud data frame included in each piece of evaluation data into the pre-trained 3D data perception model; the model detects each object in each point cloud data frame and marks it with an annotation frame, yielding the annotation frame information of each object corresponding to each point cloud data frame. The annotation frame may be a cuboid.
  • The annotation frame information of each object includes information that can represent the length, width and height of the object, and information that can represent the pose of the object.
  • The electronic device converts the annotation frame information of each object corresponding to each point cloud data frame output by the pre-trained 3D data perception model to obtain the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame.
  • The pre-trained 3D data perception model may be a neural network model trained on sample point cloud data frames and their corresponding calibration information, including the calibration frame information of each object in the sample point cloud data frame. For the specific training process, refer to model training processes in the related art, which will not be repeated here.
  • The evaluation data in the evaluation data set are generally obtained by continuous collection, that is, the point cloud data frames in the evaluation data set are consecutive frames, as are the image frames.
  • Accordingly, the electronic device can determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the timing information between the point cloud data frames in the evaluation data set.
  • The labeled position information of each object may include lateral position information, longitudinal position information and radial position information. Based on the lateral position information of each object corresponding to each point cloud data frame and the timing information between the point cloud data frames, the labeled lateral velocity information and labeled lateral acceleration information of each object can be determined; based on the longitudinal position information and the timing information, the labeled longitudinal velocity information and labeled longitudinal acceleration information can be determined; and based on the radial position information and the timing information, the labeled radial velocity information and labeled radial acceleration information can be determined, as sketched below.
  • S102 Obtain detection information corresponding to each detected object detected based on a preset visual detection algorithm and each image frame in the evaluation data set.
  • The detection information includes at least the detected pose information and detected motion information of the corresponding detected object.
  • The detection information corresponding to each object in an image can be detected from the image frame using the preset visual detection algorithm.
  • An object detected from an image frame by the preset visual detection algorithm may be called a detected object.
  • The detection information may include two-dimensional information and three-dimensional information corresponding to the object. The two-dimensional information may include the two-dimensional position information and two-dimensional speed information of the object in the image frame; the three-dimensional information may include, but is not limited to, the pose information of the object in the specified spatial rectangular coordinate system, i.e. the detected pose information, and the detected motion information, which includes, but is not limited to, the detected speed information and detected acceleration information of the corresponding object.
  • In one implementation, the electronic device can directly obtain the detection information, detected from each image frame in the evaluation data set based on the preset visual detection algorithm, that other devices send to it.
  • In another implementation, S102 may include the following steps 021-022:
  • 021 The preset visual perception algorithm is pre-stored locally on the electronic device or in a connected storage device. After obtaining the evaluation data set, the electronic device can detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information corresponding to each detected object in each image frame. The detection frame information includes information that can represent the length, width and height of the corresponding detected object and information that can represent its pose, and may also include information representing the two-dimensional position of the corresponding detected object in the corresponding image frame.
  • 022 The electronic device determines the detected position information and detected attitude information of each detected object corresponding to each image frame based on the preset visual perception algorithm and the detection frame information corresponding to each detected object, obtaining the detected pose information of each detected object. It then determines, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, the detected speed information and detected acceleration information of each detected object, obtaining the detected motion information of each detected object corresponding to each image frame.
  • The detected position information of each detected object may include lateral, longitudinal and radial position information. Based on the lateral position information of each detected object and the timing information between the image frames, the detected lateral velocity information and detected lateral acceleration information can be determined; based on the longitudinal position information and the timing information, the detected longitudinal velocity information and detected longitudinal acceleration information; and based on the radial position information and the timing information, the detected radial velocity information and detected radial acceleration information, analogously to the derivation of the labeled motion information above.
  • S103 Based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, determine the evaluation information corresponding to the preset visual perception algorithm.
  • The evaluation information includes first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information on the stability of the algorithm.
  • The electronic device may, based on the preset result accuracy evaluation rules, process the corresponding ground truth information and detection information to obtain first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm; and, based on the preset algorithm stability evaluation rules, process the corresponding ground truth information and detection information to obtain second evaluation information representing the algorithm stability of the preset visual perception algorithm.
  • The preset result accuracy evaluation rules may include, but are not limited to, specific detection result accuracy evaluation indicators and instructions for determining the results corresponding to those indicators based on the ground truth information and the detection information; the preset algorithm stability evaluation rules may likewise include, but are not limited to, specific algorithm stability evaluation indicators and instructions for determining the results corresponding to those indicators based on the ground truth information and the detection information.
  • By applying this embodiment of the present invention, first evaluation information on the accuracy of the detection results of the preset visual perception algorithm in multiple respects and second evaluation information on the stability of the algorithm can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple respects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
  • In an implementation of the present invention, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, and the detection information includes the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame.
  • S103 may include the following steps 031-034:
  • The matched projection frame position information and two-dimensional position information are projection frame position information and two-dimensional position information for which the intersection-over-union (IoU) of the corresponding frames exceeds a preset IoU threshold.
  • Based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information, the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm is determined.
  • Before determining the evaluation information corresponding to the preset visual perception algorithm based on the ground truth information and the detection information, the electronic device needs to match the ground truth information against the detection information, so that the evaluation information is determined from the matched ground truth information and detection information.
  • For each object corresponding to each point cloud data frame, the electronic device may, based on the object's annotation frame information and the positional conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, convert the annotation frame corresponding to the object from the coordinate system of the point cloud data frame acquisition device to the coordinate system of the image frame acquisition device, obtaining the position information of the annotation frame in the image frame acquisition device's coordinate system. Then, based on that position information and the internal parameter (intrinsic) information of the image frame acquisition device, the annotation frame is projected into the image frame corresponding to the point cloud data frame, and the projection position information of the resulting projection frame is determined as the projection frame position information corresponding to the object, as sketched below.
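  • A minimal sketch of this projection step follows; the 4x4 lidar-to-camera extrinsic matrix, the 3x3 intrinsic matrix, and the convention of taking the axis-aligned rectangle around the projected corners as the projection frame are assumptions for illustration.

```python
import numpy as np

def project_annotation_frame(corners_lidar: np.ndarray,
                             T_cam_lidar: np.ndarray,
                             K: np.ndarray):
    """Project the 8 corners of a 3D annotation frame from the point cloud
    acquisition device (lidar) coordinate system into the image, and return
    the enclosing 2D rectangle as the projection frame position information."""
    homo = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous corners (8, 4)
    cam = (T_cam_lidar @ homo.T)[:3]                    # corners in camera frame (3, 8)
    uv = (K @ cam)[:2] / cam[2]                         # perspective projection (2, 8)
    x_min, y_min = uv.min(axis=1)
    x_max, y_max = uv.max(axis=1)
    return x_min, y_min, x_max, y_max
```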
  • Then, using the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, the IoU between the projection frame corresponding to each object and the two-dimensional detection frame of each detected object is calculated; projection frame position information and two-dimensional position information whose IoU exceeds the preset IoU threshold are determined to be matched projection frame position information and two-dimensional position information.
  • For example, suppose the objects corresponding to point cloud data frame A include object 1, object 2 and object 3, and the detected objects corresponding to image frame a, which corresponds to point cloud data frame A, include detected object 1, detected object 2, detected object 3 and detected object 4.
  • For object 1 corresponding to point cloud data frame A: based on the projection frame position information corresponding to object 1 and the two-dimensional position information of each detected object, calculate the IoU between the projection frame corresponding to object 1 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4.
  • Similarly, for object 2, calculate the IoU between the projection frame corresponding to object 2 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4; and for object 3, calculate the IoU between the projection frame corresponding to object 3 and the two-dimensional detection frames corresponding to detected objects 1, 2, 3 and 4.
  • Each IoU is then compared against the preset IoU threshold. If, for example, the IoU between the projection frame corresponding to object 1 and the two-dimensional detection frame corresponding to detected object 3 exceeds the preset IoU threshold, the projection frame position information corresponding to object 1 and the two-dimensional position information corresponding to detected object 3 are determined to be matched projection frame position information and two-dimensional position information; correspondingly, the ground truth information corresponding to object 1 and the detection information corresponding to detected object 3 are matched ground truth information and detection information.
  • If the IoUs between the projection frame corresponding to object 3 and the two-dimensional detection frames corresponding to detected objects 1-4 all fail to exceed the preset IoU threshold, it is determined that none of detected objects 1-4 is the same physical object as object 3; that is, the ground truth information corresponding to object 3 is ground truth information that does not match any detection information, and object 3 is a missed object.
  • Likewise, if the IoUs between the two-dimensional detection frame corresponding to detected object 4 and the projection frames corresponding to objects 1, 2 and 3 all fail to exceed the preset IoU threshold, it is determined that none of objects 1-3 is the same physical object as detected object 4; that is, the detection information corresponding to detected object 4 is detection information that does not match any ground truth information, and detected object 4 may be called a falsely detected object.
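  • A minimal matching sketch consistent with the example above is given below; the greedy best-match strategy and the 0.5 threshold are illustrative assumptions, since the patent only requires that matched pairs exceed the preset IoU threshold.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def match_frames(projection_frames, detection_frames, iou_threshold=0.5):
    """Pair each projected annotation frame with its best-IoU 2D detection
    frame; unmatched projection frames are missed objects, unmatched
    detection frames are falsely detected objects."""
    matches, used = [], set()
    for i, p in enumerate(projection_frames):
        best_j, best_iou = None, iou_threshold
        for j, d in enumerate(detection_frames):
            score = iou(p, d)
            if j not in used and score > best_iou:
                best_j, best_iou = j, score
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    matched_objects = {i for i, _ in matches}
    missed = [i for i in range(len(projection_frames)) if i not in matched_objects]
    false_detections = [j for j in range(len(detection_frames)) if j not in used]
    return matches, missed, false_detections
```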
  • Subsequently, the electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information; and may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rules and the matched ground truth information and detection information.
  • In an implementation of the present invention, the detected pose information includes the detected position information and detected attitude information of each detected object corresponding to each image frame, determined from its detection frame information; the detected motion information includes the detected speed information and detected acceleration information of each detected object corresponding to each image frame; the labeled pose information includes the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, determined from its annotation frame information; and the object motion information includes the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame.
  • Step 033 may include the following steps 0331-0339:
  • 0333 Determine the detected attitude error value between the matched ground truth information and detection information based on the labeled attitude information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detected attitude information included in the detection information.
  • 0336 Determine the detection frame length and width error value between the matched ground truth information and detection information based on the annotation frame information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection frame information included in the detection information.
  • The horizontal axis of the error curve is the preset error threshold, and the vertical axis is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, where the target error value is the detected position error value, detected attitude error value, detected speed error value, detected acceleration error value, or detection frame length and width error value.
  • The preset visual perception algorithm can detect the 2D information and 3D information of each detected object from the image frame, including the two-dimensional position information of each detected object in the image frame and the detected position information, detected attitude information, detected speed information and detected acceleration information of the detected object in the specified spatial rectangular coordinate system. The ground truth information includes labeled parameters of the corresponding object in the same dimensions, which may include, but are not limited to, the labeled position information, labeled attitude information, labeled speed information and labeled acceleration information of the corresponding object.
  • The preset result accuracy evaluation rules may include rules indicating how to determine the precision information and recall information of the detection results.
  • The electronic device may, according to a preset precision determination method and a preset recall determination method, determine the precision information and recall information of the detection results corresponding to the preset visual perception algorithm based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information, and use them as evaluation indicators of the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • The preset precision determination method and the preset recall determination method may refer to the precision and recall determination methods in the related art, which will not be repeated here.
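  • A conventional sketch, assuming the usual definitions in which matched pairs are true positives, unmatched detections are false positives, and unmatched ground truth objects are false negatives:

```python
def precision_recall(num_matched: int, num_false_detections: int, num_missed: int):
    """Precision and recall over the whole evaluation data set."""
    tp, fp, fn = num_matched, num_false_detections, num_missed
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall
```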
  • In this embodiment, an evaluation indicator of the accuracy of the detection results corresponding to the preset visual perception algorithm is added: a special form of error curve is drawn, and the error curve is then used to evaluate the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • Another new evaluation indicator of that accuracy is obtained by counting, among the error values of the same dimension, the number of error values corresponding to different proportions, and then using these counts to evaluate the accuracy of the detection results corresponding to the preset visual perception algorithm.
  • The error values for the different dimensions are calculated from the matched ground truth information and detection information as follows:
  • Based on the labeled position information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame, and the detected position information included in the detection information, determine the detected position error value between the matched ground truth information and detection information. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled position information and the detected position information, and then calculate the detected position error value between them; the detected position error value may be an absolute error value and/or a relative error value between the labeled position information and the detected position information.
  • Based on the labeled attitude information included in the matched ground truth information and the detected attitude information included in the detection information, determine the detected attitude error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled attitude information and the detected attitude information, and then calculate the detected attitude error value between them; the detected attitude error value may be an absolute error value and/or a relative error value between the labeled attitude information and the detected attitude information.
  • Based on the labeled speed information included in the matched ground truth information and the detected speed information included in the detection information, determine the detected speed error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled speed information and the detected speed information, and then calculate the detected speed error value between them; the detected speed error value may be an absolute error value and/or a relative error value between the labeled speed information and the detected speed information.
  • Based on the labeled acceleration information included in the matched ground truth information and the detected acceleration information included in the detection information, determine the detected acceleration error value. That is, for each pair of matched ground truth information and detection information, unify the coordinate systems of the labeled acceleration information and the detected acceleration information, and then calculate the detected acceleration error value between them; the detected acceleration error value may be an absolute error value and/or a relative error value between the labeled acceleration information and the detected acceleration information.
  • Based on the annotation frame information included in the matched ground truth information and the detection frame information included in the detection information, determine the detection frame length and width error value. That is, for each pair of matched ground truth information and detection information, unify the scales of the annotation frame information and the detection frame information, and calculate the error between the length and width of the annotation frame and those of the detection frame as the detection frame length and width error value; it may be an absolute error value and/or a relative error value of the length and width between the annotation frame and the detection frame. In one case, the error between the heights of the annotation frame and the detection frame can also be calculated.
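  • A generic sketch of the absolute/relative error computation, applied identically to each dimension (position, attitude, speed, acceleration, frame length and width) once coordinate systems or scales have been unified; the small epsilon guard against zero-valued ground truth is an added assumption.

```python
import numpy as np

def error_values(labeled: np.ndarray, detected: np.ndarray, eps: float = 1e-9):
    """Absolute and relative error values between labeled (ground truth)
    and detected quantities expressed in a unified coordinate system."""
    absolute = np.abs(detected - labeled)
    relative = absolute / np.maximum(np.abs(labeled), eps)
    return absolute, relative
```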
  • The electronic device may in turn take each of the above-determined detected position error value, detected attitude error value, detected speed error value, detected acceleration error value, and detection frame length and width error value as the target error value, and, based on the target error values between the matched ground truth information and detection information and the preset error thresholds corresponding to the target error value, draw the error curve corresponding to the target error value. Specifically, for each preset error threshold, count the number of target error values smaller than that threshold, take the ratio of that number to the total amount of data in the evaluation data set as the vertical-axis value corresponding to the threshold, and draw the error curve corresponding to the target error value.
  • There are multiple preset error thresholds, which may be set starting from 0 and increasing sequentially.
  • FIG. 2 is an example diagram of the error curve drawn for a target error value; a minimal sketch follows.
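```python
import numpy as np

def error_curve(target_errors: np.ndarray, thresholds: np.ndarray, total_count: int):
    """For each preset error threshold (horizontal axis), return the ratio of
    the number of target error values below it to the total amount of data
    in the evaluation data set (vertical axis)."""
    return np.array([(target_errors < t).sum() / total_count for t in thresholds])

# Hypothetical usage with thresholds increasing from 0:
# curve = error_curve(position_errors, np.linspace(0.0, 2.0, 41), total_count)
```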
  • In addition, the electronic device sorts the target error values between the matched ground truth information and detection information by numerical value to obtain a sorted sequence corresponding to the target error value, and may determine the target error value at the top first percentage of the sorted sequence as the first target error value and the target error value at the top second percentage as the second target error value.
  • The first target error value may be referred to as 1sigma, and the second target error value as 2sigma; the top first percentage may be the top 68.26% of the sorted sequence, and the top second percentage may be the top 95.44%, mirroring the 1-sigma and 2-sigma coverage of a Gaussian distribution. A sketch follows.
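```python
import numpy as np

def sigma_indicators(target_errors: np.ndarray):
    """Sort the target error values and take the values at the top 68.26%
    and 95.44% of the sorted sequence as 1sigma and 2sigma; the index
    rounding here is an illustrative choice."""
    s = np.sort(target_errors)
    one_sigma = s[int(np.ceil(0.6826 * len(s))) - 1]
    two_sigma = s[int(np.ceil(0.9544 * len(s))) - 1]
    return one_sigma, two_sigma
```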
  • The electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, and/or the first target error value and second target error value of the target error value.
  • The area under the curve (AUC) of an error curve can serve as an indicator. When the target error value is the detected position error value, the error curve measures the position judgment ability of the preset visual perception algorithm, and the larger the area under the curve, the better the position judgment ability; when the target error value is the detected attitude error value, the error curve measures the attitude judgment ability, and the larger the area under the curve in the error curve corresponding to the detected attitude error value, the better the attitude judgment ability of the preset visual perception algorithm.
  • Likewise, the smaller the determined first target error value and second target error value, the better the performance of the preset visual perception algorithm and the higher the accuracy of its detection results. A trapezoidal AUC sketch follows.
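```python
import numpy as np

def curve_auc(thresholds: np.ndarray, curve: np.ndarray) -> float:
    """Area under the error curve via the trapezoidal rule (the integration
    rule is an assumption; the patent does not specify one). A larger area
    means a larger share of small error values, i.e. better judgment ability
    on that dimension."""
    return float(np.trapz(curve, thresholds))
```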
  • The evaluation indicators of the accuracy of the detection results corresponding to the preset visual perception algorithm may also include, but are not limited to, the PR curve and the proportion of falsely detected and missed objects among all detection results, where the PR curve is the precision-recall curve, drawn with the recall information on the horizontal axis and the precision information on the vertical axis.
  • The embodiment of the present invention not only evaluates the accuracy of the detection results corresponding to the preset visual perception algorithm, but also provides an evaluation of the algorithm stability of the preset visual perception algorithm.
  • Step 034 may include the following steps 0341-0343:
  • 0341 Based on the timing information between the point cloud data frames or image frames in the evaluation data set, determine the target error values corresponding to the same object from among the target error values;
  • 0342 For the target error values corresponding to each object, fit the fitting error curve corresponding to those target error values, where the fitting error curve includes the fitted errors of the object at the acquisition moments corresponding to the point cloud data frames or image frames;
  • 0343 Based on the target error values corresponding to each object and the fitted errors of the object at the acquisition moments corresponding to the point cloud data frames or image frames, determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
  • The continuity between the point cloud data frames in the evaluation data set means that they are continuous in time series, and the continuity between the image frames likewise means that they are continuous in time series.
  • The electronic device can determine the target error values corresponding to the same object from among the target error values based on the timing information between the point cloud data frames or image frames in the evaluation data set, where the target error values corresponding to the same object are arranged according to the timing information between their corresponding point cloud data frames or image frames.
  • For the target error values corresponding to each object, the electronic device fits, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the object's target error values and a preset curve fitting algorithm, the fitting error curve corresponding to the object's target error values; the fitting error curve may include the fitted error of the object at the acquisition moment corresponding to each point cloud data frame or image frame.
  • The preset curve fitting algorithm may be any type of curve fitting algorithm in the related art, which is not limited in this embodiment of the present invention.
  • The electronic device then determines, from the target error values corresponding to each object and the fitted errors of the object at the acquisition moments contained in the fitting error curve, the second evaluation information representing the algorithm stability of the preset visual perception algorithm, as sketched below.
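  • A sketch of the per-object fit and the per-moment differences; the degree-3 polynomial is an illustrative stand-in, since the patent allows any curve fitting algorithm in the related art.

```python
import numpy as np

def fit_residuals(times: np.ndarray, target_errors: np.ndarray, degree: int = 3):
    """Fit a smooth curve through one object's target error values over its
    acquisition moments, then take the difference between each target error
    and the fitted error at the same moment."""
    coeffs = np.polyfit(times, target_errors, degree)
    fitted = np.polyval(coeffs, times)            # fitting error curve samples
    differences = np.abs(target_errors - fitted)  # per-moment difference values
    return fitted, differences
```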
  • Step 0343 may include the following. For the target error values corresponding to each object, a difference error curve may be drawn, where the horizontal axis of the difference error curve corresponding to the object's target error values is the preset difference threshold, and the vertical axis is the ratio of the number of difference values corresponding to the object's target error values that are smaller than each preset difference threshold to the total number of target error values corresponding to the object.
  • The target error values corresponding to an object correspond to different point cloud data frames or image frames, and each point cloud data frame or image frame corresponds to an acquisition moment, so the target error values corresponding to the object correspond to different acquisition moments.
  • For the target error values corresponding to each object, the electronic device can, based on the object's target error values and the fitted errors contained in the fitting error curve at the acquisition moments corresponding to the point cloud data frames or image frames, calculate the difference between the target error and the fitted error corresponding to the same acquisition moment.
  • The electronic device can sort the difference values corresponding to each object's target error values by numerical value, and determine the difference value at the top third percentage of the sorted sequence as the first difference value and the difference value at the top fourth percentage as the second difference value, where the third percentage may be the same as or different from the first percentage, and the fourth percentage may be the same as or different from the second percentage.
  • The electronic device may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the first difference value and second difference value corresponding to each object's target error values, and/or the difference error curve corresponding to each object's target error values. The smaller the first difference value and second difference value corresponding to an object's target error values, the smaller the difference between the target error and the fitted error at each acquisition moment for that object, and the better the algorithm stability of the preset visual perception algorithm in the dimension corresponding to that target error value. A sketch follows.
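```python
import numpy as np

def stability_indicators(differences: np.ndarray, diff_thresholds: np.ndarray):
    """Difference error curve (fraction of difference values below each preset
    difference threshold) plus the first and second difference values read off
    the sorted sequence; reusing 68.26% / 95.44% for the third and fourth
    percentages is an assumption, which the patent allows to be the same as
    or different from the first and second percentages."""
    curve = np.array([(differences < t).sum() / len(differences)
                      for t in diff_thresholds])
    s = np.sort(differences)
    first_difference = s[int(np.ceil(0.6826 * len(s))) - 1]
    second_difference = s[int(np.ceil(0.9544 * len(s))) - 1]
    return curve, first_difference, second_difference
```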
  • The embodiment of the present invention can also visually display the evaluation information and the intermediate information generated during the evaluation process.
  • In an implementation, the method further includes: visually displaying the evaluation information and the intermediate information generated during the evaluation process.
  • The electronic device can visually display the evaluation information and the intermediate information in a web-based visualization manner and a 3D-rendering-based visualization manner, and can provide functions for interacting with the user.
  • Corresponding to the above method embodiment, an embodiment of the present invention provides an evaluation device for a visual perception algorithm, as shown in FIG. 3.
  • The device may include:
  • A first obtaining module 310, configured to obtain the ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set, where the ground truth information at least includes the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship;
  • A second obtaining module 320, configured to obtain the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set, where the detection information at least includes the detected pose information and detected motion information of the corresponding detected object;
  • A determination module 330, configured to determine the evaluation information corresponding to the preset visual perception algorithm based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, where the evaluation information includes first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability.
  • By applying this embodiment of the present invention, first evaluation information on the accuracy of the detection results of the preset visual perception algorithm in multiple respects and second evaluation information on the stability of the algorithm can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple respects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
  • In an implementation, the first obtaining module 310 is specifically configured to: obtain an evaluation data set; label each point cloud data frame in the evaluation data set based on the pre-trained 3D data perception model to obtain the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame; and, based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the timing information between the point cloud data frames, determine the labeled speed information and labeled acceleration information of each object, obtaining the object motion information of each object corresponding to each point cloud data frame and, with it, the ground truth information corresponding to each object.
  • In an implementation, the second obtaining module 320 is specifically configured to: detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information corresponding to each detected object in each image frame; determine from it the detected position information and detected attitude information of each detected object corresponding to each image frame, obtaining the detected pose information of each detected object; determine the detected speed information and detected acceleration information of each detected object corresponding to each image frame, obtaining the detected motion information of each detected object; and thereby obtain the detection information corresponding to each detected object corresponding to each image frame.
  • In an implementation, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, and the detection information includes the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame;
  • The determination module 330 includes:
  • A first determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine, based on the object's annotation frame information, the positional conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, the projection position information of the projection frame obtained by projecting the object's annotation frame into the image frame corresponding to the point cloud data frame, as the projection frame position information corresponding to the object;
  • A second determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine the matched projection frame position information and two-dimensional position information based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, where the matched projection frame position information and two-dimensional position information are those whose corresponding frames' IoU exceeds the preset IoU threshold;
  • A third determination unit (not shown in the figure), configured to determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rules, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information that does not match any ground truth information, and the ground truth information that does not match any detection information;
  • A fourth determination unit (not shown in the figure), configured to determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rules and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
  • the detection posture information includes: detection position information and detection posture information determined by the detection frame information of each detected object corresponding to each image frame;
  • the detected motion information includes: the detected speed information and detected acceleration information of each detected object corresponding to each image frame;
  • the labeled pose information includes: the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, as determined from its annotation frame information;
  • the object motion information includes: the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame;
  • the third determination unit is specifically configured to determine, based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm;
  • the detection position error value, detection attitude error value, detection speed error value, detection acceleration error value, and detection frame length-width error value between the matched ground truth information and the detection information are determined;
  • an error curve corresponding to the target error value is drawn, wherein the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is: the detection position error value, detection attitude error value, detection speed error value, detection acceleration error value, or the length-width error value of the detection frame;
  • based on the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, and/or the target error value, the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm is determined.
  • the fourth determination unit is specifically configured to determine, from the target error values and based on the time series information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to the same object;
  • a fitting error curve corresponding to the target error values of each object is obtained by fitting, wherein the fitting error curve includes: the fitting error of the object at the acquisition moment corresponding to each point cloud data frame or image frame;
  • based on the target error values corresponding to each object and the fitting errors of the object at the acquisition moments corresponding to each point cloud data frame or image frame, the second evaluation information representing the algorithm stability of the preset visual perception algorithm is determined.
  • the fourth determination unit is specifically configured to, for the target error values corresponding to different objects, calculate, based on the target error values corresponding to each object and the fitting errors of the object included in the fitting error curve at the acquisition moments corresponding to each point cloud data frame or image frame, the difference between the target error and the fitting error corresponding to the same acquisition moment;
  • a difference error curve corresponding to the target errors of each object is drawn based on the differences between the target errors and the fitting errors at each acquisition moment corresponding to the object and the preset difference thresholds corresponding to the target error values of the object, wherein the horizontal axis of the difference error curve corresponding to the target errors of the object is the preset difference threshold corresponding to the target error values of the object, and the vertical axis of the difference error curve is the ratio of the number of differences, among the differences corresponding to the target error values of the object, that are smaller than each preset difference threshold to the total number of target error values corresponding to the object;
  • the device further includes:
  • a display module (not shown in the figure), configured to display the first evaluation information, the second evaluation information, the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, the target error value, the determined target error values corresponding to the same object, and/or the fitting error curve corresponding to the target error values of each object obtained by fitting.

Abstract

Disclosed in embodiments of the present invention are a visual perception algorithm evaluation method and device. The method comprises: obtaining truth value information, corresponding to each object, determined on the basis of each point cloud data frame in an evaluation data set; obtaining detected information, corresponding to each detected object, detected on the basis of a preset visual detection algorithm and each image frame in the evaluation data set; and determining evaluation information corresponding to a preset visual perception algorithm on the basis of a preset result accuracy evaluation rule, a preset algorithm stability evaluation rule, marked pose information and object motion information in the truth value information corresponding to each object, and detected pose information and detected motion information in the detected information corresponding to each detected object, wherein the evaluation information comprises first evaluation information representing the detection result accuracy of the preset visual perception algorithm and second evaluation information representing the algorithm stability of the preset visual perception algorithm, thus implementing the comprehensive evaluation of the performance of a visual perception algorithm.

Description

Method and device for evaluating a visual perception algorithm

Technical Field
The present invention relates to the technical field of algorithm evaluation, and in particular to a method and device for evaluating a visual perception algorithm.
Background Art
Visual perception algorithms are core components of autonomous driving systems, face recognition systems, identity verification systems, and similar fields; the accuracy of a visual perception algorithm's perception results affects, to a certain extent, the accuracy of the output results of these systems. Accordingly, to guarantee the performance of such systems, the performance of a visual perception algorithm needs to be evaluated before it is actually applied.
How to provide a method for evaluating the performance of a visual perception algorithm has therefore become an urgent problem to be solved.
Summary of the Invention
The present invention provides a method and device for evaluating a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm.
The innovative points of the embodiments of the present invention include:
1. Based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, first evaluation information characterizing the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information characterizing its algorithm stability can be determined. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
2. Each point cloud data frame in the evaluation data set is automatically annotated based on a pre-trained three-dimensional data perception model to obtain the annotation frame information of each object corresponding to each point cloud data frame, from which the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame are derived. Combined with the time series information between the point cloud data frames in the evaluation data, the speed information and acceleration information of each object corresponding to each point cloud data frame are determined, yielding, for each point cloud data frame, three-dimensional information that includes the annotation frame information, labeled position information, labeled attitude information, labeled speed information, and labeled acceleration information of each object. Automatic annotation of the three-dimensional information of the labeled objects is thereby realized, saving labor costs.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention;
FIG. 2 is an example diagram of an error curve corresponding to a target error value according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a device for evaluating a visual perception algorithm according to an embodiment of the present invention.
Detailed Description
The present invention provides a method and device for evaluating a visual perception algorithm, so as to realize a comprehensive evaluation of the performance of the visual perception algorithm. The embodiments of the present invention are described in detail below.
FIG. 1 is a schematic flowchart of a method for evaluating a visual perception algorithm according to an embodiment of the present invention. The method may include the following steps:
S101: Obtain the ground truth information corresponding to each object, determined based on each point cloud data frame in the evaluation data set.
The ground truth information includes at least the labeled pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that have a corresponding relationship. The point cloud data frame may be a data frame collected by a lidar sensor, and the image frame may be an image frame collected by an image acquisition device.
The evaluation method for a visual perception algorithm provided by the embodiments of the present invention can be applied to any electronic device with computing capability, and the electronic device may be a terminal or a server.
The labeled pose information and object motion information of the corresponding object included in the ground truth information may be information based on three-dimensional space. For example, it may be pose information and motion information in the device coordinate system of the device that collected the point cloud data frame, or pose information and motion information in a preset spatial rectangular coordinate system; either is possible, where the preset spatial rectangular coordinate system may be the world coordinate system or the coordinate system of the image acquisition device. The labeled pose information may include labeled position information and labeled attitude information; the object motion information may include, but is not limited to, the speed information and acceleration information of the labeled object. For clarity of description, the speed information included in the object motion information determined based on each point cloud data frame in the evaluation data set is called labeled speed information, and the acceleration information is called labeled acceleration information.
In one case, the visual perception algorithm may be a visual perception algorithm applied in an automatic driving system. Correspondingly, each piece of evaluation data included in the evaluation data set may be evaluation data collected by a target vehicle during driving, each piece of evaluation data including a point cloud data frame and an image frame that have a corresponding relationship, where the corresponding relationship may refer to the point cloud data frame and the image frame being collected within the same acquisition cycle. Correspondingly, the above-mentioned lidar sensor and image acquisition device may both be installed in the target vehicle.
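Purely as an illustration of the "same acquisition cycle" correspondence (the patent does not prescribe how frames are paired), one could pair lidar and camera frames by nearest acquisition timestamp; lidar_stamps, camera_stamps, and max_gap below are hypothetical names, not taken from the patent:

```python
def pair_frames(lidar_stamps, camera_stamps, max_gap=0.05):
    """Pair each lidar frame with the camera frame closest in time.

    lidar_stamps / camera_stamps: lists of acquisition times in seconds.
    Returns (lidar_idx, camera_idx) pairs whose time gap is within max_gap,
    approximating "collected within the same acquisition cycle".
    """
    pairs = []
    for i, t in enumerate(lidar_stamps):
        j = min(range(len(camera_stamps)), key=lambda k: abs(camera_stamps[k] - t))
        if abs(camera_stamps[j] - t) <= max_gap:
            pairs.append((i, j))
    return pairs
```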
When the visual perception algorithm is a visual perception algorithm applied in an automatic driving system, the above-mentioned objects may include, but are not limited to, vehicles and pedestrians. In one case, when the object is a vehicle, the labeled position information in the labeled pose information of the corresponding object included in the ground truth information may refer to the position information of the center point of the vehicle, of the center point of the rear of the vehicle, or of the center point of the front of the vehicle; any of these is possible. The labeled attitude information in the labeled pose information of the corresponding object included in the ground truth information may refer to the angle information of the vehicle during driving relative to the coordinate axes of the coordinate system in which it is located, including pitch angle information, roll angle information, and yaw angle information. In one case, the pitch angle information and roll angle information generated while the vehicle drives on the ground surface may be disregarded, that is, both may be considered to be zero.
In one implementation, the evaluation data in the evaluation data set may include evaluation data collected for normal driving scenarios, evaluation data collected for scenarios involving large or special-shaped vehicles, or evaluation data collected for pedestrians, complex intersections, and specific weather conditions; any of these is possible.
In one implementation, the electronic device may directly obtain, from other devices, the ground truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set.
In one implementation of the present invention, S101 may include the following steps 011-013:
011: Obtain the evaluation data set;
012: Annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking the annotation frame information of each object corresponding to each point cloud data frame, so as to determine the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame and obtain the labeled pose information of each object corresponding to each point cloud data frame;
013: Based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the time series information between the point cloud data frames in the evaluation data set, determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame, so as to obtain the object motion information of each object corresponding to each point cloud data frame, and thereby the ground truth information corresponding to each object of each point cloud data frame.
In this implementation, the electronic device may directly obtain the evaluation data set, where the evaluation data set includes multiple pieces of evaluation data. The electronic device inputs the point cloud data frame included in each piece of evaluation data into the pre-trained three-dimensional data perception model, detects each object in each point cloud data frame through the model, and marks it with an annotation frame to obtain the annotation frame information of each object corresponding to each point cloud data frame, where the annotation frame may be a cuboid. The annotation frame information of each object includes information that can represent the length, width, and height of the object, and information that can represent the pose of the object.
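As a hypothetical illustration of what such annotation frame information might hold (the field names below are assumptions for exposition, not taken from the patent):

```python
from dataclasses import dataclass

@dataclass
class AnnotationBox3D:
    # Center position in the acquisition device's coordinate system (m).
    x: float
    y: float
    z: float
    # Object dimensions (m).
    length: float
    width: float
    height: float
    # Heading angle about the vertical axis (rad); pitch and roll are
    # assumed zero for ground vehicles, as the description permits.
    yaw: float
```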
Subsequently, the electronic device converts the annotation frame information of each object output by the pre-trained three-dimensional data perception model for each point cloud data frame into the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame. The pre-trained three-dimensional data perception model may be a neural network model trained on sample point cloud data frames and their corresponding calibration information, which includes the calibration frame information of each object in the sample point cloud data frame; for the specific model training process, reference may be made to the model training processes in the related art, which are not repeated here.
The evaluation data in the evaluation data set is generally data obtained by continuous collection, that is, the point cloud data frames in the evaluation data set are consecutive frames, and the image frames are consecutive frames. Correspondingly, the electronic device may determine the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame based on the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame in the evaluation data set, and the time series information between the point cloud data frames in the evaluation data set.
In one implementation, the labeled position information of each object may include lateral position information, longitudinal position information, and radial position information. Then, based on the lateral position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled lateral speed information and labeled lateral acceleration information of each object can be determined; based on the longitudinal position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled longitudinal speed information and labeled longitudinal acceleration information of each object can be determined; and based on the radial position information of each object corresponding to each point cloud data frame and the time series information between the point cloud data frames, the labeled radial speed information and labeled radial acceleration information of each object can be determined.
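A minimal sketch of one way to derive per-axis labeled speed and acceleration from consecutive labeled positions and the time series information, assuming a simple first-order finite difference (the patent does not fix the numerical scheme):

```python
def derive_motion(positions, stamps):
    """positions: per-frame labeled positions of one object along one axis (m);
    stamps: matching acquisition times (s), strictly increasing.
    Returns (speeds, accelerations) via first-order finite differences;
    each list is one element shorter per differentiation step."""
    speeds = [
        (positions[i + 1] - positions[i]) / (stamps[i + 1] - stamps[i])
        for i in range(len(positions) - 1)
    ]
    accelerations = [
        (speeds[i + 1] - speeds[i]) / (stamps[i + 1] - stamps[i])
        for i in range(len(speeds) - 1)
    ]
    return speeds, accelerations
```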
S102: Obtain the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set.
The detection information includes at least the detected pose information and detected motion information of the corresponding detected object.
Using the preset visual detection algorithm, the detection information corresponding to each object in an image frame can be detected from the image frame. For clarity of description, an object detected from an image frame using the preset visual detection algorithm is called a detected object.
The detection information may include two-dimensional information and three-dimensional information corresponding to the object. The two-dimensional information corresponding to the object may include the two-dimensional position information and two-dimensional speed information of the object in the image frame; the three-dimensional information corresponding to the object may include, but is not limited to, the pose information of the object in a specified spatial rectangular coordinate system, that is, the detected pose information, as well as the detected motion information, where the detected motion information includes, but is not limited to, the detected speed information and detected acceleration information of the corresponding object.
In one implementation, the electronic device may directly obtain, from other devices, the detection information corresponding to each detected object detected based on the preset visual detection algorithm and each image frame in the evaluation data set.
In one implementation of the present invention, S102 may include the following steps 021-022:
021: Based on the preset visual perception algorithm, detect each image frame in the evaluation data set to obtain the detection frame information of each detected object corresponding to each image frame, so as to determine the detection position information and detection attitude information of each detected object corresponding to each image frame and obtain the detected pose information of each detected object corresponding to each image frame.
022: Based on the preset visual perception algorithm and the detection position information and detection attitude information of each detected object corresponding to each image frame, determine the detection speed information and detection acceleration information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, and thereby the detection information corresponding to each detected object of each image frame.
In this implementation, the preset visual perception algorithm is pre-stored locally on the electronic device or in a connected storage device. After obtaining the evaluation data set, the electronic device may detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection frame information of each detected object corresponding to each image frame, where the detection frame information includes information that can represent the length, width, and height of the corresponding detected object and information that can represent its pose, and may also include information representing the two-dimensional position information of the corresponding detected object in the corresponding image frame.
The electronic device determines the detection position information and detection attitude information of each detected object corresponding to each image frame based on the preset visual perception algorithm and the detection frame information of each detected object corresponding to each image frame, and then, based on the preset visual perception algorithm and the detection position information and detection attitude information of each detected object corresponding to each image frame, determines the detection speed information and detection acceleration information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame.
In one implementation, the detection position information of each detected object may include lateral position information, longitudinal position information, and radial position information. Then, based on the lateral position information of each detected object and the time series information between the image frames, the detected lateral speed information and detected lateral acceleration information of each detected object can be determined; based on the longitudinal position information of each detected object and the time series information between the image frames, the detected longitudinal speed information and detected longitudinal acceleration information of each detected object can be determined; and based on the radial position information of each detected object and the time series information between the image frames, the detected radial speed information and detected radial acceleration information of each detected object can be determined.
S103: Based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, determine the evaluation information corresponding to the preset visual perception algorithm.
The evaluation information includes: first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm, and second evaluation information representing its algorithm stability.
In this step, the electronic device may process the ground truth information and detection information that have a corresponding relationship based on the preset result accuracy evaluation rule to obtain the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm, and may process the ground truth information and detection information that have a corresponding relationship based on the preset algorithm stability evaluation rule to obtain the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
The preset result accuracy evaluation rule may include, but is not limited to, specific detection result accuracy evaluation indicators and a process for determining, based on the ground truth information and detection information, the results corresponding to those indicators; the preset algorithm stability evaluation rule may include, but is not limited to, specific algorithm stability evaluation indicators and a process for determining, based on the ground truth information and detection information, the results corresponding to those indicators.
By applying the embodiments of the present invention, first evaluation information on the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability can be determined based on the preset result accuracy evaluation rule, the preset algorithm stability evaluation rule, the labeled pose information and object motion information in the ground truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thereby evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
In another embodiment of the present invention, the ground truth information includes the annotation frame information of each object corresponding to each point cloud data frame, the detection information includes the detection frame information of each detected object corresponding to each image frame, and the detection frame information includes the two-dimensional position information of the corresponding detected object in the image frame;
S103 may include the following steps 031-034:
031: For each object corresponding to each point cloud data frame, based on the annotation frame information corresponding to the object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the internal parameter information of the image frame acquisition device, determine the projection position information of the projection frame obtained by projecting the annotation frame corresponding to the object into the image frame corresponding to the point cloud data frame, as the projection frame position information corresponding to the object.
032: For each object corresponding to each point cloud data frame, based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, determine the matching projection frame position information and two-dimensional position information, so as to determine the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
The matching projection frame position information and two-dimensional position information are: projection frame position information and two-dimensional position information whose corresponding frames have an intersection-over-union value exceeding a preset intersection-over-union threshold.
033: Based on the preset result accuracy evaluation rule, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm.
034: Based on the preset algorithm stability evaluation rule and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm.
In this implementation, before determining the evaluation information corresponding to the preset visual perception algorithm based on the ground truth information and detection information, the electronic device needs to match the ground truth information and detection information, so that the evaluation information is determined from mutually matched ground truth information and detection information. Correspondingly, for the object corresponding to each point cloud data frame, the electronic device may, based on the annotation frame information corresponding to the object and the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, convert the annotation frame corresponding to the object's annotation frame information from the coordinate system of the point cloud data frame acquisition device in which it is located to the coordinate system of the image frame acquisition device, obtaining the position information of that annotation frame in the coordinate system of the image frame acquisition device. Then, based on this position information and the internal parameter information of the image frame acquisition device, the annotation frame corresponding to the object is projected into the image frame corresponding to the point cloud data frame, and the projection position information of the resulting projection frame in that image frame is determined as the projection frame position information corresponding to the object.
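A minimal sketch of such a projection under common assumptions: a pinhole camera model, a 4x4 lidar-to-camera transform T_cam_lidar standing in for the position conversion relationship, and a 3x3 intrinsic matrix K standing in for the internal parameter information. These symbols are assumptions for exposition, not taken from the patent:

```python
import numpy as np

def project_box_corners(corners_lidar, T_cam_lidar, K):
    """corners_lidar: (8, 3) cuboid corners in the lidar frame.
    T_cam_lidar: (4, 4) lidar-to-camera rigid transform.
    K: (3, 3) camera intrinsic matrix.
    Returns the axis-aligned 2D bounding box (x_min, y_min, x_max, y_max)
    of the projected corners in pixel coordinates. Assumes all corners
    lie in front of the camera (positive depth)."""
    corners_h = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous coords
    corners_cam = (T_cam_lidar @ corners_h.T)[:3]            # into camera frame
    pixels = K @ corners_cam
    pixels = pixels[:2] / pixels[2]                          # perspective divide
    x_min, y_min = pixels.min(axis=1)
    x_max, y_max = pixels.max(axis=1)
    return x_min, y_min, x_max, y_max
```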
For each point cloud data frame, after the annotation frames corresponding to its objects have all been projected into the image frame corresponding to the point cloud data frame, the intersection-over-union between the annotation frame corresponding to each object and the two-dimensional detection frame of each detected object in that image frame can be calculated based on the projection frame position information corresponding to each object and the two-dimensional position information of each detected object in the image frame; that is, the ratio between the intersection area and the union area of the annotation frame corresponding to each object and the two-dimensional detection frame of each detected object in the image frame corresponding to the point cloud data frame is calculated. Each such ratio is then compared with the preset intersection-over-union threshold; if the ratio exceeds the preset intersection-over-union threshold, it is determined that the object corresponding to the projection frame position information of this ratio and the detected object corresponding to the two-dimensional position information of this ratio are the same object, and, correspondingly, that projection frame position information and two-dimensional position information are matching projection frame position information and two-dimensional position information.
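An illustrative intersection-over-union computation and threshold test for two axis-aligned boxes given as (x_min, y_min, x_max, y_max); the 0.5 default threshold is a placeholder, not a value from the patent:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes
    (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    iy = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = ix * iy
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def is_match(projection_box, detection_box, iou_threshold=0.5):
    # Matched when the IoU exceeds the preset intersection-over-union threshold.
    return iou(projection_box, detection_box) > iou_threshold
```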
For example, suppose point cloud data frame A corresponds to object 1, object 2, and object 3, and the image frame a corresponding to point cloud data frame A corresponds to detected object 1, detected object 2, detected object 3, and detected object 4. For object 1 of point cloud data frame A, based on the projection frame position information of object 1 and the two-dimensional position information of each detected object, the intersection-over-union values between the projection frame corresponding to object 1 and the two-dimensional detection frames corresponding to detected object 1, detected object 2, detected object 3, and detected object 4 are calculated in turn.
By analogy, for object 2 of point cloud data frame A, the intersection-over-union values between the projection frame of object 2 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 are calculated; and for object 3 of point cloud data frame A, the intersection-over-union values between the projection frame of object 3 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 are calculated.
Each intersection-over-union ratio is then compared with the preset intersection-over-union threshold. For example, if the intersection-over-union between the projection frame of object 1 and the two-dimensional detection frame of detected object 3 exceeds the preset threshold, the projection frame position information of object 1 and the two-dimensional position information of detected object 3 are determined to be matching projection frame position information and two-dimensional position information; correspondingly, the ground truth information of object 1 and the detection information of detected object 3 are matched ground truth information and detection information.
If none of the intersection-over-union values between the projection frame of object 3 and the two-dimensional detection frames of detected objects 1, 2, 3, and 4 exceeds the preset threshold, it is determined that none of detected objects 1-4 is the same physical object as object 3, that is, the ground truth information of object 3 is ground truth information not matched to detection information; correspondingly, object 3 may be called a missed object.
If none of the intersection-over-union values between the projection frames of objects 1, 2, and 3 and the two-dimensional detection frame of detected object 4 exceeds the preset threshold, it is determined that none of objects 1-3 is the same physical object as detected object 4, that is, the detection information of detected object 4 is detection information not matched to ground truth information; correspondingly, detected object 4 may be called a falsely detected object.
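Tying the example together, a sketch of how matched, missed, and falsely detected objects could be tallied for one frame pair; the simple greedy pass below is an assumption, since the patent only specifies the intersection-over-union threshold test (it reuses the iou() helper from the previous sketch):

```python
def classify_matches(projection_boxes, detection_boxes, iou_threshold=0.5):
    """Returns (matches, missed, false_detections) as index lists."""
    matches, used = [], set()
    for i, pbox in enumerate(projection_boxes):
        best_j, best_iou = None, iou_threshold
        for j, dbox in enumerate(detection_boxes):
            if j in used:
                continue
            value = iou(pbox, dbox)
            if value > best_iou:           # exceeds the preset threshold
                best_j, best_iou = j, value
        if best_j is not None:
            matches.append((i, best_j))
            used.add(best_j)
    matched_objects = {m[0] for m in matches}
    missed = [i for i in range(len(projection_boxes))
              if i not in matched_objects]          # like object 3 above
    false_detections = [j for j in range(len(detection_boxes))
                        if j not in used]           # like detected object 4
    return matches, missed, false_detections
```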
Subsequently, the electronic device may determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm based on the preset result accuracy evaluation rule, the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information; and may determine the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm stability evaluation rule and the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
In another embodiment of the present invention, the detected pose information includes: the detection position information and detection attitude information of each detected object corresponding to each image frame, as determined from its detection frame information; the detected motion information includes: the detected speed information and detected acceleration information of each detected object corresponding to each image frame;
The labeled pose information includes: the labeled position information and labeled attitude information of each object corresponding to each point cloud data frame, as determined from its annotation frame information; the object motion information includes: the labeled speed information and labeled acceleration information of each object corresponding to each point cloud data frame;
Step 033 may include the following steps 0331-0339:
0331: Based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information, determine the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm.
0332: Based on the labeled position information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection position information included in the matched detection information, determine the detection position error value between the matched ground truth information and detection information.
0333: Based on the labeled attitude information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection attitude information included in the matched detection information, determine the detection attitude error value between the matched ground truth information and detection information.
0334: Based on the labeled speed information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection speed information included in the matched detection information, determine the detection speed error value between the matched ground truth information and detection information.
0335: Based on the labeled acceleration information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection acceleration information included in the matched detection information, determine the detection acceleration error value between the matched ground truth information and detection information.
0336: Based on the annotation frame information included in the matched ground truth information corresponding to each point cloud data frame and its corresponding image frame and the detection frame information included in the matched detection information, determine the length-width error value of the detection frame between the matched ground truth information and detection information.
0337: Based on the target error values between the matched ground truth information and detection information and the preset error thresholds corresponding to the target error values, draw the error curve corresponding to the target error value.
The horizontal axis of the error curve is the preset error threshold; the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set; and the target error value is: the detection position error value, the detection attitude error value, the detection speed error value, the detection acceleration error value, or the length-width error value of the detection frame.
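A sketch of how such an error curve could be computed; the threshold grid and data count are caller-supplied placeholders, since the patent only fixes the meaning of the two axes:

```python
def error_curve(target_errors, thresholds, total_data_count):
    """For each preset error threshold, compute the fraction of target
    error values smaller than it, relative to the total amount of data
    in the evaluation data set. Returns (thresholds, ratios) for plotting:
    thresholds on the horizontal axis, ratios on the vertical axis."""
    ratios = [
        sum(1 for e in target_errors if e < t) / total_data_count
        for t in thresholds
    ]
    return thresholds, ratios
```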
0338: Sort the target error values between the matched ground truth information and detection information by magnitude to obtain a sorted sequence of the target error values; determine, within the sorted sequence, the first target error value, namely the largest value among the first first-percentage of the target error values, and the second target error value, namely the largest value among the first second-percentage of the target error values.
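Illustratively, with assumed percentages of 50% and 90% (the patent leaves the first and second percentages unspecified):

```python
def percentile_error(sorted_errors, percentage):
    """Largest value among the first `percentage` of an ascending
    sorted sequence of target error values."""
    count = max(1, int(len(sorted_errors) * percentage))
    return sorted_errors[count - 1]

errors = sorted([0.12, 0.05, 0.40, 0.22, 0.08, 0.31, 0.17, 0.09, 0.26, 0.51])
first_target_error = percentile_error(errors, 0.5)   # assumed first percentage
second_target_error = percentile_error(errors, 0.9)  # assumed second percentage
```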
0339: Based on the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error information, the first target error value and the second target error value, and/or the target error value, determine the first evaluation information representing the accuracy of the detection results of the preset visual perception algorithm.
In this implementation, the preset visual perception algorithm can be used to detect from an image frame the 2D information and 3D information of each detected object therein, including the two-dimensional position information of each detected object in the image frame and the detection position information, detection attitude information, detection speed information, and detection acceleration information of each detected object in the specified spatial rectangular coordinate system.
In order to realize a multi-dimensional evaluation of the detection result accuracy and algorithm stability of the preset visual perception algorithm, the ground truth information includes labeled parameters of the corresponding object in multiple dimensions, which may include, but are not limited to, the labeled position information, labeled attitude information, labeled speed information, and labeled acceleration information of the corresponding object.
The preset result accuracy evaluation rule may include rules indicating how to determine the precision information and recall information of the detection results. Correspondingly, the electronic device may determine the precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm according to a preset precision determination method and a preset recall determination method, based on the matched ground truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to ground truth information, and the ground truth information not matched to detection information. The precision rate information and recall rate information of the detection results corresponding to the preset visual perception algorithm serve as one evaluation indicator of the accuracy of those detection results. For the preset precision determination method and preset recall determination method, reference may be made to the precision and recall determination methods in the related art, which are not repeated here.
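As a reminder of the standard definitions from the related art (matched pairs counted as true positives, unmatched detection information as false positives, unmatched ground truth information as false negatives; the sketch assumes at least one detection and one ground truth object exist):

```python
def precision_recall(num_matched, num_unmatched_detections, num_unmatched_truths):
    """Standard precision/recall over the whole evaluation data set.
    num_matched: detections matched to ground truth (true positives);
    num_unmatched_detections: falsely detected objects (false positives);
    num_unmatched_truths: missed objects (false negatives)."""
    precision = num_matched / (num_matched + num_unmatched_detections)
    recall = num_matched / (num_matched + num_unmatched_truths)
    return precision, recall
```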
In this implementation, a new evaluation index for the accuracy of the detection results corresponding to the preset visual perception algorithm is introduced: a special form of error curve is drawn, and the accuracy of the detection results of the preset visual perception algorithm is then evaluated by means of this error curve. Another newly introduced evaluation index for the accuracy of the detection results is: counting, among the error values of the same dimension, the numbers of error values corresponding to different proportions, and then evaluating the accuracy of the detection results of the preset visual perception algorithm based on these statistics.
Specifically, the error values corresponding to the different dimensions are first computed from the matched ground-truth information and detection information:
Based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected position error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated position information and the detected position information, and then compute the detected position error value between the annotated position information and the detected position information after coordinate unification; the detected position error value may be an absolute error value and/or a relative error value between the annotated position information and the detected position information.
Based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected attitude error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated attitude information and the detected attitude information, and then compute the detected attitude error value between the two after coordinate unification; the detected attitude error value may be an absolute error value and/or a relative error value between the annotated attitude information and the detected attitude information.
Based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected velocity error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated velocity information and the detected velocity information, and then compute the detected velocity error value between the two after coordinate unification; the detected velocity error value may be an absolute error value and/or a relative error value between the annotated velocity information and the detected velocity information.
Based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the detected acceleration error value between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the coordinate systems of the annotated acceleration information and the detected acceleration information, and then compute the detected acceleration error value between the two after coordinate unification; the detected acceleration error value may be an absolute error value and/or a relative error value between the annotated acceleration information and the detected acceleration information.
Based on the annotation box information included in the matched ground-truth information and the detection box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame, determine the box length-width error values between the matched ground-truth information and detection information. That is, for each pair of matched ground-truth information and detection information, unify the scales of the annotation box information and the detection box information, and then, based on the annotation box information and detection box information after scale unification, compute the errors between the length and the width of the annotation box and those of the detection box as the box length-width error values; these may be absolute and/or relative error values of the length and width between the annotation box and the detection box. In one case, the error between the heights of the annotation box and the detection box may also be computed.
In one case, the electronic device may take, in turn, each of the detected position error value, detected attitude error value, detected velocity error value, detected acceleration error value, and box length-width error values determined above as the target error value, and draw the error curve corresponding to the target error value based on the target error values between the matched ground-truth information and detection information and the preset error thresholds corresponding to that target error value. Specifically, for each preset error threshold, the number of target error values smaller than that preset error threshold is counted, and the ratio of this number to the total amount of data in the evaluation data set is computed; then the error curve corresponding to the target error value is drawn, with the preset error thresholds as the horizontal axis and, as the vertical axis, the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set. There are multiple preset error thresholds, which may start from 0 and increase successively.
FIG. 2 shows an example of the error curve drawn for a target error value.
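A minimal sketch of how the points of such an error curve could be computed, assuming one target error value per matched ground-truth/detection pair and thresholds increasing from 0; the names are illustrative only.

```python
import numpy as np

def error_curve_points(target_errors, thresholds):
    """For each preset error threshold, return the fraction of target error
    values smaller than that threshold; plotting thresholds (horizontal axis)
    against these fractions (vertical axis) yields the error curve."""
    errors = np.asarray(target_errors, dtype=float)
    total = len(errors)  # total amount of data in the evaluation data set
    return [float((errors < t).sum()) / total for t in thresholds]

# Example usage with thresholds starting from 0 and increasing successively:
# fractions = error_curve_points(position_errors, np.linspace(0.0, 2.0, 41))
```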
In another case, the electronic device sorts the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence of the target error values, and determines the first target error value, i.e., the largest value among the top first percentage of target error values in the sorted sequence, and the second target error value, i.e., the largest value among the top second percentage of target error values. The first target error value may be referred to as 1sigma, and the second target error value may be referred to as 2sigma.
For example, the top first percentage may be the top 68.26% of the sorted sequence and the top second percentage may be the top 95.44% of the sorted sequence, matching the one- and two-standard-deviation coverage of a normal distribution.
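Under the assumption that the sorted sequence is ascending, the 1sigma and 2sigma values described above could be obtained as sketched below; the percentage defaults are the examples just given, and all names are illustrative.

```python
import numpy as np

def sigma_bounds(target_errors, first_pct=0.6826, second_pct=0.9544):
    """Largest value within the top first_pct and top second_pct of the
    ascending sorted sequence of target error values (1sigma and 2sigma)."""
    s = np.sort(np.asarray(target_errors, dtype=float))
    n = len(s)
    one_sigma = s[max(int(np.ceil(first_pct * n)) - 1, 0)]
    two_sigma = s[max(int(np.ceil(second_pct * n)) - 1, 0)]
    return float(one_sigma), float(two_sigma)
```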
Subsequently, the electronic device may determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
It can be understood that the higher the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the higher the accuracy of those detection results.
For the error curve corresponding to a target error value, as shown in FIG. 2, the larger the area under the curve (AUC), i.e., the closer the area value is to 1, the smaller the overall error between the matched ground-truth information and detection information in the corresponding dimension, i.e., the better the performance of the preset visual perception algorithm. For example, if the target error value is the detected position error value, the curve measures the position estimation capability of the preset visual perception algorithm, and the larger the area under the corresponding error curve, the better that position estimation capability. As another example, if the target error value is the detected attitude error value, the curve measures the attitude estimation capability of the preset visual perception algorithm, and the larger the area under the corresponding error curve, the better that attitude estimation capability.
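One way to quantify the area under such an error curve is trapezoidal integration; the normalization by the threshold range below is an assumption made so that a curve whose fraction equals 1 everywhere yields an area of exactly 1, consistent with the reading above.

```python
import numpy as np

def error_curve_auc(thresholds, fractions):
    """Normalized trapezoidal area under the error curve; fractions are the
    per-threshold proportions in [0, 1] produced when drawing the curve."""
    x = np.asarray(thresholds, dtype=float)
    y = np.asarray(fractions, dtype=float)
    return float(np.trapz(y, x) / (x[-1] - x[0]))
```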
As for the evaluation index of counting, among the error values of the same dimension, the numbers of error values corresponding to different proportions: the smaller the determined first target error value and second target error value, the better the performance of the preset visual perception algorithm, i.e., the higher the accuracy of the detection results.
In one implementation, the evaluation indexes for the accuracy of the detection results corresponding to the preset visual perception algorithm may further include, but are not limited to, the P-R curve and the proportions of false detections and missed detections among all detection results, where the P-R curve is the precision-recall curve, drawn with the recall information as the horizontal axis and the precision information as the vertical axis.
To improve the comprehensiveness of the evaluation of the preset visual perception algorithm, embodiments of the present invention evaluate not only the accuracy of the detection results corresponding to the preset visual perception algorithm but also the algorithm stability of the preset visual perception algorithm. In another embodiment of the present invention, step 034 may include the following steps:
0341: Based on the timing information between the point cloud data frames or the image frames in the evaluation data set, determine, from the target error values, the target error values corresponding to the same object.
0342: For the target error values corresponding to each object, fit a fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm,
where the fitted error curve includes: the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
0343: For the target error values corresponding to each object, determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
In this implementation, the point cloud data frames in the evaluation data set are continuous, i.e., consecutive in time, and so are the image frames. In view of this, the electronic device may, based on the timing information between the point cloud data frames or image frames in the evaluation data set, determine from the target error values the target error values corresponding to the same object, where the target error values corresponding to the same object are arranged according to the timing information between their corresponding point cloud data frames or image frames.
Correspondingly, for the target error values corresponding to each object, the electronic device fits the fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm; this fitted error curve may include the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames. The preset curve fitting algorithm may be any type of curve fitting algorithm in the related art, which is not limited in the embodiments of the present invention.
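Since the embodiments leave the preset curve fitting algorithm open, the sketch below uses a polynomial fit purely as one possible instance; the degree and all names are assumptions for illustration.

```python
import numpy as np

def fit_object_error_curve(acquisition_times, target_errors, degree=3):
    """Fit one object's time-ordered target error values and return the
    fitted errors at the acquisition moments of the corresponding frames."""
    t = np.asarray(acquisition_times, dtype=float)
    e = np.asarray(target_errors, dtype=float)
    coefficients = np.polyfit(t, e, deg=degree)  # one possible curve fitting algorithm
    return np.polyval(coefficients, t)           # fitted error per acquisition moment
```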
Subsequently, for the target error values corresponding to each object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames, the electronic device determines second evaluation information characterizing the algorithm stability of the preset visual perception algorithm. In another embodiment of the present invention, step 0343 includes:
03431: For the target error values corresponding to each object, compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
03432: For the target error values corresponding to each object, draw the difference error curve corresponding to that object's target errors, based on the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object and the preset difference thresholds corresponding to that object's target error values.
Here, the horizontal axis of the difference error curve corresponding to an object's target errors is the preset difference threshold corresponding to that object's target error values, and the vertical axis is the ratio of the number of differences, among those corresponding to that object's target error values, that are smaller than each preset difference threshold, to the total number of target error values corresponding to that object.
03433: For the target error values corresponding to each object, sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences.
03434: Determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target error values of each object.
In this implementation, the target error values corresponding to an object correspond to different point cloud data frames or image frames, and each point cloud data frame or image frame corresponds to an acquisition moment. Correspondingly, in one implementation, the target error values corresponding to an object correspond to different acquisition moments. In view of this, for the target error values corresponding to each object, the electronic device may compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
Then, for each preset difference threshold corresponding to that object's target error values, the electronic device counts, among the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object, the number of differences smaller than that preset difference threshold, and computes the ratio of this number to the total number of target error values corresponding to that object; taking the ratio of differences smaller than each preset difference threshold to the total number of that object's target error values as the vertical axis and the preset difference thresholds as the horizontal axis, the difference error curve corresponding to that object's target errors is drawn.
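A sketch of the difference error curve computation follows, assuming the differences between target errors and fitted errors are taken as absolute values (the embodiments do not fix a sign convention); all names are illustrative.

```python
import numpy as np

def difference_error_curve_points(target_errors, fitted_errors, diff_thresholds):
    """Differences between an object's target errors and fitted errors at the
    same acquisition moments, turned into curve points: for each preset
    difference threshold, the fraction of differences smaller than it."""
    diffs = np.abs(np.asarray(target_errors, dtype=float)
                   - np.asarray(fitted_errors, dtype=float))
    total = len(diffs)  # total number of target error values for this object
    return [float((diffs < t).sum()) / total for t in diff_thresholds]
```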
In another implementation, for the target error values corresponding to each object, the electronic device may sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences. The third percentage may be the same as or different from the first percentage, and the fourth percentage may be the same as or different from the second percentage.
Subsequently, the electronic device may determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target errors of each object. The smaller the first difference and the second difference corresponding to an object's target error values, the smaller the differences between that object's target errors and fitted errors at the respective acquisition moments, and correspondingly, the better the algorithm stability of the preset visual perception algorithm in the dimension corresponding to that target error value.
The larger the area under the difference error curve corresponding to an object's target error values, i.e., the closer the area value is to 1, the smaller the overall deviation between that object's target errors and fitted errors at the respective acquisition moments, i.e., the better the performance of the preset visual perception algorithm and the higher its detection stability in the dimension corresponding to that target error value.
To improve the user experience and help users understand the evaluation information, embodiments of the present invention may visually display the evaluation information and the intermediate information generated during the evaluation process. In another embodiment of the present invention, the method further includes:
displaying the first evaluation information, the second evaluation information, the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitted error curve corresponding to the target error values of each object obtained by fitting.
The electronic device may intuitively display the evaluation information and the intermediate information generated during the evaluation process in a web-page-based visualization manner or a 3D-rendering-based visualization manner, and may provide functions for interacting with the user.
Corresponding to the above method embodiments, an embodiment of the present invention provides an evaluation device for a visual perception algorithm. As shown in FIG. 3, the device may include:
a first obtaining module 310, configured to obtain ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, where the ground-truth information includes at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame that correspond to each other;
a second obtaining module 320, configured to obtain detection information corresponding to each detected object detected based on a preset visual perception algorithm and each image frame in the evaluation data set, where the detection information includes at least detected pose information and detected motion information of the corresponding detected object;
a determination module 330, configured to determine evaluation information corresponding to the preset visual perception algorithm, based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, where the evaluation information includes: first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm and second evaluation information characterizing its algorithm stability.
By applying the embodiments of the present invention, first evaluation information on the accuracy of multiple aspects of the detection results of the preset visual perception algorithm and second evaluation information on its algorithm stability can be determined based on the preset result accuracy evaluation rules, the preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object. The preset visual perception algorithm is thus evaluated both on the accuracy of its detection results in multiple aspects and on the stability of those detection results, realizing a comprehensive evaluation of the performance of the visual perception algorithm.
In another embodiment of the present invention, the first obtaining module 310 is specifically configured to obtain the evaluation data set;
annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking the annotation box information of each object corresponding to each point cloud data frame, so as to determine the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, thereby obtaining the annotated pose information of each object corresponding to each point cloud data frame; and
determine the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame, based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the timing information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object for each point cloud data frame.
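One plausible reading of deriving the annotated velocity and acceleration from the annotated positions and the inter-frame timing information is finite differencing over the acquisition timestamps, sketched below; the differencing scheme is an assumption for illustration and is not prescribed by the embodiments.

```python
import numpy as np

def annotate_motion(acquisition_times, annotated_positions):
    """Per-object annotated velocity and acceleration from time-ordered
    annotated positions (array of shape (N, 3)) via finite differences."""
    t = np.asarray(acquisition_times, dtype=float)
    p = np.asarray(annotated_positions, dtype=float)
    v = np.gradient(p, t, axis=0)  # annotated velocity information
    a = np.gradient(v, t, axis=0)  # annotated acceleration information
    return v, a
```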
In another embodiment of the present invention, the second obtaining module 320 is specifically configured to detect each image frame in the evaluation data set based on the preset visual perception algorithm to obtain the detection box information of each detected object corresponding to each image frame, so as to determine the detected position information and detected attitude information of each detected object corresponding to each image frame, thereby obtaining the detected pose information of each detected object corresponding to each image frame; and
determine the detected velocity information and detected acceleration information of each detected object corresponding to each image frame, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, thereby obtaining the detection information corresponding to each detected object for each image frame.
In another embodiment of the present invention, the ground-truth information includes the annotation box information of each object corresponding to each point cloud data frame, the detection information includes the detection box information of each detected object corresponding to each image frame, and the detection box information includes the two-dimensional position information of the corresponding detected object in the image frame;
the determination module 330 includes:
a first determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine the projection position information of the projection box obtained by projecting the annotation box corresponding to that object into the image frame corresponding to the point cloud data frame, based on the annotation box information corresponding to that object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, as the projection box position information corresponding to that object;
a second determination unit (not shown in the figure), configured to, for each object corresponding to each point cloud data frame, determine matched projection box position information and two-dimensional position information, based on the projection box position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, where matched projection box position information and two-dimensional position information are: projection box position information and two-dimensional position information whose corresponding boxes have an intersection-over-union (IoU) value exceeding a preset IoU threshold (see the sketch after this list);
a third determination unit (not shown in the figure), configured to determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the preset result accuracy evaluation rules, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
a fourth determination unit (not shown in the figure), configured to determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the preset algorithm stability evaluation rules and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
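The sketch below illustrates the matching performed by the second determination unit: axis-aligned IoU between projected annotation boxes and detected 2D boxes, with a greedy one-to-one pairing above a preset IoU threshold. The greedy strategy and the 0.5 default are illustrative assumptions; the embodiments only require the IoU to exceed a preset threshold.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def match_boxes(projection_boxes, detection_boxes, iou_threshold=0.5):
    """Pair each projection box with the best unused detection box whose IoU
    exceeds the preset threshold; unmatched entries on either side feed the
    precision/recall computation of the third determination unit."""
    matches, used = [], set()
    for i, pb in enumerate(projection_boxes):
        best_j, best_iou = -1, iou_threshold
        for j, db in enumerate(detection_boxes):
            if j in used:
                continue
            v = iou(pb, db)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0:
            used.add(best_j)
            matches.append((i, best_j))
    return matches
```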
In another embodiment of the present invention, the detected pose information includes: the detected position information and detected attitude information, determined from the detection box information, of each detected object corresponding to each image frame; the detected motion information includes: the detected velocity information and detected acceleration information of each detected object corresponding to each image frame; the annotated pose information includes: the annotated position information and annotated attitude information, determined from the annotation box information, of each object corresponding to each point cloud data frame; and the object motion information includes: the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame;
the third determination unit is specifically configured to determine the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, based on the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
determine the detected position error value between the matched ground-truth information and detection information, based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected attitude error value between the matched ground-truth information and detection information, based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected velocity error value between the matched ground-truth information and detection information, based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the detected acceleration error value between the matched ground-truth information and detection information, based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
determine the box length-width error values between the matched ground-truth information and detection information, based on the annotation box information included in the matched ground-truth information and the detection box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
draw the error curve corresponding to the target error value, based on the target error values between the matched ground-truth information and detection information and the preset error thresholds corresponding to that target error value, where the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is: the detected position error value, the detected attitude error value, the detected velocity error value, the detected acceleration error value, or the box length-width error value;
sort the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence of the target error values, and determine the first target error value, i.e., the largest value among the top first percentage of target error values in the sorted sequence, and the second target error value, i.e., the largest value among the top second percentage of target error values; and
determine first evaluation information characterizing the accuracy of the detection results of the preset visual perception algorithm, based on the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
In another embodiment of the present invention, the fourth determination unit is specifically configured to determine, from the target error values, the target error values corresponding to the same object, based on the timing information between the point cloud data frames or image frames in the evaluation data set;
for the target error values corresponding to each object, fit the fitted error curve corresponding to that object's target error values, based on the timing information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to that object, and a preset curve fitting algorithm, where the fitted error curve includes: the fitted errors of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames; and
for the target error values corresponding to each object, determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames.
In another embodiment of the present invention, the fourth determination unit is specifically configured to, for the target error values corresponding to each object, compute the differences between the target errors and the fitted errors corresponding to the same acquisition moment, based on the target error values corresponding to that object and the fitted errors, included in the corresponding fitted error curve, of that object at the acquisition moments corresponding to the respective point cloud data frames or image frames;
for the target error values corresponding to each object, draw the difference error curve corresponding to that object's target errors, based on the differences between the target errors and the fitted errors at the respective acquisition moments corresponding to that object and the preset difference thresholds corresponding to that object's target error values, where the horizontal axis of the difference error curve is the preset difference threshold corresponding to that object's target error values, and the vertical axis is the ratio of the number of differences, among those corresponding to that object's target error values, that are smaller than each preset difference threshold, to the total number of target error values corresponding to that object;
for the target error values corresponding to each object, sort the differences corresponding to that object's target error values by magnitude, and determine the first difference, i.e., the largest value among the top third percentage of differences in the sorted sequence, and the second difference, i.e., the largest value among the top fourth percentage of differences; and
determine second evaluation information characterizing the algorithm stability of the preset visual perception algorithm, based on the first difference and the second difference corresponding to the target error values of each object, and/or the difference error curve corresponding to the target error values of each object.
In another embodiment of the present invention, the device further includes:
a display module (not shown in the figure), configured to display the first evaluation information, the second evaluation information, the precision information and recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitted error curve corresponding to the target error values of each object obtained by fitting.

Claims (10)

  1. A method for evaluating a visual perception algorithm, wherein the method comprises:
    obtaining ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, wherein the ground-truth information comprises at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data comprises a point cloud data frame and an image frame that correspond to each other;
    obtaining detection information corresponding to each detected object detected based on a preset visual perception algorithm and each image frame in the evaluation data set, wherein the detection information comprises at least detected pose information and detected motion information of the corresponding detected object;
    determining evaluation information corresponding to the preset visual perception algorithm, based on preset result accuracy evaluation rules, preset algorithm stability evaluation rules, the annotated pose information and object motion information in the ground-truth information corresponding to each object, and the detected pose information and detected motion information in the detection information corresponding to each detected object, wherein the evaluation information comprises: first evaluation information characterizing accuracy of detection results of the preset visual perception algorithm and second evaluation information characterizing algorithm stability of the preset visual perception algorithm.
  2. The method according to claim 1, wherein the process of obtaining the ground-truth information corresponding to each object determined based on each point cloud data frame in the evaluation data set comprises:
    obtaining the evaluation data set;
    annotating each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, marking annotation box information of each object corresponding to each point cloud data frame, so as to determine annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, thereby obtaining the annotated pose information of each object corresponding to each point cloud data frame;
    determining annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame, based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and timing information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object for each point cloud data frame.
  3. The method according to claim 1, wherein the step of obtaining the detection information corresponding to each detected object detected based on the preset visual perception algorithm and each image frame in the evaluation data set comprises:
    detecting each image frame in the evaluation data set based on the preset visual perception algorithm to obtain detection box information of each detected object corresponding to each image frame, so as to determine detected position information and detected attitude information of each detected object corresponding to each image frame, thereby obtaining the detected pose information of each detected object corresponding to each image frame;
    determining detected velocity information and detected acceleration information of each detected object corresponding to each image frame, based on the preset visual perception algorithm and the detected position information and detected attitude information of each detected object corresponding to each image frame, so as to obtain the detected motion information of each detected object corresponding to each image frame, thereby obtaining the detection information corresponding to each detected object for each image frame.
  4. The method according to claim 1, wherein the ground-truth information includes annotation-box information of each object corresponding to each point cloud data frame, the detection information includes detection-box information of each detected object detected in each image frame, and the detection-box information includes two-dimensional position information of the corresponding detected object in the image frame;
    the step of determining the evaluation information corresponding to the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the preset algorithm-stability evaluation rule, the annotated pose information and the object motion information in the ground-truth information corresponding to each object, and the detected pose information and the detected motion information in the detection information corresponding to each detected object comprises:
    for each object corresponding to each point cloud data frame, determining, based on the annotation-box information corresponding to the object, the position conversion relationship between the point cloud data frame acquisition device and the image frame acquisition device, and the intrinsic parameter information of the image frame acquisition device, projection position information of a projection box obtained by projecting the annotation box corresponding to the object into the image frame corresponding to the point cloud data frame, as projection-box position information corresponding to the object;
    for each object corresponding to each point cloud data frame, determining matching projection-box position information and two-dimensional position information based on the projection-box position information corresponding to each object and the two-dimensional position information of each detected object in the image frame corresponding to the point cloud data frame, so as to determine the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, wherein matching projection-box position information and two-dimensional position information are projection-box position information and two-dimensional position information whose corresponding boxes have an intersection-over-union value exceeding a preset intersection-over-union threshold;
    determining first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
    determining second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm-stability evaluation rule and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame.
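As an illustrative sketch (not part of the claims), the two core operations of claim 4 — projecting a 3D annotation box into the image via the device-to-device extrinsics and the camera intrinsics, then matching projection boxes to detection boxes by intersection-over-union — could read as follows. A pinhole camera with all box corners in front of it is assumed, and every name here is hypothetical.

    import numpy as np

    def project_box(corners_lidar, lidar_to_cam, K):
        """Project the 8 corners of a 3D annotation box (point cloud device
        frame) into the image plane and take the enclosing axis-aligned 2D box
        as the projection-box position information. lidar_to_cam is the (4, 4)
        position conversion relationship; K is the (3, 3) intrinsic matrix."""
        corners_h = np.hstack([corners_lidar, np.ones((8, 1))])  # homogeneous coords
        cam = (lidar_to_cam @ corners_h.T)[:3]                   # camera frame, (3, 8)
        px = K @ cam
        px = px[:2] / px[2]                                      # perspective divide
        return px[0].min(), px[1].min(), px[0].max(), px[1].max()

    def iou(a, b):
        """Intersection-over-union of two (x0, y0, x1, y1) boxes."""
        iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = iw * ih
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union else 0.0

    def match(projection_boxes, detection_boxes, iou_threshold=0.5):
        """Greedy one-to-one matching: pairs whose IoU exceeds the preset
        threshold are matched; everything left over feeds the unmatched
        ground-truth and unmatched detection counts used by claim 5."""
        matches, used = [], set()
        for i, pb in enumerate(projection_boxes):
            best, best_iou = None, iou_threshold
            for j, db in enumerate(detection_boxes):
                v = iou(pb, db) if j not in used else 0.0
                if v > best_iou:
                    best, best_iou = j, v
            if best is not None:
                used.add(best)
                matches.append((i, best))
        return matches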
  5. The method according to claim 4, wherein the detected pose information comprises detected position information and detected attitude information of each detected object corresponding to each image frame, determined from its detection-box information; the detected motion information comprises detected velocity information and detected acceleration information of each detected object corresponding to each image frame; the annotated pose information comprises annotated position information and annotated attitude information of each object corresponding to each point cloud data frame, determined from its annotation-box information; and the object motion information comprises annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame;
    the step of determining the first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the preset result-accuracy evaluation rule, the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information comprises:
    determining precision information and recall information of the detection results corresponding to the preset visual perception algorithm based on the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame, the detection information not matched to any ground-truth information, and the ground-truth information not matched to any detection information;
    determining a detected-position error value between the matched ground-truth information and detection information based on the annotated position information included in the matched ground-truth information and the detected position information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-attitude error value between the matched ground-truth information and detection information based on the annotated attitude information included in the matched ground-truth information and the detected attitude information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-velocity error value between the matched ground-truth information and detection information based on the annotated velocity information included in the matched ground-truth information and the detected velocity information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detected-acceleration error value between the matched ground-truth information and detection information based on the annotated acceleration information included in the matched ground-truth information and the detected acceleration information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    determining a detection-box length-width error value between the matched ground-truth information and detection information based on the annotation-box information included in the matched ground-truth information and the detection-box information included in the matched detection information corresponding to each point cloud data frame and its corresponding image frame;
    drawing an error curve corresponding to a target error value based on the target error value between the matched ground-truth information and detection information and the preset error thresholds corresponding to the target error value, wherein the horizontal axis of the error curve is the preset error threshold, the vertical axis of the error curve is the ratio of the number of target error values smaller than each preset error threshold to the total amount of data in the evaluation data set, and the target error value is the detected-position error value, the detected-attitude error value, the detected-velocity error value, the detected-acceleration error value, or the detection-box length-width error value;
    sorting the target error values between the matched ground-truth information and detection information by magnitude to obtain a sorted sequence corresponding to the target error values, and determining a first target error value that is the largest among the leading first percentage of target error values in the sorted sequence and a second target error value that is the largest among the leading second percentage of target error values;
    determining the first evaluation information representing the detection-result accuracy of the preset visual perception algorithm based on the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, and/or the target error value.
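As an illustrative sketch (not part of the claims), claim 5's three accuracy ingredients — precision/recall from the match counts, a cumulative error curve over preset thresholds, and two percentile-style error values read off the sorted errors — are small computations. The 50% and 90% figures below are assumed, since the claim only names a first and a second percentage.

    import numpy as np

    def precision_recall(n_matched, n_unmatched_det, n_unmatched_truth):
        """TP = matched pairs, FP = detections without a ground truth,
        FN = ground truths without a detection."""
        tp, fp, fn = n_matched, n_unmatched_det, n_unmatched_truth
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        return precision, recall

    def error_curve(errors, thresholds, total_data):
        """(threshold, ratio) points of the claimed error curve: share of
        target error values below each preset threshold, normalized by the
        total amount of data in the evaluation set."""
        errors = np.asarray(errors)
        return [(t, float((errors < t).sum()) / total_data) for t in thresholds]

    def percentile_values(errors, first_pct=0.50, second_pct=0.90):
        """Largest error within the leading first/second percentage of the
        ascending-sorted sequence (assumed percentages)."""
        s = np.sort(np.asarray(errors))
        first = s[max(0, int(np.ceil(len(s) * first_pct)) - 1)]
        second = s[max(0, int(np.ceil(len(s) * second_pct)) - 1)]
        return float(first), float(second)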
  6. The method according to claim 5, wherein the step of determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the preset algorithm-stability evaluation rule and the matched ground-truth information and detection information corresponding to each point cloud data frame and its corresponding image frame comprises:
    determining, from the target error values, the target error values corresponding to the same object based on the time-sequence information between the point cloud data frames or image frames in the evaluation data set;
    for the target error values corresponding to each object, fitting a fitting-error curve corresponding to the object's target error values based on the time-sequence information between the point cloud data frames or image frames in the evaluation data set, the target error values corresponding to the object, and a preset curve-fitting algorithm, wherein the fitting-error curve includes the fitted errors of the object at the acquisition moments corresponding to the respective point cloud data frames or image frames;
    for the target error values corresponding to each object, determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the target error values corresponding to the object and the fitted errors of the object, included in the fitting-error curve corresponding to the object's target error values, at the acquisition moments corresponding to the respective point cloud data frames or image frames.
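As an illustrative sketch (not part of the claims): the preset curve-fitting algorithm is left open, so a low-order polynomial fit over the acquisition timestamps is used here purely as an assumed stand-in; the fitted value at each moment is the fitting error that claim 7 compares against. The intuition is that the fitted trend captures systematic error, while frame-to-frame deviation from it reflects jitter, i.e. instability.

    import numpy as np

    def fitting_error_curve(timestamps, target_errors, degree=2):
        """Fit a smooth trend to one object's per-frame target errors and
        evaluate it at each acquisition moment (polynomial fit assumed)."""
        coeffs = np.polyfit(timestamps, target_errors, degree)
        return np.polyval(coeffs, timestamps)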
  7. The method according to claim 6, wherein the step of determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the target error values corresponding to each object and the fitted errors of the object at the acquisition moments corresponding to the respective point cloud data frames or image frames comprises:
    for the target error values corresponding to each object, calculating the differences between the target error and the fitted error corresponding to the same acquisition moment based on the target error values corresponding to the object and the fitted errors of the object, included in the fitting-error curve corresponding to the object's target error values, at the acquisition moments corresponding to the respective point cloud data frames or image frames;
    for the target error values corresponding to each object, drawing a difference error curve corresponding to the object's target errors based on the differences between the target errors and the fitted errors at the acquisition moments corresponding to the object and the preset difference thresholds corresponding to the object's target error values, wherein the horizontal axis of the difference error curve is the preset difference threshold corresponding to the object's target error values, and the vertical axis is the ratio of the number of differences corresponding to the object's target error values that are smaller than each preset difference threshold to the total number of target error values corresponding to the object;
    for the target error values corresponding to each object, sorting the differences corresponding to the object's target error values by magnitude, and determining a first difference that is the largest among the leading third percentage of differences in the sorted sequence and a second difference that is the largest among the leading fourth percentage of differences;
    determining the second evaluation information representing the algorithm stability of the preset visual perception algorithm based on the first difference and the second difference corresponding to the target error values of each object and/or the difference error curve corresponding to the target error values of each object.
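As an illustrative sketch (not part of the claims), claim 7 mirrors the accuracy-side error curve on the residuals between each object's target errors and its fitted trend; absolute residuals and the 50%/90% percentages below are assumptions.

    import numpy as np

    def difference_error_curve(target_errors, fitted_errors, thresholds):
        """Per-object stability curve: share of |target error - fitting error|
        residuals below each preset difference threshold, out of that object's
        total number of target error values."""
        residuals = np.abs(np.asarray(target_errors) - np.asarray(fitted_errors))
        n = len(residuals)
        return [(t, float((residuals < t).sum()) / n) for t in thresholds]

    def difference_percentiles(target_errors, fitted_errors,
                               third_pct=0.50, fourth_pct=0.90):
        """First/second difference: largest residual within the leading
        third/fourth percentage of the ascending-sorted residuals."""
        r = np.sort(np.abs(np.asarray(target_errors) - np.asarray(fitted_errors)))
        first = r[max(0, int(np.ceil(len(r) * third_pct)) - 1)]
        second = r[max(0, int(np.ceil(len(r) * fourth_pct)) - 1)]
        return float(first), float(second)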
  8. The method according to any one of claims 1-7, wherein the method further comprises:
    displaying the first evaluation information, the second evaluation information, the precision information and the recall information of the detection results corresponding to the preset visual perception algorithm, the error curve corresponding to the target error value, the first target error value and the second target error value, the target error values, the determined target error values corresponding to the same object, and/or the fitting-error curve corresponding to each object's target error values obtained by fitting.
  9. An evaluation device for a visual perception algorithm, wherein the device comprises:
    a first obtaining module configured to obtain ground-truth information corresponding to each object determined based on each point cloud data frame in an evaluation data set, wherein the ground-truth information includes at least annotated pose information and object motion information of the corresponding object, and each piece of evaluation data includes a point cloud data frame and an image frame having a corresponding relationship;
    a second obtaining module configured to obtain detection information corresponding to each detected object detected from each image frame in the evaluation data set based on a preset visual perception algorithm, wherein the detection information includes at least detected pose information and detected motion information of the corresponding detected object;
    a determining module configured to determine evaluation information corresponding to the preset visual perception algorithm based on a preset result-accuracy evaluation rule, a preset algorithm-stability evaluation rule, the annotated pose information and the object motion information in the ground-truth information corresponding to each object, and the detected pose information and the detected motion information in the detection information corresponding to each detected object, wherein the evaluation information includes first evaluation information representing the detection-result accuracy of the preset visual perception algorithm and second evaluation information representing its algorithm stability.
  10. The device according to claim 9, wherein the first obtaining module is specifically configured to:
    obtain the evaluation data set;
    annotate each point cloud data frame in the evaluation data set based on a pre-trained three-dimensional data perception model, annotating the annotation-box information of each object corresponding to each point cloud data frame, so as to determine the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame and obtain the annotated pose information of each object corresponding to each point cloud data frame;
    determine the annotated velocity information and annotated acceleration information of each object corresponding to each point cloud data frame based on the annotated position information and annotated attitude information of each object corresponding to each point cloud data frame in the evaluation data set and the time-sequence information between the point cloud data frames in the evaluation data set, so as to obtain the object motion information of each object corresponding to each point cloud data frame, thereby obtaining the ground-truth information corresponding to each object corresponding to each point cloud data frame.
PCT/CN2021/109529 2020-08-20 2021-07-30 Visual perception algorithm evaluation method and device WO2022037387A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010841528.7A CN114170448A (en) 2020-08-20 2020-08-20 Evaluation method and device for visual perception algorithm
CN202010841528.7 2020-08-20

Publications (1)

Publication Number Publication Date
WO2022037387A1 true WO2022037387A1 (en) 2022-02-24

Family

ID=80323391

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109529 WO2022037387A1 (en) 2020-08-20 2021-07-30 Visual perception algorithm evaluation method and device

Country Status (2)

Country Link
CN (1) CN114170448A (en)
WO (1) WO2022037387A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114866762A (en) * 2022-03-15 2022-08-05 中国第一汽车股份有限公司 Visual detection method, device and system of camera sensor
CN114882550A (en) * 2022-04-14 2022-08-09 支付宝(杭州)信息技术有限公司 Method, device and equipment for registering and leaving human face
CN116614621B (en) * 2023-07-17 2023-10-10 中汽智联技术有限公司 Method, device and storage medium for testing in-camera perception algorithm
CN117611788B (en) * 2024-01-19 2024-04-19 福思(杭州)智能科技有限公司 Dynamic truth value data correction method and device and storage medium
CN117784162A (en) * 2024-02-26 2024-03-29 安徽蔚来智驾科技有限公司 Target annotation data acquisition method, target tracking method, intelligent device and medium

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190005338A1 (en) * 2016-02-23 2019-01-03 Genki WATANABE Image processing apparatus, imaging apparatus, mobile device control system, and recording medium
CN108765563A (en) * 2018-05-31 2018-11-06 北京百度网讯科技有限公司 Processing method, device and the equipment of SLAM algorithms based on AR
CN110287832A (en) * 2019-06-13 2019-09-27 北京百度网讯科技有限公司 High-Speed Automatic Driving Scene barrier perception evaluating method and device

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115147796A (en) * 2022-07-14 2022-10-04 小米汽车科技有限公司 Method and device for evaluating target recognition algorithm, storage medium and vehicle
CN115311885A (en) * 2022-07-29 2022-11-08 上海商汤临港智能科技有限公司 Evaluation method, evaluation system, electronic device and storage medium
CN115311885B (en) * 2022-07-29 2024-04-12 上海商汤临港智能科技有限公司 Evaluation method, system, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN114170448A (en) 2022-03-11

Similar Documents

Publication Publication Date Title
WO2022037387A1 (en) Visual perception algorithm evaluation method and device
CN113450408B (en) Irregular object pose estimation method and device based on depth camera
CN107066953A (en) It is a kind of towards the vehicle cab recognition of monitor video, tracking and antidote and device
CN110232379A (en) A kind of vehicle attitude detection method and system
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN110490099B (en) Subway public place pedestrian flow analysis method based on machine vision
CN108388871B (en) Vehicle detection method based on vehicle body regression
CN110751012B (en) Target detection evaluation method and device, electronic equipment and storage medium
CN112215154B (en) Mask-based model evaluation method applied to face detection system
WO2020237939A1 (en) Method and apparatus for constructing eyelid curve of human eye
CN103679214A (en) Vehicle detection method based on online area estimation and multi-feature decision fusion
CN113822247A (en) Method and system for identifying illegal building based on aerial image
CN114758504A (en) Online vehicle overspeed early warning method and system based on filtering correction
CN112699748B (en) Human-vehicle distance estimation method based on YOLO and RGB image
CN114359865A (en) Obstacle detection method and related device
CN112991769A (en) Traffic volume investigation method and device based on video
CN113643544B (en) Intelligent detection method and system for illegal parking in parking lot based on Internet of things
CN113362287B (en) Man-machine cooperative remote sensing image intelligent interpretation method
CN115457130A (en) Electric vehicle charging port detection and positioning method based on depth key point regression
Philipp et al. Automated 3d object reference generation for the evaluation of autonomous vehicle perception
Kim et al. Evaluation of feature-based vehicle trajectory extraction algorithms
CN114842285A (en) Roadside berth number identification method and device
CN110060339B (en) Three-dimensional modeling method based on cloud computing graphic image
Yang et al. Research on Target Detection Algorithm for Complex Scenes
TWI828368B (en) Method and system for detecting aircraft behavior on the tarmac

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21857482

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21857482

Country of ref document: EP

Kind code of ref document: A1
