CN116188954A - Online evaluation method and system for target detection algorithm, electronic equipment and storage medium

Info

Publication number
CN116188954A
CN116188954A
Authority
CN
China
Prior art keywords
data set
original sample
sample image
visual
detection algorithm
Prior art date
Legal status
Pending
Application number
CN202211580984.6A
Other languages
Chinese (zh)
Inventor
王啸峰
朱政
叶云
黄冠
都大龙
Current Assignee
Beijing Jianzhi Technology Co ltd
Original Assignee
Beijing Jianzhi Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Jianzhi Technology Co ltd filed Critical Beijing Jianzhi Technology Co ltd
Priority to CN202211580984.6A
Publication of CN116188954A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/98 Detection or correction of errors, e.g. by rescanning the pattern or by human intervention; Evaluation of the quality of the acquired patterns
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Abstract

The embodiment of the invention provides an online evaluation method and an online evaluation system for a visual 3D target detection algorithm, wherein the method comprises the following steps: acquiring a sample data set of a visual 3D target detection algorithm, wherein the sample data set comprises original sample images that have a frequency greater than 2 Hz and carry labeling information; inputting a first original sample image of the sample data set into the visual 3D target detection algorithm to output a detection result; and performing index evaluation on the detection result and the labeling information of the second original sample image at the current moment in the sample data set. The frequency of the original sample images is higher than that of the sample data in the nuScenes data set, so the sample frequency of the visual 3D target detection algorithm is improved. Under the condition that the running time of the visual 3D target detection algorithm is stable, increasing the sample frequency is equivalent to reducing the displacement deviation of the target object between two sample images, so that the index evaluation of the visual 3D target detection algorithm meets real-time requirements.

Description

Online evaluation method and system for target detection algorithm, electronic equipment and storage medium
Technical Field
The invention relates to the technical field of computers, in particular to an online evaluation method of a visual 3D target detection algorithm, an online evaluation system of the visual 3D target detection algorithm, electronic equipment and a computer readable storage medium.
Background
In recent years, vision-centric perception algorithms have developed rapidly across a variety of autonomous driving tasks, including 3-Dimensional (3D) detection, semantic map construction, motion prediction, and depth estimation. However, the latency of vision-centric perception algorithms is too high for practical deployment (e.g., the runtime of most camera-based 3D detectors exceeds 300 ms). To bridge the gap between idealized research and real-world applications, it is necessary to quantify the trade-off between performance and efficiency.
However, existing camera-based 3D detection algorithm performance evaluation schemes mainly evaluate on the open-source data set nuScenes, which only annotates sample data at 2 Hz; 2 Hz sample data cannot satisfy real-time evaluation.
Disclosure of Invention
In view of the foregoing, embodiments of the present invention have been made to provide a visual 3D object detection algorithm online evaluation method and a corresponding visual 3D object detection algorithm online evaluation system that overcome or at least partially solve the foregoing problems.
In order to solve the problems, the embodiment of the invention discloses an online evaluation method of a visual 3D target detection algorithm, which comprises the following steps: acquiring a sample data set of a visual 3D target detection algorithm to be evaluated, wherein the sample data set comprises a plurality of original sample images with preset frequencies and carrying labeling information, and the preset frequencies are more than 2 Hz; inputting a first original sample image in the sample data set into the visual 3D target detection algorithm, and outputting a detection result; performing index evaluation on the detection result and the labeling information of the second original sample image in the sample data set; the second original sample image is the original sample image corresponding to the moment of outputting the detection result.
Optionally, the acquiring a sample dataset of the visual 3D object detection algorithm to be evaluated includes: acquiring a plurality of key frame sample images with the frequency of 2 Hz in the visual 3D target detection algorithm sample data set; expanding according to the annotation information of the key frame sample image to obtain the annotation information of the original sample image; and taking the original sample image and the labeling information of the original sample image as the sample data set.
Optionally, the expanding according to the annotation information of the key frame sample image to obtain the annotation information of the original sample image includes: performing interpolation processing on the labeling information of the key frame sample image to obtain an interpolation labeling data set of the original sample image; constructing a time sequence annotation data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector; and generating the annotation information of the original sample image according to the interpolation annotation data set and the time sequence annotation data set.
Optionally, the interpolating the labeling information of the key frame sample image to obtain an interpolated labeling data set of the original sample image includes: and when the target object in the key frame sample images simultaneously appears in two adjacent key frame sample images, generating the interpolation annotation data set according to the annotation information of the two adjacent key frame sample images.
Optionally, the constructing the time sequence labeling data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector includes: training the preset detector according to the key frame sample point cloud; detecting the original sample point cloud by using the trained preset detector to obtain a time sequence annotation data set of the original sample point cloud; and sampling from the time sequence labeling data set of the original sample point cloud to obtain the time sequence labeling data set of the original sample image.
Optionally, the generating the labeling information of the original sample image according to the interpolation labeling data set and the time sequence labeling data set includes: calculating the intersection ratio of the labeling information in the interpolation labeling data set and the labeling information in the time sequence labeling data set; and if the intersection ratio is smaller than the intersection ratio threshold value, merging the marking information corresponding to the intersection ratio in the time sequence marking data set into the interpolation marking data set to obtain the marking information of the original sample image.
Optionally, the method is applied to a test platform, and the method further comprises: acquiring a running time distribution diagram of the visual 3D target detection algorithm under a preset hardware resource of the test platform; sampling from the run-time profile a run-time of the visual 3D object detection algorithm; calculating the output time of the detection result of the visual 3D target detection algorithm for the target original sample image under the simulation hardware resource according to the running time and the input time of the target original sample image in the sample data set; acquiring adjacent original sample images adjacent to the detection result output time in the sample data set; and calculating the average precision mean value of the visual 3D target detection algorithm under the simulation hardware resources according to the detection result output time and the adjacent original sample images.
Optionally, the acquiring the running time distribution diagram of the visual 3D target detection algorithm under the preset hardware resource of the test platform includes: and under the condition that a target perception algorithm is operated under the preset hardware resources of the test platform, acquiring the operation time distribution diagram.
Optionally, the frequency of the key frame sample point cloud is 2 Hz, the frequency of the original sample point cloud is 20 Hz, the frequency of the original sample image is 12 Hz, and the frequency of the key frame sample image is 2 Hz.
The embodiment of the invention also discloses an online evaluation system of the visual 3D target detection algorithm, which comprises: the sample acquisition module is used for acquiring a sample data set of a visual 3D target detection algorithm to be evaluated, wherein the sample data set comprises a plurality of original sample images with preset frequencies and carrying labeling information, and the preset frequencies are more than 2 Hz; the input/output module is used for inputting a first original sample image in the sample data set into the visual 3D target detection algorithm and outputting a detection result; the index evaluation module is used for performing index evaluation on the detection result and the labeling information of the second original sample image in the sample data set; the second original sample image is the original sample image corresponding to the moment of outputting the detection result.
Optionally, the sample acquisition module includes: the key frame sample image acquisition module is used for acquiring a plurality of key frame sample images with the frequency of 2 Hz in the visual 3D target detection algorithm sample data set; the annotation information expansion module is used for expanding the annotation information of the original sample image according to the annotation information of the key frame sample image; and the sample data set determining module is used for taking the original sample image and the labeling information of the original sample image as the sample data set.
Optionally, the labeling information expansion module includes: the interpolation module is used for carrying out interpolation processing on the labeling information of the key frame sample image to obtain an interpolation labeling data set of the original sample image; the construction module is used for constructing a time sequence annotation data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector; and the generating module is used for generating the annotation information of the original sample image according to the interpolation annotation data set and the time sequence annotation data set.
Optionally, the interpolation module is configured to generate the interpolation annotation data set according to the annotation information of the two neighboring keyframe sample images when the target object in the keyframe sample images appears in the two neighboring keyframe sample images at the same time.
Optionally, the building module includes: the training module is used for training the preset detector according to the key frame sample point cloud; the detection module is used for detecting the original sample point cloud by using the trained preset detector to obtain a time sequence annotation data set of the original sample point cloud; and the sampling module is used for sampling from the time sequence annotation data set of the original sample point cloud to obtain the time sequence annotation data set of the original sample image.
Optionally, the generating module includes: the calculation module is used for calculating the intersection ratio of the annotation information in the interpolation annotation data set and the annotation information in the time sequence annotation data set; and the merging module is used for merging the marking information corresponding to the cross ratio in the time sequence marking data set into the interpolation marking data set if the cross ratio is smaller than a cross ratio threshold value, so as to obtain the marking information of the original sample image.
Optionally, the system is applied to a test platform, and the system further comprises: the distribution map acquisition module is used for acquiring a running time distribution map of the visual 3D target detection algorithm under the preset hardware resources of the test platform; a run-time sampling module, configured to sample from the run-time distribution map the run-time of the visual 3D object detection algorithm; the output time calculation module is used for calculating the output time of the detection result of the visual 3D target detection algorithm for the target original sample image under the simulation hardware resource according to the running time and the input time of the target original sample image in the sample data set; the adjacent image acquisition module is used for acquiring an adjacent original sample image adjacent to the detection result output time in the sample data set; and the average precision average value calculation module is used for calculating the average precision average value of the visual 3D target detection algorithm under the simulation hardware resource according to the detection result output time and the adjacent original sample images.
Optionally, the profile acquisition module is configured to acquire the running time profile under a condition that a target perception algorithm is run under a preset hardware resource of the test platform.
Optionally, the frequency of the key frame sample point cloud is 2 Hz, the frequency of the original sample point cloud is 20 Hz, the frequency of the original sample image is 12 Hz, and the frequency of the key frame sample image is 2 Hz.
The embodiment of the invention also discloses an electronic device, which comprises: one or more processors; and one or more machine readable media having instructions stored thereon that, when executed by the one or more processors, cause the electronic device to perform the visual 3D object detection algorithm online assessment method as described above.
The embodiment of the invention also discloses a computer-readable storage medium storing a computer program that causes a processor to execute the visual 3D object detection algorithm online evaluation method described above.
The embodiment of the invention has the following advantages:
According to the visual 3D target detection algorithm online evaluation scheme provided by the embodiment of the invention, a sample data set of the visual 3D target detection algorithm to be evaluated is first acquired; the sample data set may comprise a plurality of original sample images with a preset frequency, the preset frequency being greater than 2 Hz. Then, a first original sample image in the sample data set is input to the visual 3D target detection algorithm, and a detection result of the first original sample image is output. Index evaluation is performed on the detection result of the first original sample image and the labeling information of the second original sample image in the sample data set. The second original sample image is the original sample image corresponding to the moment at which the detection result of the first original sample image is output. That is, index evaluation is performed on the detection result of the first original sample image and the annotation information of the second original sample image at the current moment when the detection result is output.
In the embodiment of the invention, the frequency of the original sample images in the sample data set is higher than that of the sample data in the nuScenes data set; that is, the sample frequency of the visual 3D target detection algorithm is improved. Under the condition that the running time of the visual 3D target detection algorithm is stable, improving the sample frequency is equivalent to reducing the displacement deviation between the position of the target object in the first original sample image and its position in the second original sample image, so that the index evaluation of the visual 3D target detection algorithm meets real-time requirements.
Drawings
FIG. 1 is a schematic diagram of the relationship between the detection result of a visual 3D object detection algorithm and a frame image;
FIG. 2 is a flowchart illustrating steps of a visual 3D object detection algorithm online evaluation method according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating steps for generating annotation information for an original sample image according to an embodiment of the present invention;
FIG. 4 is a run time profile of an embodiment of the present invention;
FIG. 5 is a flow chart of a real-time performance evaluation scheme of a visual 3D object detection algorithm according to an embodiment of the present invention;
fig. 6 is a block diagram of a visual 3D object detection algorithm online evaluation system according to an embodiment of the present invention.
Detailed Description
In order that the above-recited objects, features and advantages of the present invention will become more readily apparent, a more particular description of the invention will be rendered by reference to the appended drawings and appended detailed description.
The visual 3D target detection algorithm in the embodiment of the invention can detect a target object in the nuScenes data set, and specifically can detect the position information of the target object in an image. The position information may be a 3D frame. The target object may be moving in real time in the nuScenes data set, and the visual 3D target detection algorithm needs a certain running time to detect the 3D frame of the target object from one frame of image. Therefore, by the time the visual 3D object detection algorithm has detected the target object in one frame of image and obtained its 3D frame, the target object has already moved to another position, that is, it already appears in another frame of image of the nuScenes data set. Referring to fig. 1, a schematic diagram of the relationship between the detection result of a visual 3D object detection algorithm and frame images is shown. In fig. 1, frame image A (Frame A), frame image B (Frame B), and frame image C (Frame C) are three consecutive images of a target object (a car) in an actual scene. Frame image A, frame image B, and frame image C are sequentially input (Input time) to the visual 3D object detection algorithm. After frame image A is input, the algorithm processes it (Process A). If the detection speed of the algorithm is not fast enough, then by the time the target object has moved to its position in frame image B, or even to its position in frame image C, the algorithm is only then outputting (Output time) the position of the target object in frame image A (Prediction of A; the 3D frame in the figure is the predicted position of the target object). At this moment, the target object has already reached its position in frame image C. If the position output by the algorithm is index-evaluated against the actual position of the target object in frame image C, it is clear that the output, i.e. the predicted position of the target object before frame image C (Previous prediction), lags far behind the actual position of the target object in frame image C. Therefore, evaluating the visual 3D object detection algorithm with the nuScenes data set cannot satisfy real-time requirements.
Referring to fig. 2, a flowchart of steps of a visual 3D object detection algorithm online evaluation method according to an embodiment of the present invention is shown. The visual 3D target detection algorithm online evaluation method specifically comprises the following steps:
step 201, a sample dataset of a visual 3D object detection algorithm to be evaluated is obtained.
In an embodiment of the present invention, the acquired sample data set contains a plurality of original sample images with a preset frequency that carry labeling information, wherein the preset frequency is greater than 2 Hz. The carried labeling information may be the position of the target object in the original sample image, and the position may be a 3D frame or the like.
In an embodiment of the invention, the plurality of original sample images in the sample data set may constitute a continuous video stream. The target object may appear in one or more original sample images of the video stream, and its position may vary across those images. The embodiment of the invention does not particularly limit the number, format, size, capacity, and the like of the original sample images in the sample data set, nor the type, number, size, or position of the target object in the original sample images.
Step 202, inputting a first original sample image in the sample data set into a visual 3D target detection algorithm, and outputting a detection result.
In an embodiment of the invention, the first original sample image may be any original sample image in the sample data set. In general, the first original sample image contains a target object, and the output detection result is the predicted position of the target object in the first original sample image.
Step 203, performing index evaluation on the detection result and the labeling information of the second original sample image in the sample data set.
In the embodiment of the present invention, the second original sample image is the original sample image corresponding to the moment when the detection result is output, that is, the original sample image at the current moment. For example, suppose the first original sample image is image a; if, at the moment the visual 3D target detection algorithm outputs the detection result of image a, the corresponding original sample image in the sample data set is image c, then index evaluation is performed on the detection result of image a and the labeling information of image c.
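As a minimal illustration of steps 202 and 203 (the `detector`, `frames`, and `evaluate` names below are assumptions for this sketch, not part of the disclosure), the detection result of one frame is scored against the annotation of the frame closest to the moment the result is output:

```python
import time

def evaluate_online(detector, frames, evaluate):
    """frames: list of (timestamp, image, annotations) sorted by timestamp."""
    scores = []
    for ts, image, _ in frames:
        start = time.monotonic()
        prediction = detector(image)                  # detect on this frame
        output_ts = ts + time.monotonic() - start     # moment the result is ready
        # the second original sample image: the frame nearest the output moment
        _, _, gt_boxes = min(frames, key=lambda f: abs(f[0] - output_ts))
        scores.append(evaluate(prediction, gt_boxes))
    return scores
```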
According to the visual 3D target detection algorithm online evaluation scheme provided by the embodiment of the invention, a sample data set of the visual 3D target detection algorithm to be evaluated is first acquired; the sample data set may comprise a plurality of original sample images with a preset frequency, the preset frequency being greater than 2 Hz. Then, a first original sample image in the sample data set is input to the visual 3D target detection algorithm, and a detection result of the first original sample image is output. Index evaluation is performed on the detection result of the first original sample image and the labeling information of the second original sample image in the sample data set. The second original sample image is the original sample image corresponding to the moment at which the detection result of the first original sample image is output. That is, index evaluation is performed on the detection result of the first original sample image and the annotation information of the second original sample image at the current moment when the detection result is output.
In the embodiment of the invention, the frequency of the original sample images in the sample data set is higher than that of the sample data in the nuScenes data set; that is, the sample frequency of the visual 3D target detection algorithm is improved. Under the condition that the running time of the visual 3D target detection algorithm is stable, improving the sample frequency is equivalent to reducing the displacement deviation between the position of the target object in the first original sample image and its position in the second original sample image, so that the index evaluation of the visual 3D target detection algorithm meets real-time requirements.
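As an illustrative calculation (the 20 m/s vehicle speed is an assumed figure, not from the disclosure): at the 2 Hz annotation frequency of nuScenes, consecutive annotated frames are 0.5 s apart, so a vehicle travelling at 20 m/s moves 10 m between them; at 12 Hz the frame spacing is about 0.083 s, so the displacement deviation shrinks to roughly 1.7 m.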
In an exemplary embodiment of the present invention, one implementation of obtaining the sample data set of the visual 3D target detection algorithm to be evaluated is as follows: obtain a plurality of key frame sample images with a frequency of 2 Hz in the visual 3D target detection algorithm sample data set; expand according to the labeling information of the key frame sample images to obtain the labeling information of the original sample images; and take the original sample images and their labeling information as the sample data set. In practical applications, the sample data set may contain key frame sample images, at a frequency of 2 Hz, and original sample images, at a frequency of 12 Hz. The key frame sample images carry labeling information, but the original sample images do not. Therefore, the embodiment of the invention expands the labeling information of the key frame sample images to obtain labeling information for the original sample images.
In an exemplary embodiment of the present invention, one implementation manner of obtaining the labeling information of the original sample image according to the expansion of the labeling information of the key frame sample image is to perform interpolation processing on the labeling information of the key frame sample image to obtain an interpolation labeling data set of the original sample image. And constructing a time sequence labeling data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector. And generating the annotation information of the original sample image according to the interpolation annotation data set and the time sequence annotation data set.
In practical application, one implementation of performing interpolation processing on the annotation information of the key frame sample images to obtain the interpolation annotation data set of the original sample images is as follows: when a target object in the key frame sample images appears in two adjacent key frame sample images at the same time, the interpolation annotation data set is generated according to the annotation information of those two adjacent key frame sample images. For example, suppose the target object is present both in the key frame image corresponding to moment t_e and in the key frame image corresponding to moment t_s, and that these two key frame images are adjacent. The annotation information of the two adjacent key frame sample images can be the pose information (3-dimensional position and 3-dimensional rotation angle) of the target object, and the pose information of the target object at a moment t, where t_e < t < t_s, is generated from the pose information of the two adjacent key frame sample images:
Tr(t) = Tr(t_e) + ((t - t_e) / (t_s - t_e)) × (Tr(t_s) - Tr(t_e))

R(t) = Fs(R(t_e), R(t_s), (t - t_e) / (t_s - t_e))
Wherein Tr(t) represents the 3-dimensional position of the target object at moment t, Tr(t_s) the 3-dimensional position at moment t_s, and Tr(t_e) the 3-dimensional position at moment t_e; R(t) represents the 3-dimensional rotation angle of the target object at moment t, R(t_s) the 3-dimensional rotation angle at moment t_s, and R(t_e) the 3-dimensional rotation angle at moment t_e. Fs is the spherical linear interpolation equation. Tr(t) and R(t) serve as the labeling information of the target object at moment t.
Through this interpolation, the annotation information of the 2 Hz key frame sample images can be expanded into the interpolation annotation data set of the 12 Hz original sample images.
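A minimal sketch of this interpolation, assuming SciPy is available; `Slerp` here plays the role of the spherical linear interpolation Fs, and the argument names are illustrative:

```python
import numpy as np
from scipy.spatial.transform import Rotation, Slerp

def interpolate_pose(t, t_e, tr_e, rot_e, t_s, tr_s, rot_s):
    """Interpolate a 3D-frame pose at moment t, with t_e < t < t_s."""
    alpha = (t - t_e) / (t_s - t_e)
    tr_t = tr_e + alpha * (tr_s - tr_e)   # Tr(t): linear position interpolation
    slerp = Slerp([t_e, t_s],
                  Rotation.from_quat([rot_e.as_quat(), rot_s.as_quat()]))
    return tr_t, slerp(t)                 # R(t): spherical linear interpolation

# e.g. the pose midway between a keyframe at t_e = 0 s and one at t_s = 0.5 s
tr, rot = interpolate_pose(0.25, 0.0, np.zeros(3), Rotation.identity(),
                           0.5, np.array([5.0, 0.0, 0.0]),
                           Rotation.from_euler("z", 30, degrees=True))
```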
Theoretically, the annotation information of the 2 Hz key frame sample images can be expanded into annotation information for the 12 Hz original sample images through the above interpolation processing. However, when the target object does not appear in both of two consecutive 2 Hz key frame sample images, its annotation information at the intermediate moments between those two key frames cannot be obtained by interpolation. To address this problem, the embodiment of the invention constructs a time sequence annotation data set.
In practical application, one implementation of constructing the time sequence labeling data set of the original sample images according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector is as follows: train the preset detector on the key frame sample point cloud; detect the original sample point cloud with the trained preset detector to obtain a time sequence labeling data set of the original sample point cloud; and sample from that data set to obtain the time sequence labeling data set of the original sample images. For example, the preset detector may employ CenterPoint (a point-cloud-based 3D detector). The sample data set may also contain a key frame sample point cloud with a frequency of 2 Hz and an original sample point cloud with a frequency of 20 Hz, so there are 10 frames of original sample point cloud between every two key frame sample images.
Referring to FIG. 3, a flowchart of steps for generating annotation information for an original sample image is shown, according to an embodiment of the present invention.
Step 301, train CenterPoint on the key frame sample point cloud.
Step 302, perform target object detection on the original sample point cloud with the trained CenterPoint to obtain the time sequence annotation data set of the original sample point cloud.
The time sequence labeling data set of the original sample point cloud is a 20 Hz time sequence labeling data set. The time sequence labeling data set of the original sample point cloud comprises a plurality of moments and 3D frames corresponding to each moment. The time and the corresponding 3D frame can be understood as time sequence annotation data.
Step 303, sample from the 20 Hz time sequence labeling data set to obtain the 12 Hz time sequence labeling data set of the original sample images.
A frequency of 20 Hz corresponds to the time distribution [0, 1/20, 2/20, 3/20, ..., 1] within one second, and a frequency of 12 Hz corresponds to the time distribution [0, 1/12, 2/12, 3/12, ..., 1]. To sample a 12 Hz time sequence labeling data set from the 20 Hz time sequence labeling data set, the method of taking the time sequence labeling data at the nearest moment can be adopted. For example, for the moment 1/12 in the 12 Hz time distribution, the nearest moment in the 20 Hz time distribution is 2/20, so the time sequence labeling data corresponding to moment 2/20 in the 20 Hz time distribution is sampled as the time sequence labeling data corresponding to moment 1/12 in the 12 Hz time distribution.
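The nearest-moment sampling can be sketched as follows (the toy `labels_20hz` dictionary is purely illustrative):

```python
def sample_nearest(labels_20hz, target_times):
    """For each 12 Hz moment, take the labeling data of the nearest 20 Hz moment."""
    source_times = sorted(labels_20hz)
    return {t: labels_20hz[min(source_times, key=lambda s: abs(s - t))]
            for t in target_times}

labels_20hz = {i / 20: f"boxes@{i}/20" for i in range(21)}   # toy 20 Hz data
labels_12hz = sample_nearest(labels_20hz, [i / 12 for i in range(13)])
assert labels_12hz[1 / 12] == labels_20hz[2 / 20]            # 1/12 maps to 2/20
```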
Step 304, remove redundant data between the time sequence labeling data set and the interpolation labeling data set according to the intersection ratio, and finally merge them to obtain the labeling information of the 12 Hz original sample images.
In practical application, one implementation of generating the labeling information of the original sample images according to the interpolation labeling data set and the time sequence labeling data set is as follows: calculate the intersection ratio of the labeling information in the interpolation labeling data set and the labeling information in the time sequence labeling data set; if the intersection ratio is smaller than the intersection ratio threshold, merge the labeling information corresponding to that intersection ratio in the time sequence labeling data set into the interpolation labeling data set to obtain the labeling information of the original sample images. For example, first select a 3D frame from the 12 Hz time sequence labeling data set and calculate its Intersection over Union (IoU) with each 3D frame in the 12 Hz interpolation labeling data set. If IoU is greater than or equal to the intersection ratio threshold (e.g., 0.7), the 3D frame for which the intersection ratio was calculated is considered to appear in both the time sequence labeling data set and the interpolation labeling data set, and it need not be merged into the interpolation labeling data set. If IoU is less than 0.7, the 3D frame in the time sequence labeling data set is deemed not to appear in the interpolation labeling data set, and the 3D frame is merged into the interpolation labeling data set. This is equivalent to deduplicating the two labeling data sets before merging them.
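A compact sketch of this merge rule (the `iou_3d` helper is an assumption standing in for a 3D intersection ratio computation; the 0.7 threshold follows the example above):

```python
def merge_annotations(interp_boxes, temporal_boxes, iou_3d, threshold=0.7):
    """Merge time sequence 3D frames that the interpolation data set lacks."""
    merged = list(interp_boxes)
    for box in temporal_boxes:
        # IoU >= threshold with any interpolated frame means the frame already exists
        if all(iou_3d(box, ref) < threshold for ref in interp_boxes):
            merged.append(box)
    return merged
```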
In one exemplary embodiment of the present invention, the visual 3D object detection algorithm online evaluation scheme may be applied to a test platform. Specific hardware resources may be set for the test platform; for example, a graphics card (GPU) of a specified model. The visual 3D target detection algorithm is run in this specific hardware resource environment, and its running time distribution diagram is acquired, as shown in fig. 4, where the abscissa represents the running time of the visual 3D object detection algorithm and the ordinate represents the probability of that running time occurring. In the actual performance evaluation process, the visual 3D target detection algorithm does not need to be evaluated in every kind of real environment; performance evaluation can instead be performed in a simulation environment, that is, under simulation hardware resources. Specifically, a running time of the visual 3D target detection algorithm is obtained by random sampling from the running time distribution diagram, and the detection result output time of the visual 3D target detection algorithm for a target original sample image under the simulation hardware resource is then calculated according to that running time and the input time of the target original sample image in the sample data set. The adjacent original sample image, i.e. the original sample image adjacent to the detection result output time, is acquired from the sample data set, and the mean average precision (mean Average Precision, mAP for short) of the visual 3D target detection algorithm under the simulation hardware resource is calculated according to the detection result output time and the adjacent original sample image. Suppose the running time obtained by random sampling is m and the input time corresponding to the target original sample image is t; then the calculated detection result output time under the simulation hardware resource is m+t, and the original sample image at the moment closest to m+t (i.e., the adjacent original sample image) is acquired from the 12 Hz sample data set.
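The simulated-hardware evaluation can be sketched as follows (`runtimes`, a list of running times measured under the preset hardware resource, and `compute_map` are assumed inputs; this is an illustration, not the patent's exact procedure):

```python
import random

def simulated_eval(detector, frames, runtimes, compute_map):
    """frames: 12 Hz list of (timestamp, image, annotations)."""
    predictions, ground_truths = [], []
    for t, image, _ in frames:              # t: input time of the target image
        m = random.choice(runtimes)         # m: runtime sampled from the profile
        output_time = t + m                 # detection result output time m + t
        # adjacent original sample image: the 12 Hz frame closest to m + t
        _, _, gt = min(frames, key=lambda f: abs(f[0] - output_time))
        predictions.append(detector(image))
        ground_truths.append(gt)
    return compute_map(predictions, ground_truths)
```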
In an exemplary embodiment of the invention, the test platform may also be shared with other perception algorithms, and the performance of the visual 3D object detection algorithm is evaluated in real time while the test platform is shared. That is, when acquiring the running time distribution diagram of the visual 3D target detection algorithm under the preset hardware resource of the test platform, a target perception algorithm (such as a classification algorithm, a 2D detection algorithm, an object segmentation algorithm, etc.) may be run simultaneously under the preset hardware resource. This can slow down the visual 3D target detection algorithm; the running time distribution diagram acquired under this condition is then randomly sampled, and real-time performance evaluation under simulation hardware resources is carried out.
Based on the above description about an embodiment of an online evaluation method for a visual 3D object detection algorithm, a real-time performance evaluation scheme for a visual 3D object detection algorithm is described below. Referring to fig. 5, a flow diagram of a real-time performance evaluation scheme of a visual 3D object detection algorithm according to an embodiment of the present invention is shown.
Step 501, input a set of look-around images with a frequency of 12 Hz to the visual 3D object detection algorithm.
Step 502, set the hardware resources of the test platform where the visual 3D object detection algorithm is located.
Step 503, obtain the detection result of the visual 3D object detection algorithm for the look-around image at a certain moment.
Step 504, perform index evaluation on the detection result and the annotation information of the look-around image corresponding to the current moment.
Step 505, determine whether there is a next frame of look-around image.
If there is no next frame of look-around image, the procedure ends; if there is a next frame of look-around image, step 506 is performed.
Step 506, input the next frame of look-around image to the visual 3D object detection algorithm, and loop back to step 503.
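Under the same assumptions as the earlier sketches, the flow of steps 501 to 506 can be wired together as follows (every name here is a stand-in, not part of the disclosure):

```python
times_12hz = [i / 12 for i in range(13)]
temporal_12hz = sample_nearest(labels_20hz, times_12hz)           # steps 301-303
dataset = [(t, images[t],
            merge_annotations(interp_sets[t], temporal_12hz[t], iou_3d))
           for t in times_12hz]                                   # step 304
scores = evaluate_online(detector, dataset, evaluate)             # steps 501-506
```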
The embodiment of the invention provides a scheme for evaluating a visual 3D target detection algorithm in real time. Compared with offline performance evaluation, real-time evaluation better reflects the performance of the visual 3D target detection algorithm when it is applied in the real world.
According to the embodiment of the invention, the labeling information of the 2 Hz key frame sample images in the nuScenes data set is expanded into labeling information for the 12 Hz original sample images, providing labeling information for sample images at a higher frequency and forming a sample data set better suited to real-time evaluation.
According to the embodiment of the invention, the running time distribution diagram of the visual 3D target detection algorithm is obtained under a fixed hardware resource environment, and random sampling from this distribution then enables performance evaluation of the visual 3D target detection algorithm under a simulation hardware resource environment.
It should be noted that, for simplicity of description, the method embodiments are shown as a series of acts, but it should be understood by those skilled in the art that the embodiments are not limited by the order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred embodiments, and that the acts are not necessarily required by the embodiments of the invention.
Referring to fig. 6, a structural block diagram of a visual 3D object detection algorithm online evaluation system according to an embodiment of the present invention is shown, where the visual 3D object detection algorithm online evaluation system may specifically include the following modules.
The sample acquisition module 61 is configured to acquire a sample dataset of a visual 3D target detection algorithm to be evaluated, where the sample dataset includes a plurality of original sample images with preset frequencies and carrying labeling information, and the preset frequencies are greater than 2 Hz;
an input-output module 62, configured to input a first original sample image in the sample dataset into the visual 3D target detection algorithm, and output a detection result;
An index evaluation module 63, configured to perform index evaluation on the detection result and the labeling information of the second original sample image in the sample dataset;
the second original sample image is the original sample image corresponding to the moment of outputting the detection result.
In an exemplary embodiment of the present invention, the sample acquisition module 61 includes:
the key frame sample image acquisition module is used for acquiring a plurality of key frame sample images with the frequency of 2 Hz in the visual 3D target detection algorithm sample data set;
the annotation information expansion module is used for expanding the annotation information of the original sample image according to the annotation information of the key frame sample image;
and the sample data set determining module is used for taking the original sample image and the labeling information of the original sample image as the sample data set.
In an exemplary embodiment of the present invention, the labeling information extension module includes:
the interpolation module is used for carrying out interpolation processing on the labeling information of the key frame sample image to obtain an interpolation labeling data set of the original sample image;
the construction module is used for constructing a time sequence annotation data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector;
And the generating module is used for generating the annotation information of the original sample image according to the interpolation annotation data set and the time sequence annotation data set.
In an exemplary embodiment of the present invention, the interpolation module is configured to generate the interpolation annotation data set according to annotation information of two neighboring keyframe sample images when a target object in the keyframe sample images appears in the two neighboring keyframe sample images at the same time.
In an exemplary embodiment of the invention, the building block comprises:
the training module is used for training the preset detector according to the key frame sample point cloud;
the detection module is used for detecting the original sample point cloud by using the trained preset detector to obtain a time sequence annotation data set of the original sample point cloud;
and the sampling module is used for sampling from the time sequence annotation data set of the original sample point cloud to obtain the time sequence annotation data set of the original sample image.
In an exemplary embodiment of the present invention, the generating module includes:
the calculation module is used for calculating the intersection ratio of the annotation information in the interpolation annotation data set and the annotation information in the time sequence annotation data set;
And the merging module is used for merging the marking information corresponding to the cross ratio in the time sequence marking data set into the interpolation marking data set if the cross ratio is smaller than a cross ratio threshold value, so as to obtain the marking information of the original sample image.
In an exemplary embodiment of the invention, the system is applied to a test platform, the system further comprising:
the distribution map acquisition module is used for acquiring a running time distribution map of the visual 3D target detection algorithm under the preset hardware resources of the test platform;
a run-time sampling module, configured to sample from the run-time distribution map the run-time of the visual 3D object detection algorithm;
the output time calculation module is used for calculating the output time of the detection result of the visual 3D target detection algorithm for the target original sample image under the simulation hardware resource according to the running time and the input time of the target original sample image in the sample data set;
the adjacent image acquisition module is used for acquiring an adjacent original sample image adjacent to the detection result output time in the sample data set;
and the average precision average value calculation module is used for calculating the average precision average value of the visual 3D target detection algorithm under the simulation hardware resource according to the detection result output time and the adjacent original sample images.
In an exemplary embodiment of the present invention, the profile obtaining module is configured to obtain the runtime profile when the target awareness algorithm is executed under a preset hardware resource of the test platform.
In an exemplary embodiment of the present invention, the frequency of the key frame sample point cloud is 2 Hz, the frequency of the original sample point cloud is 20 Hz, the frequency of the original sample image is 12 Hz, and the frequency of the key frame sample image is 2 Hz.
The description of the system and apparatus embodiments is relatively brief since they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described by differences from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
It will be apparent to those skilled in the art that embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the invention may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal device to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal device, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The above detailed description of the visual 3D object detection algorithm online evaluation method and the visual 3D object detection algorithm online evaluation system provided by the invention applies specific examples to illustrate the principles and embodiments of the invention, and the above examples are only used for helping to understand the method and core ideas of the invention; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in accordance with the ideas of the present invention, the present description should not be construed as limiting the present invention in view of the above.

Claims (12)

1. An online evaluation method of a visual 3D target detection algorithm, comprising:
acquiring a sample data set of a visual 3D target detection algorithm to be evaluated, wherein the sample data set comprises a plurality of original sample images with preset frequencies and carrying labeling information, and the preset frequencies are more than 2 Hz;
inputting a first original sample image in the sample data set into the visual 3D target detection algorithm, and outputting a detection result;
performing index evaluation on the detection result and the labeling information of the second original sample image in the sample data set;
The second original sample image is the original sample image corresponding to the moment of outputting the detection result.
2. The method of claim 1, wherein the obtaining a sample dataset of visual 3D object detection algorithms to be evaluated comprises:
acquiring a plurality of key frame sample images with the frequency of 2 Hz in the visual 3D target detection algorithm sample data set;
expanding according to the annotation information of the key frame sample image to obtain the annotation information of the original sample image;
and taking the original sample image and the labeling information of the original sample image as the sample data set.
3. The method according to claim 2, wherein the expanding the annotation information of the original sample image according to the annotation information of the key frame sample image comprises:
performing interpolation processing on the labeling information of the key frame sample image to obtain an interpolation labeling data set of the original sample image;
constructing a time sequence annotation data set of the original sample image according to the original sample point cloud and the key frame sample point cloud in the sample data set and a preset detector;
and generating the annotation information of the original sample image according to the interpolation annotation data set and the time sequence annotation data set.
4. A method according to claim 3, wherein interpolating the annotation information of the key frame sample image to obtain an interpolated annotation dataset of the original sample image comprises:
and when the target object in the key frame sample images simultaneously appears in two adjacent key frame sample images, generating the interpolation annotation data set according to the annotation information of the two adjacent key frame sample images.
5. A method according to claim 3, wherein said constructing a time-series annotation dataset of said original sample image from an original sample point cloud and a key frame sample point cloud in said sample dataset and a pre-set detector comprises:
training the preset detector according to the key frame sample point cloud;
detecting the original sample point cloud by using the trained preset detector to obtain a time sequence annotation data set of the original sample point cloud;
and sampling from the time sequence labeling data set of the original sample point cloud to obtain the time sequence labeling data set of the original sample image.
6. The method of claim 3, wherein generating annotation information for the original sample image from the interpolated annotation dataset and the time series annotation dataset comprises:
calculating the intersection ratio of the annotation information in the interpolation annotation data set and the annotation information in the time sequence annotation data set;
and if the intersection ratio is smaller than an intersection ratio threshold value, merging the annotation information corresponding to that intersection ratio from the time sequence annotation data set into the interpolation annotation data set to obtain the annotation information of the original sample image.
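A plausible reading of claim 6 in code, using an axis-aligned 2D IoU as a simplified stand-in for the overlap measure and an assumed threshold of 0.3; neither choice is fixed by the claim:

    def iou_2d(a, b):
        # axis-aligned 2D IoU over (x1, y1, x2, y2) boxes, a simplified
        # stand-in for the 3D overlap a real pipeline would use
        ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
        iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
        inter = ix * iy
        union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / union if union > 0 else 0.0

    def merge_annotations(interp_set, timeseries_set, iou_threshold=0.3):
        # a time-series box is merged in only when it overlaps no
        # interpolated box above the threshold, i.e. it covers an
        # object the interpolation missed
        merged = list(interp_set)
        for ts_box in timeseries_set:
            best = max((iou_2d(ts_box, ib) for ib in interp_set), default=0.0)
            if best < iou_threshold:
                merged.append(ts_box)
        return merged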
7. The method of claim 1, wherein the method is applied to a test platform, the method further comprising:
acquiring a running time distribution diagram of the visual 3D target detection algorithm under a preset hardware resource of the test platform;
sampling a running time of the visual 3D target detection algorithm from the running time distribution diagram;
calculating, according to the running time and the input time of a target original sample image in the sample data set, the output time of the detection result of the visual 3D target detection algorithm for the target original sample image under a simulated hardware resource;
acquiring the original sample images adjacent to the detection result output time in the sample data set;
and calculating the mean average precision of the visual 3D target detection algorithm under the simulated hardware resource according to the detection result output time and the adjacent original sample images.
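A sketch of the simulation of claim 7: per-frame latency is drawn from the empirical running time distribution instead of being measured, and each detection result is scored against the sample adjacent to its simulated output time. Detections are assumed to be precomputed offline, and average_precision stands in for the actual metric computation; all names are hypothetical:

    import random

    def simulate_map(dataset, detections_per_frame, runtime_profile, average_precision):
        # runtime_profile: running times (seconds) observed on the test
        # platform; dataset: time-ordered (timestamp, image, annotations).
        aps = []
        for (t_in, _, _), detections in zip(dataset, detections_per_frame):
            latency = random.choice(runtime_profile)   # sample from the distribution
            t_out = t_in + latency                     # simulated output time
            # annotations of the sample adjacent to the simulated output time
            _, _, ground_truth = min(dataset, key=lambda s: abs(s[0] - t_out))
            aps.append(average_precision(detections, ground_truth))
        return sum(aps) / len(aps)                     # mean average precision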
8. The method of claim 7, wherein the acquiring a running time distribution diagram of the visual 3D target detection algorithm under a preset hardware resource of the test platform comprises:
acquiring the running time distribution diagram while a target perception algorithm is running under the preset hardware resource of the test platform.
9. The method of claim 3, wherein the frequency of the key frame sample point cloud is 2 Hz, the frequency of the original sample point cloud is 20 Hz, the frequency of the original sample image is 12 Hz, and the frequency of the key frame sample image is 2 Hz.
10. A visual 3D object detection algorithm online assessment system, the system comprising:
the sample acquisition module is used for acquiring a sample data set of a visual 3D target detection algorithm to be evaluated, wherein the sample data set comprises a plurality of original sample images at a preset frequency, each carrying annotation information, and the preset frequency is greater than 2 Hz;
the input/output module is used for inputting a first original sample image in the sample data set into the visual 3D target detection algorithm and outputting a detection result;
the index evaluation module is used for performing index evaluation on the detection result against the annotation information of a second original sample image in the sample data set;
wherein the second original sample image is the original sample image corresponding to the moment at which the detection result is output.
11. An electronic device, comprising:
one or more processors; and
one or more machine readable media having instructions stored thereon, which when executed by the one or more processors, cause the electronic device to perform the visual 3D object detection algorithm online assessment method of any of claims 1 to 9.
12. A computer readable storage medium, characterized in that it stores a computer program for causing a processor to execute the visual 3D object detection algorithm online evaluation method according to any one of claims 1 to 9.
CN202211580984.6A 2022-12-09 2022-12-09 Online evaluation method and system for target detection algorithm, electronic equipment and storage medium Pending CN116188954A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211580984.6A CN116188954A (en) 2022-12-09 2022-12-09 Online evaluation method and system for target detection algorithm, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211580984.6A CN116188954A (en) 2022-12-09 2022-12-09 Online evaluation method and system for target detection algorithm, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116188954A true CN116188954A (en) 2023-05-30

Family

ID=86441211

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211580984.6A Pending CN116188954A (en) 2022-12-09 2022-12-09 Online evaluation method and system for target detection algorithm, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116188954A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116719418A (en) * 2023-08-09 2023-09-08 湖南马栏山视频先进技术研究院有限公司 Method and device for checking gaze point prediction model
CN116719418B (en) * 2023-08-09 2023-10-27 湖南马栏山视频先进技术研究院有限公司 Method and device for checking gaze point prediction model

Similar Documents

Publication Publication Date Title
JP7147078B2 (en) Video frame information labeling method, apparatus, apparatus and computer program
CN109191498B (en) Target detection method and system based on dynamic memory and motion perception
JP7273129B2 (en) Lane detection method, device, electronic device, storage medium and vehicle
CN109118532B (en) Visual field depth estimation method, device, equipment and storage medium
CN111222509B (en) Target detection method and device and electronic equipment
CN112639878A (en) Unsupervised depth prediction neural network
CN112801047B (en) Defect detection method and device, electronic equipment and readable storage medium
CN116188954A (en) Online evaluation method and system for target detection algorithm, electronic equipment and storage medium
CN103824307B (en) A kind of method and apparatus determining lost motion object pixel
CN115375887A (en) Moving target trajectory prediction method, device, equipment and medium
Zhou et al. Efficient traffic accident warning based on unsupervised prediction framework
Ni et al. An improved kernelized correlation filter based visual tracking method
Rishika et al. Real-time vehicle detection and tracking using yolo-based deep sort model: a computer vision application for traffic surveillance
CN111915713A (en) Three-dimensional dynamic scene creating method, computer equipment and storage medium
CN111340101B (en) Stability evaluation method, apparatus, electronic device, and computer-readable storage medium
Ni Application of motion tracking technology in movies, television production and photography using big data
CN109492755B (en) Image processing method, image processing apparatus, and computer-readable storage medium
Yun et al. Self-configurable stabilized real-time detection learning for autonomous driving applications
CN113516735A (en) Image processing method, image processing device, computer readable medium and electronic equipment
Liu et al. A real-time smoke and fire warning detection method based on an improved YOLOv5 model
Zhao et al. MDSNet: self-supervised monocular depth estimation for video sequences using self-attention and threshold mask
Fan et al. Autonomous Vehicle Vision 2021: ICCV Workshop Summary
Lomaliza et al. Initial pose estimation of 3D object with severe occlusion using deep learning
US20240119615A1 (en) Tracking three-dimensional geometric shapes
WO2024013893A1 (en) Object detection device, object detection method, and object detection program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination