CN113808186B - Training data generation method and device and electronic equipment

Training data generation method and device and electronic equipment

Info

Publication number
CN113808186B
Authority
CN
China
Prior art keywords
frame
dimensional
projection
labeling
target
Legal status
Active
Application number
CN202110238897.1A
Other languages
Chinese (zh)
Other versions
CN113808186A (en)
Inventor
安耀祖
许新玉
孔旗
Current Assignee
Jingdong Kunpeng Jiangsu Technology Co Ltd
Original Assignee
Jingdong Kunpeng Jiangsu Technology Co Ltd
Application filed by Jingdong Kunpeng Jiangsu Technology Co Ltd
Priority to CN202110238897.1A
Publication of CN113808186A
Application granted
Publication of CN113808186B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/50 Depth or shape recovery
    • G06T7/521 Depth or shape recovery from laser ranging, e.g. using interferometry; from the projection of structured light
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/40 Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/10 Image acquisition modality
    • G06T2207/10028 Range image; Depth image; 3D point clouds

Abstract

The disclosure provides a training data generation method, a training data generation device and electronic equipment. The training data generation method comprises the following steps: acquiring target three-dimensional point cloud data corresponding to a target image, wherein the target three-dimensional point cloud data comprises three-dimensional annotation frames of a plurality of objects and annotation results corresponding to each three-dimensional annotation frame; acquiring a first three-dimensional annotation frame in the identification range of the target image; processing two or more first three-dimensional annotation frames of the same object to obtain a preset number of second three-dimensional annotation frames of the same object; and labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, wherein the training data are used for training the target detection model. The method and the device can automatically and efficiently generate the training data for training the monocular three-dimensional target detection model.

Description

Training data generation method and device and electronic equipment
Technical Field
The disclosure relates to the field of information technology, and in particular relates to a training data generation method, a training data generation device and electronic equipment for generating training data of a monocular three-dimensional target detection model.
Background
Monocular three-dimensional object detection (Monocular 3D Object Detection) is a technology that outputs the category of a target object and its precise length, width, height, rotation angle and other information in three-dimensional space using only the images or video sequences captured by a monocular camera, and it is widely applied in fields such as vehicle autonomous driving systems, intelligent robots, intelligent video surveillance and intelligent transportation. Because only one visual sensor is needed, it has a simple structure and simple camera calibration, and, compared with three-dimensional object detection realized with multi-line lidar in the field of autonomous driving perception, it offers the great advantages of dense information and low cost.
However, an excellent and stable monocular three-dimensional object detection model requires a large amount of scene-rich training data. In the related art, the ability of a monocular three-dimensional object detection model to recognize three-dimensional information from a two-dimensional image is usually trained by constructing a three-dimensional model (for example, a CAD model) of an object, so the training data is limited and the cost of generating and using it is high.
Therefore, a method capable of producing training data of a monocular three-dimensional object detection model in a large scale at low cost is required.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure is directed to a training data generating method, apparatus and electronic device for generating training data of a monocular three-dimensional object detection model, which overcome, at least in part, the problem of scarcity of training data of a monocular three-dimensional object detection model due to limitations and drawbacks of the related art.
According to a first aspect of an embodiment of the present disclosure, there is provided a training data generating method, including: acquiring target three-dimensional point cloud data corresponding to a target image, wherein the target three-dimensional point cloud data comprises three-dimensional annotation frames of a plurality of objects and annotation results corresponding to each three-dimensional annotation frame; acquiring a first three-dimensional annotation frame in the identification range of the target image; processing two or more first three-dimensional annotation frames of the same object to obtain a preset number of second three-dimensional annotation frames of the same object; and labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, wherein the training data are used for training the target detection model.
In an exemplary embodiment of the present disclosure, the acquiring a first three-dimensional annotation frame within the recognition range of the target image includes: acquiring a projection frame of each three-dimensional annotation frame in the target three-dimensional point cloud data on a first projection plane and the center point coordinates of the projection frame, wherein the first projection plane is the shooting plane corresponding to the target image; if a center point coordinate is outside the display coordinate range of the target image, deleting the three-dimensional annotation frame corresponding to that center point coordinate; if the overlapping degree of two projection frames is larger than a first preset value, deleting, of the two three-dimensional annotation frames corresponding to the two projection frames, the one whose three-dimensional center point is farther from the first projection plane; and determining the remaining three-dimensional annotation frames as the first three-dimensional annotation frames.
In an exemplary embodiment of the present disclosure, the processing two or more first three-dimensional labeling frames of the same object to obtain a preset number of second three-dimensional labeling frames of the same object includes: acquiring a first projection frame and a second projection frame of two first three-dimensional labeling frames on a second projection plane, wherein the second projection plane is the shooting top view plane corresponding to the target image; when the overlapping degree of the first projection frame and the second projection frame is larger than a second preset value and smaller than a third preset value, deleting the first three-dimensional labeling frame corresponding to the smaller of the first projection frame and the second projection frame; when the overlapping degree of the first projection frame and the second projection frame is larger than or equal to the third preset value, acquiring a third projection frame and a fourth projection frame of the two first three-dimensional labeling frames on a first projection plane, the first projection plane being the shooting plane corresponding to the target image; and deleting the first three-dimensional labeling frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition.
In an exemplary embodiment of the present disclosure, the deleting the first three-dimensional labeling frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition includes: determining a first center point height and a first height corresponding to the third projection frame, and determining a first vertex height of the third projection frame according to the first center point height and the first height; determining a second center point height and a second height corresponding to the fourth projection frame, and determining a second vertex height of the fourth projection frame according to the second center point height and the second height; deleting the first three-dimensional labeling frame corresponding to the third projection frame when the first center point height is larger than the second vertex height; and deleting the first three-dimensional labeling frame corresponding to the fourth projection frame when the second center point height is larger than the first vertex height.
In an exemplary embodiment of the present disclosure, after the obtaining a preset number of second three-dimensional labeling frames of the same object, the method further includes: and when the labeling result of the second three-dimensional labeling frame is matched with a target object, updating the labeling result according to the size of the second three-dimensional labeling frame.
In an exemplary embodiment of the disclosure, the updating the labeling result according to the size of the second three-dimensional labeling frame includes: acquiring a target projection frame of the second three-dimensional annotation frame on a second projection surface, wherein the second projection surface is a shooting overlook surface corresponding to the target image; if the length of the target projection frame in the normal direction of the second projection surface is smaller than a fourth preset value, updating the labeling result of a second three-dimensional labeling frame corresponding to the target projection frame into a first object; if the length of the target projection frame in the normal direction of the second projection surface is larger than a fifth preset value, updating the labeling result of a second three-dimensional labeling frame corresponding to the target projection frame into a second object; and if the length of the target projection frame in the normal direction of the second projection surface is larger than or equal to the fourth preset value and smaller than or equal to the fifth preset value, updating the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame into a third object.
In an exemplary embodiment of the disclosure, the labeling the target image according to the labeling result of each of the second three-dimensional labeling frames to generate training data includes: obtaining a target object corresponding to each second three-dimensional annotation frame and preset parameters, wherein the preset parameters at least comprise height information and distance information; determining the position of each target object in the target image; and generating the training data according to the target image, the position of each target object and the preset parameters of each target object.
According to a second aspect of the embodiments of the present disclosure, there is provided a training data generating apparatus, including: the point cloud data acquisition module is used for acquiring target three-dimensional point cloud data corresponding to a target image, wherein the target three-dimensional point cloud data comprise three-dimensional annotation frames of a plurality of objects and annotation results corresponding to the three-dimensional annotation frames; the visual frame screening module is used for acquiring a first three-dimensional annotation frame in the identification range of the target image; the repeated frame processing module is used for processing two or more first three-dimensional annotation frames of the same object to obtain a preset number of second three-dimensional annotation frames of the same object; the data labeling module is used for labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, and the training data are used for training the target detection model.
According to a third aspect of the present disclosure, there is provided an electronic device comprising: a memory; and a processor coupled to the memory, the processor configured to perform the method of any of the above based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, there is provided a computer-readable storage medium having stored thereon a program which, when executed by a processor, implements the training data generation method as set forth in any one of the above.
According to the embodiment of the disclosure, the existing laser point cloud data and the corresponding shooting images are processed to obtain the training data for training the monocular three-dimensional target detection model, and the corresponding relation between the three-dimensional information and the two-dimensional information of various objects can be obtained without modeling, so that the cost for generating the training data of the monocular three-dimensional target detection model is greatly reduced; in addition, the labeling frame of the laser point cloud data is directly and simply processed, so that the processing efficiency is high, the problems of high cost and low efficiency of training data for constructing the monocular three-dimensional target detection model in the related technology can be solved, and the training data of the monocular three-dimensional target detection model can be generated in a large scale at low cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort.
Fig. 1 is a flowchart of a training data generation method in an exemplary embodiment of the present disclosure.
Fig. 2 is a sub-flowchart of step S2 in one embodiment of the present disclosure.
Fig. 3 is a schematic diagram of step S22 in one embodiment of the present disclosure.
FIG. 4 is a flow chart of determining a first three-dimensional annotation box in one embodiment of the disclosure.
FIG. 5 is a schematic diagram of a first three-dimensional annotation box corresponding to the embodiment shown in FIG. 3, according to one embodiment of the disclosure.
Fig. 6 is a sub-flowchart of step S3 in one embodiment of the present disclosure.
FIG. 7 is a schematic view of a projection frame of a first three-dimensional annotation frame onto a second projection surface according to one embodiment of the disclosure.
FIG. 8 is a schematic view of a projection frame of the first three-dimensional labeling frame of the embodiment of FIG. 7 on a first projection surface.
FIG. 9 is a flow chart of determining a second three-dimensional annotation box in one embodiment of the disclosure.
FIG. 10 is a schematic view of a second three-dimensional annotation frame on a second projection surface, corresponding to the embodiment shown in FIG. 7, after operation with the embodiment shown in FIG. 9.
FIG. 11 is a flow chart of updating labeling results according to the dimensions of a second three-dimensional labeling frame in one embodiment of the disclosure.
Fig. 12 is another flow chart of the embodiment shown in fig. 11.
Fig. 13 is a block diagram of a training data generation apparatus in an exemplary embodiment of the present disclosure.
Fig. 14 is a block diagram of an electronic device in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the present disclosure. One skilled in the relevant art will recognize, however, that the aspects of the disclosure may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known technical solutions have not been shown or described in detail to avoid obscuring aspects of the present disclosure.
Furthermore, the drawings are only schematic illustrations of the present disclosure, in which the same reference numerals denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
The following describes example embodiments of the present disclosure in detail with reference to the accompanying drawings.
Fig. 1 is a flowchart of a training data generation method in an exemplary embodiment of the present disclosure.
Referring to fig. 1, the training data generation method 100 may include:
step S1, target three-dimensional point cloud data corresponding to a target image are obtained, wherein the target three-dimensional point cloud data comprise three-dimensional labeling frames of a plurality of objects and labeling results corresponding to the three-dimensional labeling frames;
s2, acquiring a first three-dimensional annotation frame in the identification range of the target image;
step S3, processing two or more first three-dimensional labeling frames of the same object to obtain a preset number of second three-dimensional labeling frames of the same object;
and S4, marking the target image according to the marking result of each second three-dimensional marking frame to generate training data, wherein the training data are used for training the target detection model.
According to the embodiment of the disclosure, the existing laser point cloud data and the corresponding shooting images are processed to obtain the training data for training the monocular three-dimensional target detection model, and the corresponding relation between the three-dimensional information and the two-dimensional information of various objects can be obtained without modeling, so that the cost for generating the training data of the monocular three-dimensional target detection model is greatly reduced; in addition, the labeling frame of the laser point cloud data is directly and simply processed, so that the processing efficiency is high, the problems of high cost and low efficiency of training data for constructing the monocular three-dimensional target detection model in the related technology can be solved, and the training data of the monocular three-dimensional target detection model can be generated in a large scale at low cost.
Next, each step of the training data generation method 100 will be described in detail.
In step S1, target three-dimensional point cloud data corresponding to a target image is obtained, where the target three-dimensional point cloud data includes three-dimensional labeling frames of a plurality of objects and labeling results corresponding to each three-dimensional labeling frame.
Because the technology of acquiring three-dimensional point cloud data with lidar is mature, a large number of three-dimensional point cloud data sets have been made public. In the embodiment of the present disclosure, the three-dimensional point cloud data in a public three-dimensional point cloud data set can be processed directly, so that one's own training database is expanded with externally published data, which reduces the cost of generating training data and improves the efficiency of generating it. Note that a three-dimensional point cloud data set that includes two-dimensional image information needs to be selected.
Of course, when a public three-dimensional point cloud data set is expensive, or lacks the target objects to be identified, a vehicle with laser point cloud acquisition and identification capability can also be used, and two-dimensional images and three-dimensional point cloud data of the target objects to be identified can be collected in real time with a lidar and a camera. Because the cost of the lidar and the camera is limited and acquisition and labeling are fast, even if those skilled in the art collect three-dimensional point cloud data and target images on site to build a monocular three-dimensional object detection model, the cost is still far lower than that of the prior-art approach of building CAD models, and the efficiency and quantity of training data can be greatly improved.
Because existing three-dimensional point cloud data sets are built for training lidar-based models, their labeling frames are three-dimensional, and problems such as labeling frames occluding one another and different parts of one object being labeled separately are common, so the data cannot be applied directly to training a monocular three-dimensional object detection model. The embodiment of the present disclosure therefore processes the existing three-dimensional point cloud data as follows.
In some cases, the three-dimensional point cloud data set may be a set of data corresponding to a plurality of two-dimensional images, which may be, for example, photographs, video frames, and the like. At this time, the target three-dimensional point cloud data corresponding to each image is the same set of three-dimensional point cloud data. In other cases, for example, in a scenario where three-dimensional point cloud data is collected in a custom manner, the obtained two-dimensional image may correspond to a set of three-dimensional point cloud data sets, and the target three-dimensional point cloud data corresponding to the target image may be a specific set of three-dimensional point cloud data. The correspondence between the target image and the target three-dimensional point cloud data may be different according to different data acquisition manners, which is not limited in this disclosure.
In step S2, a first three-dimensional annotation frame within the recognition range of the target image is acquired.
Fig. 2 is a sub-flowchart of step S2 in one embodiment of the present disclosure.
Referring to fig. 2, in one embodiment, step S2 may include:
step S21, a projection frame of a three-dimensional annotation frame in the target three-dimensional point cloud data on a first projection surface and a center point coordinate of the projection frame are obtained, wherein the first projection surface is a shooting surface corresponding to the target image;
step S22, deleting the three-dimensional annotation frame corresponding to the center point coordinate if one center point coordinate is out of the display coordinate range of the target image;
step S23, if the overlapping degree of two projection frames is larger than a first preset value, deleting, of the two three-dimensional labeling frames corresponding to the two projection frames, the one whose three-dimensional center point is farther from the first projection plane;
and S24, determining the rest three-dimensional annotation frames as the first three-dimensional annotation frame.
Because the display range of the target image is limited while the spatial range of the three-dimensional point cloud data is generally larger, the embodiment of the present disclosure deletes the three-dimensional annotation frames that cannot be displayed in the two-dimensional target image.
Fig. 3 is a schematic diagram of step S22 in one embodiment of the present disclosure.
Referring to fig. 3, the first projection plane is a photographing plane of a target image having a rectangular display range 300, and a plurality of three-dimensional labeling frames have projection frames 31 to 39 each having center point coordinates on the first projection plane.
In the calculation process, let a three-dimensional labeling frame have dimensions (length, width, height) and three-dimensional center point coordinates (x, y, z). Using the projection matrix, the coordinates of its projection frame on the shooting plane of the target image are computed as (x0, y0, x1, y1), the center point coordinates of the projection frame are (center_x, center_y), and the size of the target image is (image_width, image_height). A projection frame center point whose coordinates satisfy the following formula (1) is determined to be within the display coordinate range of the target image:

0 ≤ center_x ≤ image_width and 0 ≤ center_y ≤ image_height (1)

Center point coordinates that do not satisfy formula (1) are determined to be outside the display coordinate range of the target image, and the three-dimensional labeling frame corresponding to that projection frame is deleted. In the embodiment shown in fig. 3, the three-dimensional annotation boxes corresponding to the projection boxes 31 and 39 are deleted.
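As an illustration only, a minimal Python sketch of the check in steps S21 and S22 might look like the following; the function name, the use of a 3x4 projection matrix and the coordinate conventions are assumptions made for this sketch rather than details taken from the embodiment.

    import numpy as np

    def center_in_image(center_xyz, proj_matrix, image_width, image_height):
        """Project a 3D labeling-frame center onto the shooting plane and apply formula (1).

        center_xyz: (x, y, z) center of a three-dimensional labeling frame
        proj_matrix: assumed 3x4 projection matrix mapping that frame to pixel coordinates
        """
        pt = np.append(np.asarray(center_xyz, dtype=float), 1.0)  # homogeneous coordinates
        u, v, w = proj_matrix @ pt
        if w <= 0:  # behind the shooting plane, so it cannot appear in the image
            return False
        center_x, center_y = u / w, v / w
        # formula (1): the projected center must fall inside the display range
        return 0 <= center_x <= image_width and 0 <= center_y <= image_height

A three-dimensional labeling frame whose projected center fails this check would be deleted, as for projection boxes 31 and 39 in fig. 3.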
Because visual imaging is unidirectional, a distant object in a two-dimensional image is very easily blocked by a near object, so in step S23 the embodiment of the present disclosure deletes the three-dimensional annotation frames corresponding to objects that are occluded and invisible in the target image.
In the embodiment of the present disclosure, the degree of overlap (Intersection over Union, IOU) between projection frames is used to determine the occlusion relationships between objects. The IOU, also called the intersection-over-union ratio, is commonly computed as the ratio of the intersection to the union of a predicted bounding box and a ground-truth bounding box. Taking two figures as an example, first compute the area of their intersection, then the area of their union, and finally take the ratio of the two areas as the degree of overlap (IOU) between the two figures.
With continued reference to fig. 3, it can be observed that there is a high degree of overlap between the projection frame 33 and the projection frame 34. If two projection frames overlap on the shooting plane of the target image, the two corresponding target objects have an occlusion relationship. Since an object located behind cannot be identified from the target image when the occlusion is severe, a first preset value, for example 0.7, can be set to judge whether the occlusion relationship affects how the object is displayed in the target image and therefore whether the object can be identified. If the degree of overlap exceeds the first preset value, the rear object is severely blocked by the front object and does not need to be identified, so the three-dimensional labeling frame corresponding to the rear object can be deleted, which increases the proportion of effective training data among all the training data. The front-rear order of the two objects can be determined from the positions of the three-dimensional center points of the two three-dimensional labeling frames relative to the first projection plane; when the three-dimensional labeling frames carry depth information relative to the image shooting plane, the depth information of the three-dimensional labeling frames corresponding to the two projection frames whose degree of overlap exceeds the first preset value can be read directly, and the three-dimensional labeling frame with the larger depth relative to the shooting plane is deleted.
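The occlusion filtering described above can be sketched as follows; the dictionary keys 'proj_box' and 'depth' and the 0.7 default are assumptions chosen to match the example value given in the text, not a definitive implementation.

    def iou(box_a, box_b):
        """Degree of overlap between two axis-aligned projection frames given as (x0, y0, x1, y1)."""
        ix0, iy0 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        ix1, iy1 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        union = area_a + area_b - inter
        return inter / union if union > 0 else 0.0

    def drop_occluded(frames, first_preset_value=0.7):
        """For every heavily overlapping pair, keep only the labeling frame closer to the shooting plane.

        frames: list of dicts, each with a 2D 'proj_box' on the first projection plane and a
        'depth' measured from that plane (key names are assumptions for this sketch).
        """
        keep = [True] * len(frames)
        for i in range(len(frames)):
            for j in range(i + 1, len(frames)):
                if keep[i] and keep[j] and iou(frames[i]["proj_box"], frames[j]["proj_box"]) > first_preset_value:
                    # delete the labeling frame whose center is farther from the first projection plane
                    farther = i if frames[i]["depth"] > frames[j]["depth"] else j
                    keep[farther] = False
        return [f for f, k in zip(frames, keep) if k]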
Finally, the remaining three-dimensional annotation frames are determined as the first three-dimensional annotation frames, that is, the three-dimensional annotation frames of the objects that can be accurately identified from the target picture.
FIG. 4 is a flow chart of determining a first three-dimensional annotation box in one embodiment of the disclosure.
Referring to FIG. 4, in one embodiment, determining a particular flow of a first three-dimensional annotation box can comprise:
step S400, a three-dimensional point cloud data set is obtained, wherein the three-dimensional point cloud data set comprises a plurality of three-dimensional annotation frames, an annotation result corresponding to each three-dimensional annotation frame and a plurality of pictures corresponding to the three-dimensional point cloud data set.
In step S401, the N projection frames T1 of the N three-dimensional labeling frames in the three-dimensional point cloud data corresponding to the target picture on the first projection plane, together with the coordinates of the N projection center points, are obtained, and i=0 and M=N are set.
The images in the three-dimensional point cloud data set can be ordered and labeled one by one, and the image currently being labeled is called the target image. In this embodiment, i is a parameter used to record the sequence numbers of the three-dimensional labeling frames, and a three-dimensional labeling frame, its corresponding projection frame and its projection center point all share the same sequence number; M is a parameter used to record the number of three-dimensional annotation boxes within the display range of the target picture.
Step S402, judging whether the coordinates of the ith projection center point exceed the display area of the target image, if so, entering step S403, deleting the ith three-dimensional labeling frame, subtracting one from M, and entering step S404; if not, go to step S404;
step S404, judging whether i is equal to the total number N of the three-dimensional labeling frames, if not, entering step S405 to add one to i, and returning to step S402 to process the next projection frame; if yes, go to step S406;
in step S406, the i value is reset and p=m is set, where P is a parameter used to record the number of first three-dimensional label frames in this embodiment. The step is used for setting a serial number for the three-dimensional labeling frame in the display range of the target image.
It should be noted that, because there may be a layer-by-layer shielding problem between the three-dimensional labeling frames, in step S406, each three-dimensional labeling frame may be numbered sequentially from large to small according to the distance between the three-dimensional center point and the first projection plane, so that in the subsequent steps S408 to S410, the three-dimensional labeling frame corresponding to the shielded object is deleted from back to front, and it is avoided that after the three-dimensional labeling frame located in the middle of the shielding queue is deleted first, the object located at the forefront of the shielding queue and the object located at the rearmost of the shielding queue are judged as having no shielding relationship.
Step S407, judging whether the degree of overlap IOU(T1_i, T1_{i+1}) between the i-th projection frame T1_i and the (i+1)-th projection frame T1_{i+1} is greater than or equal to the first preset value Vth_1; if yes, go to step S408, delete whichever of the i-th and the (i+1)-th three-dimensional labeling frames has its three-dimensional center point farther from the first projection plane, subtract one from P, and go to step S409; if not, go to step S409;
step S409, judging whether i+1 is equal to M, if not, entering step S410 to add one to i, returning to step S407 to judge the next group of projection frames; if so, step S411 is entered to determine the remaining P three-dimensional annotation frames as the first three-dimensional annotation frame.
FIG. 5 is a schematic diagram of a first three-dimensional annotation box corresponding to the embodiment shown in FIG. 3 in one embodiment of the disclosure.
Referring to fig. 5, after deleting the three-dimensional annotation frame corresponding to the projection frames 31, 39 exceeding the target image display range 300 and the blocked following three-dimensional annotation frame (corresponding to the projection frame 33), the first three-dimensional annotation frame includes the three-dimensional annotation frames corresponding to the projection frames 32, 34, 35, 36, 37, 38.
In step S3, two or more first three-dimensional labeling frames of the same object are processed, so as to obtain a preset number of second three-dimensional labeling frames of the same object.
The disclosed embodiments may be used to generate training data for monocular three-dimensional object detection models installed on autonomous vehicles. Because three-dimensional point cloud data generally involves multiple labeling practices (for example, different parts of an object may be labeled separately), while autonomous driving and other monocular three-dimensional object detection scenarios mainly need to attend to the object body, multiple labeling frames of the same object can be processed in order to increase the proportion of effective training data.
Fig. 6 is a sub-flowchart of step S3 in one embodiment of the present disclosure.
Referring to fig. 6, in one embodiment, step S3 may include:
step S31, a first projection frame and a second projection frame of two first three-dimensional labeling frames on a second projection plane are obtained, wherein the second projection plane is a shooting overlook plane corresponding to the target image;
step S32, deleting a first three-dimensional labeling frame corresponding to the smaller area in the first projection frame and the second projection frame when the overlapping degree of the first projection frame and the second projection frame is larger than a second preset value and smaller than a third preset value;
step S33, when the overlapping degree of the first projection frame and the second projection frame is larger than or equal to the third preset value, a third projection frame and a fourth projection frame of the two first three-dimensional labeling frames on a first projection plane are obtained, wherein the first projection plane is a shooting plane corresponding to the target image;
Step S34, deleting the first three-dimensional annotation frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition.
FIG. 7 is a schematic view of a projection frame of a first three-dimensional annotation frame onto a second projection surface according to one embodiment of the disclosure.
Referring to fig. 7, the projection frames of the first three-dimensional labeling frames on the second projection plane may include, for example, the projection frames 71 to 76. The inventors have observed that, because the second projection plane is the shooting top view plane corresponding to the target image, two three-dimensional labeling frames that overlap over a large range on the top view plane may label parts of the same object at different heights (such as the hanger and the body of a crane) or different components of the same object (such as the shovel and the body of a bulldozer). In the embodiment of the present disclosure, two projection frames with a higher degree of overlap (larger than the third preset value) are judged to be projection frames of parts of the same object at different heights, and projection frames with a lower degree of overlap (between the second preset value and the third preset value) are judged to be projection frames of different components of the same object. Different methods are then adopted to handle the two situations.
For projection frames of different components, only the main part is kept, so the embodiment of the present disclosure deletes, of the two overlapping projection frames, the first three-dimensional labeling frame corresponding to the smaller area on the second projection plane. However, in some cases the main component may be thin and tall relative to the ground while the secondary component is flat, so the first three-dimensional labeling frame corresponding to the main component could be deleted by mistake. Therefore, the projection frames judged to belong to different components of the same object can be further screened according to the heights of the corresponding first three-dimensional labeling frames' projections on the first projection plane, keeping the first three-dimensional labeling frames whose heights meet a preset condition. In some embodiments of the present disclosure, only the one three-dimensional labeling frame closest to the ground is kept for an object, which facilitates applying the generated training data in the unmanned driving field; in other embodiments of the present disclosure, one or more labeling frames meeting preset conditions may be kept for an object, so that the training data can be applied to other target technical fields. The preset conditions can be set by a person skilled in the art according to the intended use of the training data, which is not limited by the present disclosure.
FIG. 8 is a schematic view of a projection frame of the first three-dimensional labeling frame of the embodiment of FIG. 7 on a first projection surface.
Referring to fig. 8, the projection frames 71 to 76 are the projection frames of the first three-dimensional labeling frames on the second projection plane, and the projection frames 81 to 86 are their projection frames on the first projection plane; the projection frames 81 to 86 correspond to the projection frames 71 to 76, respectively. As can be seen from fig. 8, the projection frames 73 and 74, whose degree of overlap on the second projection plane is larger than the third preset value (for example, 0.9), differ in height. If only the one three-dimensional labeling frame closest to the ground is kept for an object, to facilitate applying the generated training data in the unmanned driving field, the first three-dimensional labeling frame corresponding to the higher projection frame 84 may be deleted. Similarly, the degree of overlap of the projection frames 71 and 72 on the second projection plane is greater than the second preset value (for example, 0.1) but less than the third preset value (for example, 0.9), and the heights of the corresponding projection frames 81 and 82 on the first projection plane differ, so the first three-dimensional labeling frame corresponding to the higher projection frame 81 can be deleted. Finally, the remaining first three-dimensional labeling frames are taken as the second three-dimensional labeling frames used to generate the annotation information.
In one embodiment, the method for deleting the higher first three-dimensional annotation frame in fig. 8 by step S34 may include: determining a first center point height and a first height corresponding to the third projection frame, and determining a first vertex height of the third projection frame according to the first center point height and the first height; determining a second center point height and a second height corresponding to the fourth projection frame, and determining a second vertex height of the fourth projection frame according to the second center point height and the second height; deleting a first three-dimensional annotation frame corresponding to the third projection frame when the first center point height is larger than the second vertex height; and deleting the first three-dimensional annotation frame corresponding to the fourth projection frame when the height of the second center point is larger than that of the first vertex.
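A compact sketch of the pairwise decision in steps S31 to S34 (including the height comparison just described) could look like this; the key names, the reuse of the iou() helper sketched earlier, and the thresholds 0.1 and 0.9 are assumptions taken only from the illustrative values mentioned above.

    def resolve_duplicate_pair(frame_a, frame_b, second_preset=0.1, third_preset=0.9):
        """Return the first three-dimensional labeling frame of a pair that should be deleted, or None.

        Each frame is assumed to carry 'bev_box' and 'bev_area' (its top-view projection and area),
        plus 'center_h' (center point height) and 'top_h' (vertex height) of its projection on the
        first projection plane; these key names are illustrative only.
        """
        overlap = iou(frame_a["bev_box"], frame_b["bev_box"])  # reuses the iou() sketch above
        if second_preset < overlap < third_preset:
            # different components of one object: delete the one with the smaller top-view area
            return frame_a if frame_a["bev_area"] < frame_b["bev_area"] else frame_b
        if overlap >= third_preset:
            # parts of one object at different heights: delete the frame farther from the ground
            if frame_a["center_h"] > frame_b["top_h"]:
                return frame_a
            if frame_b["center_h"] > frame_a["top_h"]:
                return frame_b
        return None  # no deletion needed for this pair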
FIG. 9 is a flow chart of determining a second three-dimensional annotation box in one embodiment of the disclosure.
Referring to FIG. 9, in one embodiment, the process of determining the second three-dimensional annotation box can include:
step S901, obtaining P projection frames T2 of the P first three-dimensional labeling frames on the second projection surface, where j=0 and q=p are set; the present embodiment records the number of second three-dimensional projection frames using Q.
Step S902, judging the projection frame j T2 of the jth first three-dimensional labeling frame on the second projection surface j And the j+1th projection frame T2 j+1 Overlap IOU (T2) j ,T2 j+1 ) Whether or not to be at the second preset value Vth 2 And a third preset value Vth 3 If yes, enter step S903 and delete the first three-dimensional label frame corresponding to the j-th projection frame and the smaller area in the j+1-th projection frame, enter step S911 after subtracting one to Q; if not, go to step S904;
step S904, determine IOU (T2 j ,T2 j+1 ) Whether or not to be greater than or equal to a third preset value Vth 3 If yes, go to step S905, if no, go to step S911;
step S905, obtaining the j-th first three-dimensional labeling frame and the projection frame T1 of the j+1th first three-dimensional labeling frame on the first projection surface j And projection frame T1 j+1
Step S906, the acquired projection frame T1 j Is used for obtaining a projection frame T1 by the first center point height and the first peak point height j+1 A second center point height and a second vertex height;
step S907, judging whether the first center point height is larger than the second vertex height, if so, entering step S908 to delete the j-th first three-dimensional labeling frame, and entering step S911 after subtracting one operation for Q; if not, go to step S909;
step S909, judging whether the second center point height is greater than the first vertex height, if so, entering step S910 to delete the j+1th first three-dimensional labeling frame, and entering step S911 after subtracting one from Q; if not, go to step S911;
step S911, judging whether j+1 is equal to P, if not, proceeding to step S912 to add one operation to j, returning to step S902 to judge the next group of projection frames; if so, the process proceeds to step S913 to determine the remaining Q first three-dimensional annotation frames as second three-dimensional annotation frames.
FIG. 10 is a schematic view of a second three-dimensional annotation frame on a second projection surface, corresponding to the embodiment shown in FIG. 7, after operation with the embodiment shown in FIG. 9.
Referring to fig. 10, after deleting the first three-dimensional label frame corresponding to the smaller projection frame 71 and the first three-dimensional label frame corresponding to the higher projection frame 74, the second three-dimensional label frame is the three-dimensional label frame corresponding to the projection frames 72, 73, 75, 76.
In another embodiment of the present disclosure, if the three-dimensional point cloud data comes from a public data set, the labeling categories are relatively coarse, and the labeling result of each second three-dimensional labeling frame can be updated to obtain labels better suited to the detection purpose of the monocular three-dimensional target detection model.
For example, the labeling result may be adjusted when the labeling result of one of the second three-dimensional labeling frames matches the target object. In some embodiments, the public data set labels all vehicles simply as vehicles; in this case the labeling result may be refined into "bicycle", "small motor vehicle", "large motor vehicle", etc. according to the size of the three-dimensional labeling frame. In other embodiments, if the labeling result is "large motor vehicle", a classification model (e.g., a trained neural network model) may further be used to classify the object corresponding to the second three-dimensional labeling frame into "bus", "truck", and so on.
FIG. 11 is a flow chart of updating labeling results according to the dimensions of a second three-dimensional labeling frame in one embodiment of the disclosure.
Referring to FIG. 11, in one embodiment, the process of updating the annotation result based on the size of the second three-dimensional annotation frame may comprise:
Step S111, obtaining a target projection frame of the second three-dimensional labeling frame on a second projection surface, wherein the second projection surface is a shooting overlook surface corresponding to the target image;
step S112, if the length of the target projection frame in the normal direction of the second projection surface is smaller than a fourth preset value, updating the labeling result of a second three-dimensional labeling frame corresponding to the target projection frame to a first object;
step S113, if the length of the target projection frame in the normal direction of the second projection surface is greater than a fifth preset value, updating the labeling result of a second three-dimensional labeling frame corresponding to the target projection frame to a second object;
step S114, if the length of the target projection frame in the normal direction of the second projection surface is greater than or equal to the fourth preset value and less than or equal to the fifth preset value, updating the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame to a third object.
Fig. 12 is another flow chart of the embodiment shown in fig. 11.
Referring to fig. 12, first, k=0 is set at step S121, k being a sequence number representing a second three-dimensional label frame.
Step S122, determining whether the labeling result of the kth second three-dimensional labeling frame is matched with the target object, if not, entering step S128; if yes, go to step S123;
Step S123, judging whether the length of the kth second three-dimensional labeling frame in the normal direction of the second projection surface is smaller than a fourth preset value, if so, entering step S124, updating the labeling result of the kth second three-dimensional labeling frame into a first object, and entering step S128; if not, go to step S125;
step S125, judging whether the length of the kth second three-dimensional labeling frame in the normal direction of the second projection surface is larger than a fifth preset value, if so, entering step S126, updating the labeling result of the kth second three-dimensional labeling frame into a second object, and entering step S128; if not, the step S127 is entered, and the labeling result of the kth second three-dimensional labeling frame is updated to a third object, and then the step S128 is entered;
step S128, judging whether k is equal to the number Q of the second three-dimensional labeling frames, if not, entering step S129 to add one to k and returning to step S122 to judge the next second three-dimensional labeling frame; if yes, entering step S4.
The target object in the embodiment shown in fig. 11 and 12 may be, for example, a vehicle, the first object may be, for example, a bicycle, the second object may be, for example, a large motor vehicle, and the third object may be, for example, a small motor vehicle. After updating the labeling result of the target object according to the size, a classification model may be further used to classify a part or all of the second three-dimensional labeling frames in a finer granularity, and update the labeling result again, which is not particularly limited in the present disclosure.
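For illustration, the size-based relabeling of figs. 11 and 12 might be sketched as below; the 1.0 and 6.0 thresholds, the category strings and the dictionary keys are assumptions standing in for the fourth and fifth preset values, which the embodiment does not fix.

    def refine_vehicle_label(frame, fourth_preset=1.0, fifth_preset=6.0):
        """Update the label of a second 3D labeling frame whose labeling result matches the target object.

        'length' is assumed to be the extent of the frame's top-view projection along the normal
        direction of the second projection plane; the unit and thresholds are illustrative only.
        """
        if frame.get("label") != "vehicle":  # labeling result does not match the target object
            return frame
        length = frame["length"]
        if length < fourth_preset:
            frame["label"] = "bicycle"               # first object
        elif length > fifth_preset:
            frame["label"] = "large motor vehicle"   # second object
        else:
            frame["label"] = "small motor vehicle"   # third object
        return frame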
And in step S4, labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, wherein the training data is used for training the target detection model.
The process of labeling the target image according to the labeling result of each of the second three-dimensional labeling frames in the embodiments of the present disclosure may, for example, include: obtaining a target object corresponding to each second three-dimensional annotation frame and preset parameters, wherein the preset parameters at least comprise height information and distance information; determining the position of each target object in the target image; and generating the training data according to the target image, the position of each target object and the preset parameters of each target object.
Through the labeling process, each identifiable object in each image in the generated training data has the height information and the distance information, so that the monocular three-dimensional target detection model trained by using the training data can judge the three-dimensional information of the target object according to the two-dimensional image, for example, the distance of the object is estimated according to the height information of the object, and then corresponding operation is realized according to the distance of the object.
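A minimal sketch of assembling one training sample from a labeled target image is shown below; the record layout and all key names are assumptions for illustration, not a format prescribed by the embodiment or by any particular detector.

    def build_training_sample(image_path, second_frames):
        """Assemble one training record from a target image and its second three-dimensional labeling frames.

        Each frame is assumed to provide a 'label', its 2D position in the image ('proj_box'), and
        preset parameters that at least include 'height' and 'distance' information.
        """
        return {
            "image": image_path,
            "objects": [
                {
                    "category": frame["label"],
                    "box_2d": frame["proj_box"],    # position of the target object in the target image
                    "height": frame["height"],      # height information of the labeling frame
                    "distance": frame["distance"],  # distance information relative to the shooting plane
                }
                for frame in second_frames
            ],
        }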
The embodiment of the disclosure can be used for directly cleaning the public data set to generate the training data of the monocular three-dimensional target detection model in a large quantity at low cost, so that a large quantity of training data meeting the training requirements can be obtained rapidly, the generalization capability of the training data set and the model can be enhanced, and the labor and financial cost can be saved greatly.
Corresponding to the above method embodiments, the present disclosure further provides a training data generating device, which may be used to perform the above method embodiments.
Fig. 13 is a block diagram of a training data generation apparatus in an exemplary embodiment of the present disclosure.
Referring to fig. 13, the training data generating apparatus 1300 may include:
the point cloud data acquisition module 131 is configured to acquire target three-dimensional point cloud data corresponding to a target image, where the target three-dimensional point cloud data includes three-dimensional labeling frames of a plurality of objects and labeling results corresponding to each of the three-dimensional labeling frames;
a visual frame screening module 132 configured to acquire a first three-dimensional annotation frame within an identification range of the target image;
the repeated frame processing module 133 is configured to process two or more first three-dimensional labeling frames of the same object to obtain a preset number of second three-dimensional labeling frames of the same object;
The data labeling module 134 is configured to label the target image according to the labeling result of each of the second three-dimensional labeling frames to generate training data, where the training data is used to train the target detection model.
In one exemplary embodiment of the present disclosure, the visual box screening module 132 is configured to: acquiring a projection frame of each three-dimensional annotation frame in the target three-dimensional point cloud data on a first projection plane and the center point coordinates of the projection frame, wherein the first projection plane is the shooting plane corresponding to the target image; if a center point coordinate is outside the display coordinate range of the target image, deleting the three-dimensional annotation frame corresponding to that center point coordinate; if the overlapping degree of two projection frames is larger than a first preset value, deleting, of the two three-dimensional annotation frames corresponding to the two projection frames, the one whose three-dimensional center point is farther from the first projection plane; and determining the remaining three-dimensional annotation frames as the first three-dimensional annotation frames.
In one exemplary embodiment of the present disclosure, the repeated frame processing module 133 is configured to: acquiring a first projection frame and a second projection frame of two first three-dimensional labeling frames on a second projection plane, wherein the second projection plane is the shooting top view plane corresponding to the target image; when the overlapping degree of the first projection frame and the second projection frame is larger than a second preset value and smaller than a third preset value, deleting the first three-dimensional labeling frame corresponding to the smaller of the first projection frame and the second projection frame; when the overlapping degree of the first projection frame and the second projection frame is larger than or equal to the third preset value, acquiring a third projection frame and a fourth projection frame of the two first three-dimensional labeling frames on a first projection plane, the first projection plane being the shooting plane corresponding to the target image; and deleting the first three-dimensional labeling frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition.
In one exemplary embodiment of the present disclosure, the repeated frame processing module 133 is configured to: determining a first center point height and a first height corresponding to the third projection frame, and determining a first vertex height of the third projection frame according to the first center point height and the first height; determining a second center point height and a second height corresponding to the fourth projection frame, and determining a second vertex height of the fourth projection frame according to the second center point height and the second height; deleting the first three-dimensional annotation frame corresponding to the third projection frame when the first center point height is larger than the second vertex height; and deleting the first three-dimensional annotation frame corresponding to the fourth projection frame when the second center point height is larger than the first vertex height.
In an exemplary embodiment of the present disclosure, the apparatus 1300 further includes a classification module 135 configured to: when the labeling result of a second three-dimensional labeling frame matches a target object, update the labeling result according to the size of that second three-dimensional labeling frame.
In one exemplary embodiment of the present disclosure, the classification module 135 is configured to: acquire a target projection frame of the second three-dimensional annotation frame on the second projection plane, where the second projection plane is the top-view shooting plane corresponding to the target image; if the length of the target projection frame in the normal direction of the second projection plane is less than a fourth preset value, update the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame to a first object; if that length is greater than a fifth preset value, update the labeling result to a second object; and if that length is greater than or equal to the fourth preset value and less than or equal to the fifth preset value, update the labeling result to a third object.
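The size-based re-labeling can be summarized by a short sketch; the "bev_length" field, the placeholder class names, and the example values used for the fourth and fifth preset values are assumptions for illustration only.

```python
def refine_label_by_size(frame, fourth_preset=1.0, fifth_preset=6.0):
    """Re-label a second 3D labeling frame that matched the target object,
    based on the extent of its top-view projection frame."""
    length = frame["bev_length"]            # extent of the top-view projection frame (assumed field)
    if length < fourth_preset:
        frame["label"] = "first_object"     # below the fourth preset value
    elif length > fifth_preset:
        frame["label"] = "second_object"    # above the fifth preset value
    else:
        frame["label"] = "third_object"     # between the two preset values
    return frame
```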
In one exemplary embodiment of the present disclosure, the data annotation module 134 is configured to: acquire the target object corresponding to each second three-dimensional annotation frame together with its preset parameters, where the preset parameters include at least height information and distance information; determine the position of each target object in the target image; and generate the training data from the target image, the position of each target object, and the preset parameters of each target object.
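Assembling a training record from the labeled frames might then look like the following sketch; the dictionary field names are assumed for the example, since the disclosure only requires the class, the position in the image, and the height and distance information.

```python
def build_training_sample(image_path, second_frames):
    """Assemble one training record for the monocular 3D detection model from
    the target image and its second 3D labeling frames."""
    annotations = []
    for frame in second_frames:
        annotations.append({
            "category": frame["label"],        # labeling result of the frame
            "position": frame["image_box"],    # position of the target object in the image
            "height": frame["height"],         # preset parameter: height information
            "distance": frame["distance"],     # preset parameter: distance information
        })
    return {"image": image_path, "annotations": annotations}
```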
Since the functions of the apparatus 1300 have been described in detail in the corresponding method embodiments, they are not repeated here.
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in a single module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into, and embodied by, a plurality of modules or units.
In an exemplary embodiment of the present disclosure, an electronic device capable of implementing the above method is also provided.
Those skilled in the art will appreciate that the various aspects of the invention may be implemented as a system, a method, or a program product. Accordingly, aspects of the invention may be embodied in the following forms: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may generally be referred to herein as a "circuit," a "module," or a "system."
An electronic device 1400 according to such an embodiment of the invention is described below with reference to fig. 14. The electronic device 1400 shown in fig. 14 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 14, the electronic device 1400 is embodied in the form of a general purpose computing device. Components of electronic device 1400 may include, but are not limited to: the at least one processing unit 1410, the at least one memory unit 1420, and a bus 1430 connecting the different system components (including the memory unit 1420 and the processing unit 1410).
Wherein the storage unit stores program code that is executable by the processing unit 1410, such that the processing unit 1410 performs the steps according to the various exemplary embodiments of the present invention described in the "Exemplary Method" section of this specification. For example, the processing unit 1410 may perform the methods shown in the embodiments of the present disclosure.
The memory unit 1420 may include readable media in the form of volatile memory units, such as Random Access Memory (RAM) 14201 and/or cache memory 14202, and may further include Read Only Memory (ROM) 14203.
The memory unit 1420 may also include a program/utility 14204 having a set (at least one) of program modules 14205, such program modules 14205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
Bus 1430 represents one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 1400 may also communicate with one or more external devices 1500 (e.g., a keyboard, a pointing device, a Bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 1400, and/or with any device (e.g., a router, a modem, etc.) that enables the electronic device 1400 to communicate with one or more other computing devices. Such communication may occur through an input/output (I/O) interface 1450. Moreover, the electronic device 1400 may communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, through the network adapter 1460. As shown, the network adapter 1460 communicates with the other modules of the electronic device 1400 via the bus 1430. It should be appreciated that although not shown, other hardware and/or software modules may be used in connection with the electronic device 1400, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
From the above description of embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software in combination with the necessary hardware. Thus, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB flash drive, a portable hard disk, etc.) or on a network, and which includes several instructions to cause a computing device (which may be a personal computer, a server, a terminal device, a network device, etc.) to perform the method according to the embodiments of the present disclosure.
In an exemplary embodiment of the present disclosure, a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification is also provided. In some possible embodiments, the various aspects of the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the invention as described in the "exemplary methods" section of this specification, when said program product is run on the terminal device.
The program product for implementing the above-described method according to an embodiment of the present invention may employ a portable compact disc read-only memory (CD-ROM) and include program code, and may be run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto, and in this document, a readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may be, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
The computer readable signal medium may include a data signal propagated in baseband or as part of a carrier wave with readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A readable signal medium may also be any readable medium that is not a readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
Furthermore, the above-described drawings are only schematic illustrations of processes included in the method according to the exemplary embodiment of the present invention, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow, in general, the principles of the disclosure and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.

Claims (9)

1. A training data generation method for a monocular three-dimensional object detection model, comprising:
acquiring target three-dimensional point cloud data corresponding to a target image, wherein the target three-dimensional point cloud data comprises three-dimensional annotation frames of a plurality of objects and annotation results corresponding to each three-dimensional annotation frame;
acquiring a first three-dimensional annotation frame in the identification range of the target image;
processing two or more first three-dimensional annotation frames of the same object to obtain a preset number of second three-dimensional annotation frames of the same object;
labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, wherein the training data are used for training the target detection model;
the processing the two or more first three-dimensional labeling frames of the same object to obtain a preset number of second three-dimensional labeling frames of the same object comprises the following steps:
acquiring a first projection frame and a second projection frame of two first three-dimensional labeling frames on a second projection plane, wherein the second projection plane is the top-view shooting plane corresponding to the target image;
when the overlapping degree of the first projection frame and the second projection frame is greater than a second preset value and less than a third preset value, deleting the first three-dimensional labeling frame corresponding to whichever of the first projection frame and the second projection frame has the smaller area;
when the overlapping degree of the first projection frame and the second projection frame is greater than or equal to the third preset value, acquiring a third projection frame and a fourth projection frame of the two first three-dimensional labeling frames on a first projection plane, wherein the first projection plane is the shooting plane corresponding to the target image;
and deleting the first three-dimensional annotation frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition.
2. The training data generation method of claim 1, wherein the acquiring a first three-dimensional annotation frame in the identification range of the target image comprises:
acquiring a projection frame of a three-dimensional annotation frame in the target three-dimensional point cloud data on a first projection plane and a central point coordinate of the projection frame, wherein the first projection plane is a shooting plane corresponding to the target image;
if one of the center point coordinates is out of the display coordinate range of the target image, deleting the three-dimensional annotation frame corresponding to the center point coordinate;
if the overlapping degree of the two projection frames is greater than a first preset value, deleting, of the two three-dimensional labeling frames corresponding to the two projection frames, the one whose three-dimensional center point is farther from the first projection plane;
and determining the rest three-dimensional annotation frames as the first three-dimensional annotation frame.
3. The training data generation method of claim 1, wherein the deleting the first three-dimensional labeling frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition comprises:
determining a first center point height and a first height corresponding to the third projection frame, and determining a first vertex height of the third projection frame according to the first center point height and the first height;
determining a second center point height and a second height corresponding to the fourth projection frame, and determining a second vertex height of the fourth projection frame according to the second center point height and the second height;
deleting a first three-dimensional annotation frame corresponding to the third projection frame when the first center point height is larger than the second vertex height;
and deleting the first three-dimensional annotation frame corresponding to the fourth projection frame when the height of the second center point is larger than that of the first vertex.
4. The training data generation method of claim 1, further comprising, after obtaining the preset number of second three-dimensional annotation frames of the same object:
and when the labeling result of the second three-dimensional labeling frame is matched with a target object, updating the labeling result according to the size of the second three-dimensional labeling frame.
5. The training data generation method of claim 4, wherein the updating the annotation result based on the size of the second three-dimensional annotation frame comprises:
acquiring a target projection frame of the second three-dimensional annotation frame on a second projection plane, wherein the second projection plane is the top-view shooting plane corresponding to the target image;
if the length of the target projection frame in the normal direction of the second projection plane is less than a fourth preset value, updating the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame to a first object;
if the length of the target projection frame in the normal direction of the second projection plane is greater than a fifth preset value, updating the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame to a second object;
and if the length of the target projection frame in the normal direction of the second projection plane is greater than or equal to the fourth preset value and less than or equal to the fifth preset value, updating the labeling result of the second three-dimensional labeling frame corresponding to the target projection frame to a third object.
6. The training data generation method of claim 1, wherein labeling the target image according to the labeling result of each of the second three-dimensional labeling frames to generate training data comprises:
obtaining a target object corresponding to each second three-dimensional annotation frame and preset parameters, wherein the preset parameters at least comprise height information and distance information;
determining the position of each target object in the target image;
and generating the training data according to the target image, the position of each target object and the preset parameters of each target object.
7. A training data generation apparatus for a monocular three-dimensional object detection model, comprising:
the point cloud data acquisition module is used for acquiring target three-dimensional point cloud data corresponding to a target image, wherein the target three-dimensional point cloud data comprise three-dimensional annotation frames of a plurality of objects and annotation results corresponding to the three-dimensional annotation frames;
the visual frame screening module is used for acquiring a first three-dimensional annotation frame in the identification range of the target image;
the repeated frame processing module is used for processing two or more first three-dimensional annotation frames of the same object to obtain a preset number of second three-dimensional annotation frames of the same object;
the data labeling module is used for labeling the target image according to the labeling result of each second three-dimensional labeling frame to generate training data, and the training data are used for training the target detection model;
wherein the repeated frame processing module is configured to: acquire a first projection frame and a second projection frame of two first three-dimensional labeling frames on a second projection plane, wherein the second projection plane is the top-view shooting plane corresponding to the target image; when the overlapping degree of the first projection frame and the second projection frame is greater than a second preset value and less than a third preset value, delete the first three-dimensional labeling frame corresponding to whichever of the first projection frame and the second projection frame has the smaller area; when the overlapping degree of the first projection frame and the second projection frame is greater than or equal to the third preset value, acquire a third projection frame and a fourth projection frame of the two first three-dimensional labeling frames on a first projection plane, wherein the first projection plane is the shooting plane corresponding to the target image; and delete the first three-dimensional annotation frame corresponding to whichever of the third projection frame and the fourth projection frame meets the preset height condition.
8. An electronic device, comprising:
a memory; and
a processor coupled to the memory, the processor configured to perform the training data generation method of any of claims 1-6 based on instructions stored in the memory.
9. A computer readable storage medium having stored thereon a program which, when executed by a processor, implements the training data generation method of any of claims 1-6.
CN202110238897.1A 2021-03-04 2021-03-04 Training data generation method and device and electronic equipment Active CN113808186B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110238897.1A CN113808186B (en) 2021-03-04 2021-03-04 Training data generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN113808186A CN113808186A (en) 2021-12-17
CN113808186B true CN113808186B (en) 2024-01-16

Family

ID=78892886

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110238897.1A Active CN113808186B (en) 2021-03-04 2021-03-04 Training data generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN113808186B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116469025B (en) * 2022-12-30 2023-11-24 以萨技术股份有限公司 Processing method for identifying task, electronic equipment and storage medium

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108960174A (en) * 2018-07-12 2018-12-07 广东工业大学 A kind of object detection results optimization method and device
CN110163904A (en) * 2018-09-11 2019-08-23 腾讯大地通途(北京)科技有限公司 Object marking method, control method for movement, device, equipment and storage medium
WO2019196130A1 (en) * 2018-04-12 2019-10-17 广州飒特红外股份有限公司 Classifier training method and device for vehicle-mounted thermal imaging pedestrian detection
CN110443212A (en) * 2019-08-12 2019-11-12 睿魔智能科技(深圳)有限公司 Positive sample acquisition methods, device, equipment and storage medium for target detection
CN110796201A (en) * 2019-10-31 2020-02-14 深圳前海达闼云端智能科技有限公司 Method for correcting label frame, electronic equipment and storage medium
WO2020102944A1 (en) * 2018-11-19 2020-05-28 深圳市大疆创新科技有限公司 Point cloud processing method and device and storage medium
CN111310667A (en) * 2020-02-18 2020-06-19 北京小马慧行科技有限公司 Method, device, storage medium and processor for determining whether annotation is accurate
CN111523390A (en) * 2020-03-25 2020-08-11 杭州易现先进科技有限公司 Image recognition method and augmented reality AR icon recognition system
CN111563450A (en) * 2020-04-30 2020-08-21 北京百度网讯科技有限公司 Data processing method, device, equipment and storage medium
CN111583663A (en) * 2020-04-26 2020-08-25 宁波吉利汽车研究开发有限公司 Monocular perception correction method and device based on sparse point cloud and storage medium
CN111797734A (en) * 2020-06-22 2020-10-20 广州视源电子科技股份有限公司 Vehicle point cloud data processing method, device, equipment and storage medium
CN112183180A (en) * 2019-07-02 2021-01-05 通用汽车环球科技运作有限责任公司 Method and apparatus for three-dimensional object bounding of two-dimensional image data
CN112287860A (en) * 2020-11-03 2021-01-29 北京京东乾石科技有限公司 Training method and device of object recognition model, and object recognition method and system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110264468B (en) * 2019-08-14 2019-11-19 长沙智能驾驶研究院有限公司 Point cloud data mark, parted pattern determination, object detection method and relevant device
CN111652113B (en) * 2020-05-29 2023-07-25 阿波罗智联(北京)科技有限公司 Obstacle detection method, device, equipment and storage medium
CN112329846A (en) * 2020-11-03 2021-02-05 武汉光庭信息技术股份有限公司 Laser point cloud data high-precision marking method and system, server and medium

Also Published As

Publication number Publication date
CN113808186A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
Sahu et al. Artificial intelligence (AI) in augmented reality (AR)-assisted manufacturing applications: a review
CN108830894B (en) Remote guidance method, device, terminal and storage medium based on augmented reality
US10854006B2 (en) AR-enabled labeling using aligned CAD models
CN108335353B (en) Three-dimensional reconstruction method, device and system of dynamic scene, server and medium
JP2021515939A (en) Monocular depth estimation method and its devices, equipment and storage media
CN108898676B (en) Method and system for detecting collision and shielding between virtual and real objects
US20220076072A1 (en) System and method using augmented reality for efficient collection of training data for machine learning
JP2021089724A (en) 3d auto-labeling with structural and physical constraints
JP7422105B2 (en) Obtaining method, device, electronic device, computer-readable storage medium, and computer program for obtaining three-dimensional position of an obstacle for use in roadside computing device
CN111666876B (en) Method and device for detecting obstacle, electronic equipment and road side equipment
CN111695497B (en) Pedestrian recognition method, medium, terminal and device based on motion information
CN114565916A (en) Target detection model training method, target detection method and electronic equipment
CN113808186B (en) Training data generation method and device and electronic equipment
CN111401190A (en) Vehicle detection method, device, computer equipment and storage medium
CN111784842B (en) Three-dimensional reconstruction method, device, equipment and readable storage medium
CN113378605B (en) Multi-source information fusion method and device, electronic equipment and storage medium
CN116978010A (en) Image labeling method and device, storage medium and electronic equipment
CN117036607A (en) Automatic driving scene data generation method and system based on implicit neural rendering
CN114648639B (en) Target vehicle detection method, system and device
WO2023283929A1 (en) Method and apparatus for calibrating external parameters of binocular camera
CN115222815A (en) Obstacle distance detection method, obstacle distance detection device, computer device, and storage medium
CN112381873A (en) Data labeling method and device
Syntakas et al. Object Detection and Navigation of a Mobile Robot by Fusing Laser and Camera Information
CN115410012B (en) Method and system for detecting infrared small target in night airport clear airspace and application
WO2024087962A1 (en) Truck bed orientation recognition system and method, and electronic device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant