CN113795847A - 3D frame marking method, device and computer-readable storage medium


Info

Publication number: CN113795847A
Application number: CN202080033702.3A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: frame, corner, determining, target object, orientation angle
Legal status: Pending
Inventors: 陈创荣, 徐斌, 陈晓智
Applicant: SZ DJI Technology Co Ltd
Current Assignee: SZ DJI Technology Co Ltd
Original Assignee: SZ DJI Technology Co Ltd

Classifications

    • G06F 18/00: Pattern recognition
    • G06F 18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F 18/40: Software arrangements specially adapted for pattern recognition, e.g. user interfaces or toolboxes therefor
    • G06N 3/04: Neural networks; architecture, e.g. interconnection topology
    • G06N 3/08: Neural networks; learning methods

Abstract

A 3D frame marking method, device, and computer-readable storage medium. The method marks the 3D frame directly at the image level, without relying on an additional depth sensor, which lowers the cost. The 3D frame of a target object is obtained simply by marking, on a two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing procedure is simple and the workload is reduced. In addition, the 3D frame obtained by the embodiments of the present application differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.

Description

3D frame marking method, device and computer-readable storage medium
Technical Field
Embodiments of the present disclosure relate to image processing technologies, and in particular, to a method and an apparatus for marking a 3D frame, and a computer-readable storage medium.
Background
With the rapid development of Artificial Intelligence (AI) techniques, from neural networks to deep learning, these AI techniques can now be used to perceive the surrounding environment. For example, in automatic driving, a neural network may be used to recognize images captured by a camera mounted on a vehicle, so as to obtain 2D or 3D information of surrounding target objects (such as surrounding vehicles, pedestrians, trees, etc.). However, to obtain highly accurate recognition results, the neural network used needs to be trained first. For example, if a neural network is required to recognize an image to obtain a target object and its 3D information, the neural network first needs to be trained using training images together with the known target objects in those images and their 3D information.
Obtaining the known target object and its 3D information in an image (e.g. a pseudo 3D frame, i.e. the projection of the 3D frame onto the two-dimensional image plane, referred to simply as a 3D frame below) usually requires manual annotation. One existing labeling method relies on an external lidar sensor or an active depth sensor such as a depth camera: depth information is obtained by means of the external depth sensor, point cloud data is generated, the actual 3D frame of the object is marked directly in 3D space, and the 3D frame is then projected onto the image according to the coordinate conversion relationship between the sensors, thereby obtaining the 3D frame on the image.
However, this labeling method involves a complex process and high cost, and the 3D frame obtained from point cloud labeling can differ greatly from the actual size of the object, which affects the subsequent training of the neural network and thus its recognition effect. Therefore, a simpler and more efficient labeling method needs to be provided.
Disclosure of Invention
Embodiments of the present application provide a 3D frame labeling method, device, and computer-readable storage medium to overcome at least one of the above problems.
In a first aspect, an embodiment of the present application provides a 3D frame marking method, including:
acquiring a 2D frame marking operation, and determining a 2D frame of a target object on a two-dimensional image containing the target object according to the 2D frame marking operation;
acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation;
and determining and displaying a 3D frame of the target object based on the 2D frame and the corner points.
In a second aspect, an embodiment of the present application provides another 3D frame marking method, including:
determining a 2D frame of a target object on a two-dimensional image containing the target object;
acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame;
and determining a 3D frame of the target object based on the 2D frame and the corner points.
In a third aspect, an embodiment of the present application provides a 3D frame labeling apparatus, including a memory, a processor, an interaction unit, and computer instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer instructions:
acquiring a 2D frame marking operation through the interaction unit, and determining a 2D frame of a target object on a two-dimensional image containing the target object according to the 2D frame marking operation;
acquiring a corner marking operation through the interaction unit, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation;
and determining a 3D frame of the target object based on the 2D frame and the corner points, and displaying it through the interaction unit.
In a fourth aspect, an embodiment of the present application provides another 3D frame marking device, including a memory, a processor, and computer instructions stored in the memory and executable on the processor, where the processor executes the computer instructions to implement the following steps:
determining a 2D frame of a target object on a two-dimensional image containing the target object;
acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame;
and determining a 3D frame of the target object based on the 2D frame and the corner points.
In a fifth aspect, an embodiment of the present application provides a neural network training method, including:
training of the neural network is performed using the 3D frame of the target object determined by the 3D frame labeling method according to the first aspect and various possible designs of the first aspect, and the two-dimensional image containing the target object.
In a sixth aspect, an embodiment of the present application provides another neural network training method, including:
the training of the neural network is performed by using the 3D frame of the target object determined by the 3D frame labeling method according to the second aspect and various possible designs of the second aspect, and the two-dimensional image containing the target object.
In a seventh aspect, an embodiment of the present application provides a computer-readable storage medium, where computer instructions are stored, and when a processor executes the computer instructions, the 3D frame labeling method according to the first aspect and various possible designs of the first aspect is implemented.
In an eighth aspect, the present application provides another computer-readable storage medium, where computer instructions are stored, and when a processor executes the computer instructions, the 3D frame annotation method according to the second aspect and various possible designs of the second aspect is implemented.
The 3D frame marking method, device, and computer-readable storage medium provided by the embodiments of the present application mark the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost. The 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. In addition, the 3D frame obtained by the embodiments of the present application differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving the recognition effect of the neural network.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic diagram of a 3D frame marking system architecture according to an embodiment of the present application;
fig. 2 is a schematic flowchart of a 3D frame marking method according to an embodiment of the present disclosure;
fig. 3 is a schematic view of a 2D frame provided in an embodiment of the present application;
fig. 4 is a schematic diagram of a corner point on a 2D frame according to an embodiment of the present disclosure;
fig. 5 is a schematic diagram of a 3D frame according to an embodiment of the present application;
fig. 6 is a schematic flowchart of another 3D frame marking method provided in the embodiment of the present application;
fig. 7 is a schematic flowchart of another 3D frame marking method according to an embodiment of the present disclosure;
fig. 8 is a schematic flowchart of another 3D frame marking method provided in the embodiment of the present application;
fig. 9 is a schematic flowchart of another 3D frame marking method provided in the embodiment of the present application;
fig. 10 is a schematic flowchart of another 3D frame marking method provided in the embodiment of the present application;
fig. 11 is a schematic diagram of a correspondence relationship between a 2D frame and a 3D frame according to an embodiment of the present application;
fig. 12 is a schematic flowchart of another 3D frame marking method provided in the embodiment of the present application;
Fig. 13 is a schematic structural diagram of a 3D frame marking apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of another 3D frame marking apparatus provided in an embodiment of the present application;
fig. 15 is a basic hardware architecture of a 3D frame marking device according to an embodiment of the present disclosure;
fig. 16 is a basic hardware architecture of another 3D frame labeling apparatus provided in the embodiment of the present application.
With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terms "first," "second," "third," and "fourth," if any, in the description and claims of this application and the above-described figures are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the application described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The 3D frame labeling method provided in the embodiments of the present application may be applied to early-stage data labeling for neural network training, where the neural network may be used to obtain 2D or 3D information of a target object (such as a vehicle, a house, and the like); the embodiments of the present application do not particularly limit this. It should be noted that the 3D frame labeled in the embodiments of the present application refers to the projection of the three-dimensional frame of an object onto the two-dimensional image, not the three-dimensional frame in the actual three-dimensional space.
Optionally, the 3D frame marking method provided in the embodiment of the present application may be applied to an application scenario as shown in fig. 1. Fig. 1 only describes, by way of example, one possible application scenario of the 3D frame annotation method provided in the embodiment of the present application, and the application scenario of the 3D frame annotation method provided in the embodiment of the present application is not limited to the application scenario shown in fig. 1.
Fig. 1 is a schematic diagram of a 3D frame labeling system architecture. In fig. 1, the example of obtaining 3D information of surrounding vehicles is taken. The above architecture comprises a processing device 11 and a plurality of cameras, here exemplified by a first camera 12, a second camera 13 and a third camera 14.
It is to be understood that the illustrated structure of the embodiment of the present application does not constitute a specific limitation on the 3D frame labeling architecture. In other possible embodiments of the present application, the foregoing architecture may include more or fewer components than those shown in the drawings, or combine some components, or split some components, or arrange the components differently, as determined by the practical application scenario, which is not limited herein. The components shown in fig. 1 may be implemented in hardware, software, or a combination of software and hardware.
In a specific implementation process, the first camera 12, the second camera 13, and the third camera 14 in the embodiment of the present application may respectively capture images of surrounding vehicles. In the above application scenario, after the first camera 12, the second camera 13, and the third camera 14 capture images, the captured images may be sent to the processing device 11. The processing device 11 uses the received images as sample data, which can be used to train a neural network after being labeled. After acquiring the user's basic operations at the image level, the processing device 11 may directly generate the labeled 3D frame, so that the neural network can be trained using the known vehicles in the images and their 3D information.
The 3D frame marking method and device mark the 3D frame of the target object directly at the image level, that is, they mark the projection on the two-dimensional image of the actual 3D frame of the target object in three-dimensional space. Only the 2D frame of the target object and one corner point on that 2D frame need to be marked on the two-dimensional image containing the target object to obtain the 3D frame of the target object. No additional depth sensor is required, which reduces the cost; the processing process is simple, and the workload is reduced.
In addition, the system architecture and the service scenario described in the embodiment of the present application are for more clearly illustrating the technical solution of the embodiment of the present application, and do not constitute a limitation to the technical solution provided in the embodiment of the present application, and it can be known by a person skilled in the art that along with the evolution of the system architecture and the appearance of a new service scenario, the technical solution provided in the embodiment of the present application is also applicable to similar technical problems.
The 3D frame marking method provided by the embodiments of the present application is described in detail below with reference to the accompanying drawings. The execution subject of the method may be the processing device 11 in fig. 1. The workflow of the processing device 11 mainly includes a 2D frame stage and a 3D frame stage. In the 2D frame stage, the processing device 11 acquires a 2D frame of the target object marked on the two-dimensional image including the target object and a corner point on the 2D frame. In the 3D frame stage, the processing device 11 generates a 3D frame of the target object according to the 2D frame and one corner point on the 2D frame, without relying on an additional depth sensor, which reduces the cost, has a simple processing process, and reduces the workload.
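For illustration only, this two-stage workflow can be sketched in Python as follows; the names used here (the annotator object and the solve_3d_frame helper, one possible form of which is sketched later in this description) are placeholders assumed for this sketch and are not defined by the present disclosure.

```python
# A schematic sketch of the two-stage workflow of processing device 11.
# The helper names are illustrative assumptions, not part of the disclosure.

def label_target_object(image, annotator):
    # 2D frame stage: acquire the 2D frame of the target object and one
    # corner point on that frame from the annotator's marking operations.
    box2d = annotator.mark_2d_frame(image)        # e.g. (xmin, ymin, xmax, ymax)
    corner = annotator.mark_corner(image, box2d)  # a point on one edge of box2d

    # 3D frame stage: generate the projected 3D frame directly from the 2D
    # frame and the single corner point, without any depth sensor.
    box3d = solve_3d_frame(box2d, corner)         # eight projected corner points
    return box3d
```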
The technical solutions of the present application are described below with several embodiments as examples, and the same or similar concepts or processes may not be described in detail in some embodiments.
Fig. 2 is a flowchart illustrating a 3D frame marking method according to an embodiment of the present disclosure, where an execution subject of the embodiment may be the processing device 11 in fig. 1, and a specific execution subject may be determined according to an actual application scenario. As shown in fig. 2, on the basis of the application scenario shown in fig. 1, the 3D frame labeling method provided in the embodiment of the present application includes the following steps:
S201: acquiring a 2D frame marking operation, and determining a 2D frame of the target object on the two-dimensional image containing the target object according to the 2D frame marking operation.
Here, the target object may be determined according to actual conditions, such as a vehicle, a house, and the like, and this is not particularly limited by the embodiment of the present application.
In the embodiment of the present application, taking the target object being a vehicle as an example, after the processing device 11 obtains a 2D frame marking operation, it determines the 2D frame of the vehicle on a two-dimensional image containing the vehicle. For example, the 2D frame marking operation may be performed by an annotator. As shown in fig. 3, the 2D frame of the vehicle completely encloses the vehicle on the two-dimensional image containing the vehicle. The specific size of the 2D frame may be determined according to actual needs, for example so that the size of the 2D frame is as close as possible to the size of the vehicle on the two-dimensional image, which is not particularly limited in the embodiment of the present application.
Further, the processing device 11 may acquire a 2D frame adjustment operation after determining the 2D frame of the target object, and adjust the 2D frame, for example, the size and position of the 2D frame, according to the operation.
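As a minimal sketch of how such a 2D frame and its adjustment might be represented, assuming pixel coordinates and the field names below (which are illustrative, not part of the disclosure):

```python
from dataclasses import dataclass

@dataclass
class Box2D:
    """Axis-aligned 2D frame marked on the two-dimensional image (pixels)."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def adjusted(self, dx=0.0, dy=0.0, dw=0.0, dh=0.0):
        """Return the frame after a 2D frame adjustment operation (move/resize)."""
        return Box2D(self.xmin + dx, self.ymin + dy,
                     self.xmax + dx + dw, self.ymax + dy + dh)

# Example: a frame roughly enclosing the vehicle, then moved 2 px to the right.
box = Box2D(100, 150, 420, 360).adjusted(dx=2)
```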
S202: acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation.
In the embodiment of the present application, a corner point refers to the point on the two-dimensional image at which a certain corner of the actual three-dimensional frame of the object to be labeled is projected. For example, when the object is a vehicle, the actual three-dimensional frame is a cuboid, and the corner points that can be marked are the projected points of six corners of the cuboid on the two-dimensional image. Which side of the 2D frame the corner point lies on may be determined according to the actual situation, and this is not particularly limited in this embodiment of the application.
Exemplarily, taking the target object as a vehicle as an example, as shown in fig. 4, the corner points are located on the bottom side of the 2D frame.
S203: determining and displaying the 3D frame of the target object based on the 2D frame and the corner points.
Here, taking the target object as a vehicle as an example, as shown in fig. 5, a 3D frame of the vehicle is obtained on the basis of the 2D frame and the corner points of the vehicle.
In this embodiment of the application, the 3D frame is marked directly at the image level, without relying on an additional depth sensor, which reduces the cost. The 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing procedure is simple and the workload is reduced. In addition, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
In addition, in the embodiment of the present application, before the 3D frame of the target object is determined and displayed, the corner point number of the corner point is also obtained. Fig. 6 is a flowchart illustrating another 3D frame marking method according to an embodiment of the present disclosure. As shown in fig. 6, the method includes:
S601: acquiring a 2D frame marking operation, and determining a 2D frame of the target object on the two-dimensional image containing the target object according to the 2D frame marking operation.
S602: acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation.
The steps S601 to S602 are the same as the steps S201 to S202, and are not described herein again.
S603: acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
Here, the order of the corner point numbers may not be limited as long as there is a mapping relationship between the corner point and its corner point number.
The mapping relationship may be understood as a correspondence between the positions of corner points relative to the object and their corner numbers. For example, the four corner points located on the bottom side of the 2D frame of the object may be numbered p0, p1, p2, and p3 clockwise, starting from the rear-right position. If a corner point is located at the rear-left of the bottom edge of the 2D frame of the object, its corner number is p1.
The processing device 11 may pre-store the mapping relationship, so as to obtain, based on the mapping relationship, a corner number of the corner, where the corner number is used to indicate a position of the corner relative to the target object.
In addition, the corner point number may be input by a user or may be configured in advance, which is not particularly limited in this embodiment of the application.
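For illustration, the example numbering above can be stored as a simple mapping; the dictionary below only encodes the numbering just described and is a sketch, not a required data structure:

```python
# Pre-stored mapping from the position of a bottom corner relative to the
# target object to its corner number, following the example above: starting
# from the rear-right position, the bottom corners are numbered p0..p3.
CORNER_NUMBER = {
    "rear_right":  "p0",
    "rear_left":   "p1",
    "front_left":  "p2",
    "front_right": "p3",
}

def corner_number(position: str) -> str:
    """Return the corner number indicating the corner's position relative to the object."""
    return CORNER_NUMBER[position]

# A corner point located behind and to the left of the object is numbered p1.
assert corner_number("rear_left") == "p1"
```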
S604: determining and displaying the 3D frame of the target object based on the corner point number, the 2D frame and the corner points.
In this embodiment of the application, before the 3D frame of the target object is determined and displayed, the corner number of the corner point is also obtained, and the 3D frame of the target object is then accurately determined and displayed based on the corner number, the 2D frame and the corner point, which meets the application requirements. For example, when the target object is a vehicle, the vehicle may be set to face straight ahead by default, since vehicles are generally located ahead and travel toward the front. Once the corner number of the 3D frame of the vehicle is determined, the corners of the 3D frame on the image lie on the sides of the labeled 2D frame, so the 3D frame of the vehicle can be generated directly from the vanishing points of the three-dimensional object in the 2D view together with the determined corner number and the mapping relationship. This embodiment marks the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost; the 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. Compared with manually marking the 3D frame of the target object, the method satisfies the geometric constraints of the projection of the actual three-dimensional frame onto the two-dimensional image and avoids the errors introduced by manual marking, thereby improving the accuracy and usability of the labeled data. The 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
In addition, in other embodiments, the present application further obtains an orientation angle of the target object in the 3D space before determining and displaying the 3D frame of the target object. Fig. 7 is a flowchart illustrating a 3D frame labeling method according to an embodiment of the present application. As shown in fig. 7, the method includes:
S701: acquiring a 2D frame marking operation, and determining a 2D frame of the target object on the two-dimensional image containing the target object according to the 2D frame marking operation.
S702: acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation.
The steps S701 to S702 are the same as the steps S201 to S202, and are not described herein again.
S703: acquiring an orientation angle of the target object in the 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
The orientation angle may be input by a user or preset, which is not particularly limited in the embodiment of the present application. For example, taking the target object as a vehicle, the user may not input an orientation angle, in which case the default orientation angle of 0 degrees is used and the target object is taken to face straight ahead. Of course, the user may also input other orientation angles. For example, when the target object is a vehicle traveling straight ahead, the default orientation angle may be selected; when a vehicle is traveling obliquely or in reverse, the actual orientation angle can be input according to the actual situation.
S704: determining and displaying the 3D frame of the target object based on the orientation angle, the 2D frame and the corner point.
In this embodiment of the application, before the 3D frame of the target object is determined and displayed, the orientation angle of the target object in 3D space is also obtained, and the 3D frame of the target object is then determined and displayed based on the orientation angle, the 2D frame and the corner point, so that the obtained 3D frame is more consistent with reality. Specifically, when the orientation angle takes the default value, for example 0 degrees, the target object faces straight ahead and the vanishing point corresponding to the target object is located at the center of the image, so the positions of the other corner points of the target object can be determined from the vanishing point to complete the generation of the 3D frame; when the orientation angle is another value input by the user, the target object faces another direction and the vanishing point is located at another position. This embodiment marks the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost; the 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. In addition, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
In addition, after the 3D frame of the target object is determined and displayed, the 2D frame may be adjusted. Fig. 8 is a flowchart illustrating another 3D frame marking method according to an embodiment of the present disclosure. As shown in fig. 8, the method includes:
S801: acquiring a 2D frame marking operation, and determining a 2D frame of the target object on the two-dimensional image containing the target object according to the 2D frame marking operation.
S802: acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation.
S803: determining and displaying the 3D frame of the target object based on the 2D frame and the corner points.
The steps S801 to S803 are the same as the implementation of the steps S201 to S203, and are not described herein again.
S804: the 2D frame marking operation includes at least one of a frame selection operation, a movement operation, and a rotation operation, and the 2D frame is adjusted according to the 2D frame marking operation.
The 2D frame marking operation may include other operations besides the above operations, which is not particularly limited in this embodiment of the application.
In this embodiment of the application, after the 3D frame of the target object is determined and displayed, the 2D frame can still be adjusted, so that a new 3D frame can be generated based on the adjusted 2D frame to meet various application requirements. In addition, the method marks the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost; the 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. Moreover, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
Fig. 9 is a flowchart illustrating a further 3D frame marking method according to an embodiment of the present application, where an execution subject of the embodiment may be the processing device 11 in fig. 1, and a specific execution subject may be determined according to an actual application scenario. As shown in fig. 9, on the basis of the application scenario shown in fig. 1, the 3D frame labeling method provided in the embodiment of the present application includes the following steps:
S901: a 2D frame of the target object is determined on a two-dimensional image containing the target object.
Here, after determining the 2D frame of the target object, the processing device 11 may acquire a 2D frame adjustment operation, and adjust the 2D frame, for example, the size, the position, and the like of the 2D frame according to the operation.
S902: acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame.
In a possible implementation, the corner points are located on the bottom side of the 2D frame.
S903: determining the 3D frame of the target object based on the 2D frame and the corner points.
In this embodiment of the application, the 3D frame is marked directly at the image level, without relying on an additional depth sensor, which reduces the cost. The 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing procedure is simple and the workload is reduced. In addition, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
In addition, in the embodiment of the present application, before the 3D frame of the target object is determined, the corner point number of the corner point is also obtained. Fig. 10 is a flowchart illustrating a 3D frame marking method according to an embodiment of the present application. As shown in fig. 10, the method includes:
S1001: a 2D frame of the target object is determined on a two-dimensional image containing the target object.
S1002: acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame.
The steps S1001 to S1002 are the same as the steps S901 to S902, and are not described herein again.
S1003: acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
The corner point number may be input by a user or may be pre-configured.
S1004: determining the 3D frame of the target object based on the corner point number, the 2D frame and the corner points.
In a possible implementation manner, the processing device 11 may determine a corresponding relationship between the 2D frame and the 3D frame based on the corner point number and the corner point, and further determine the 3D frame according to the corresponding relationship and the 2D frame.
For example, taking the target object as a vehicle, the correspondence between the 2D frame and the 3D frame is shown in fig. 11, where front denotes the front of the vehicle and rear denotes the rear.
In this embodiment of the application, the determining the correspondence between the 2D frame and the 3D frame may include:
acquiring a corresponding rule of corner points of a prestored 2D frame and a prestored 3D frame of the object;
and determining the corresponding relation according to the corresponding rule, the corner point number and the corner point.
The correspondence rule may be set according to an actual situation, and is not particularly limited in the embodiment of the present application.
In this embodiment of the application, before the 3D frame of the target object is determined, the corner number of the corner point is also obtained, and the 3D frame of the target object is then accurately determined based on the corner number, the 2D frame and the corner point, which meets the application requirements. In addition, the method marks the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost; the 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. Moreover, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
In addition, in the embodiment of the present application, before determining the 3D frame of the target object, the orientation angle of the target object in the 3D space is further obtained. Fig. 12 is a flowchart illustrating a 3D frame labeling method according to an embodiment of the present application. As shown in fig. 12, the method includes:
S1201: a 2D frame of the target object is determined on a two-dimensional image containing the target object.
S1202: acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame.
The steps S1201 to S1202 are the same as the steps S901 to S902, and are not described herein again.
S1203: acquiring an orientation angle of the target object in the 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
The orientation angle may be input by a user or may be configured in advance.
S1204: determining the 3D frame of the target object based on the orientation angle, the 2D frame and the corner point.
In a possible implementation manner, the processing device 11 may determine three vanishing points corresponding to the 3D frame according to the orientation angle, and further determine the 3D frame according to the three vanishing points, the 2D frame and the corner point. Exemplarily, the 3D frame is determined according to the three vanishing points, the 2D frame and the corner points in combination with the parallel relationship of the sides of the cuboid.
Wherein, the determining three vanishing points corresponding to the 3D frame according to the orientation angle may include:
acquiring a projection matrix of an image acquisition device corresponding to the two-dimensional image;
and determining the three vanishing points according to the projection matrix and the rotation matrix of the orientation angle.
Illustratively, the three vanishing points are determined from the result of multiplying the projection matrix by the rotation matrix of the orientation angle, for example, the three vanishing points vp0, vp1, and vp2 shown in fig. 11.
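As an illustrative sketch (not part of the claimed method), the three vanishing points can be computed by applying the projection matrix to the three cuboid axis directions rotated by the orientation angle; the yaw-only rotation and the example 3x4 pinhole projection matrix below are assumptions made for this sketch.

```python
import numpy as np

def rotation_from_orientation(theta):
    """Rotation of the object axes for an orientation angle theta (radians),
    assumed here to be a rotation about the vertical axis of a camera frame
    with x right, y down, z forward."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[ c, 0.0,   s],
                     [0.0, 1.0, 0.0],
                     [-s, 0.0,   c]])

def vanishing_points(P, theta):
    """Homogeneous vanishing points (vp0, vp1, vp2) of the three cuboid axes.

    P is the 3x4 projection matrix of the image acquisition device. Each axis
    direction d is a point at infinity (d, 0), whose image P @ (d, 0) is the
    corresponding vanishing point."""
    R = rotation_from_orientation(theta)
    return [P @ np.append(R @ axis, 0.0) for axis in np.eye(3)]

# Example with an assumed pinhole projection matrix P = K [I | 0].
K = np.array([[1000.0,    0.0, 640.0],
              [   0.0, 1000.0, 360.0],
              [   0.0,    0.0,   1.0]])
P = np.hstack([K, np.zeros((3, 1))])
vp0, vp1, vp2 = vanishing_points(P, theta=np.deg2rad(15.0))
```

In this example, an orientation angle of 0 would place the vanishing point of the forward axis at the principal point of the camera, consistent with the description of the default orientation above.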
In addition, before determining the 3D frame according to the three vanishing points, the 2D frame, and the corner points, the method further includes:
acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
Correspondingly, the determining the 3D frame according to the three vanishing points, the 2D frame and the corner point includes:
determining the 3D frame according to the corner point number, the three vanishing points, the 2D frame and the corner point.
For example, the processing device 11 may determine a corresponding relationship between the 2D frame and the 3D frame based on the corner number and the corner, and further, in combination with a parallel relationship of sides of the rectangular parallelepiped, solve 8 corner positions of the projection of the 3D frame, such as corners p0, p1, p2, p3, p4, p5, p6, and p7 in fig. 11, according to the corresponding relationship, the three vanishing points, and the 2D frame, thereby determining the 3D frame.
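The way the eight projected corners can be solved from the 2D frame, the single labeled bottom corner and the three vanishing points can be illustrated with the following sketch, which uses the fact that projections of parallel cuboid edges meet at the corresponding vanishing point; the particular side assignments coded below (which 2D frame edge each unknown corner touches) are assumptions for this example and are, in general, selected according to the corner number and the correspondence of fig. 11.

```python
import numpy as np

def hom(p):
    """Lift a 2D pixel point to homogeneous coordinates."""
    return np.array([p[0], p[1], 1.0])

def line(p, q):
    """Homogeneous line through two homogeneous points."""
    return np.cross(p, q)

def meet(l1, l2):
    """Intersection of two homogeneous lines, returned in pixel coordinates."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

def solve_3d_frame(box2d, corner, vps):
    """Sketch: recover the eight projected corners p0..p7 of the 3D frame.

    box2d  : (xmin, ymin, xmax, ymax) of the labeled 2D frame.
    corner : labeled corner point (x, y) on the bottom edge, taken here as p0.
    vps    : homogeneous vanishing points (vp_x, vp_y, vp_z) of the cuboid axes,
             e.g. as returned by the vanishing_points() sketch above.
    """
    xmin, ymin, xmax, ymax = box2d
    vp_x, vp_y, vp_z = vps
    left  = line(hom((xmin, ymin)), hom((xmin, ymax)))   # left side of 2D frame
    right = line(hom((xmax, ymin)), hom((xmax, ymax)))   # right side of 2D frame
    top   = line(hom((xmin, ymin)), hom((xmax, ymin)))   # top side of 2D frame

    b0 = np.asarray(corner, dtype=float)
    b1 = meet(line(hom(b0), vp_z), left)                 # edge b0-b1 points at vp_z
    b3 = meet(line(hom(b0), vp_x), right)                # edge b0-b3 points at vp_x
    b2 = meet(line(hom(b1), vp_x), line(hom(b3), vp_z))  # remaining bottom corner

    t2 = meet(line(hom(b2), vp_y), top)                  # assumed to touch top side
    t1 = meet(line(hom(b1), vp_y), line(hom(t2), vp_x))
    t3 = meet(line(hom(b3), vp_y), line(hom(t2), vp_z))
    t0 = meet(line(hom(b0), vp_y), line(hom(t1), vp_z))
    return [b0, b1, b2, b3, t0, t1, t2, t3]              # corners p0..p7
```

Because every unknown corner is obtained by intersecting lines that pass through the vanishing points, the resulting pseudo 3D frame satisfies the intrinsic geometric relationships of the cuboid.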
Here, the embodiment of the present application combines the intrinsic geometric relationship of the cuboid with the projection model of the image acquisition device, so that it can be ensured that the pseudo 3D frame obtained by labeling satisfies the intrinsic geometric relationship of the cuboid, and the labeling precision and the labeling consistency are higher.
In this embodiment of the application, before the 3D frame of the target object is determined and displayed, the orientation angle of the target object in 3D space is also obtained, and the 3D frame of the target object is then determined and displayed based on the orientation angle, the 2D frame and the corner point, so that the obtained 3D frame is more consistent with reality. In addition, the method marks the 3D frame directly at the image level, without relying on an additional depth sensor, which reduces the cost; the 3D frame of the target object can be obtained simply by marking, on the two-dimensional image containing the target object, the 2D frame of the target object and one corner point on that 2D frame, so the processing process is simple and the workload is reduced. Moreover, the 3D frame obtained in this embodiment differs only slightly from the actual size of the object, which solves the problem that the 3D frame obtained from existing point-cloud-based labeling differs greatly from the actual size of the object, thereby ensuring accurate subsequent training of the neural network and improving its recognition effect.
Corresponding to the 3D frame marking method of the foregoing embodiments, an embodiment of the present application provides a 3D frame marking apparatus. For convenience of explanation, only the portions related to the embodiments of the present application are shown. Fig. 13 is a schematic structural diagram of a 3D frame marking apparatus according to an embodiment of the present application, where the 3D frame marking apparatus 1300 includes: a first obtaining module 1301, a second obtaining module 1302, and a display module 1303. The 3D frame marking apparatus may be the processing device 11 itself, or a chip or an integrated circuit that realizes the functions of the processing device 11. It should be noted that the division into the first obtaining module, the second obtaining module and the display module is only a division of logical functions; physically, these modules may be integrated or independent.
The first obtaining module 1301 is configured to obtain a 2D frame annotation operation, and determine a 2D frame of a target object on a two-dimensional image including the target object according to the 2D frame annotation operation.
A second obtaining module 1302, configured to obtain a corner point labeling operation, where the corner point is located on one side of the 2D frame, and label the corner point on the 2D frame according to the corner point labeling operation;
And a display module 1303, configured to determine and display a 3D frame of the target object based on the 2D frame and the corner.
In a possible implementation manner, before the display module 1303 determines and displays the 3D frame of the target object, it is further configured to:
acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
In a possible implementation manner, the display module 1303 is specifically configured to:
determining and displaying the 3D frame based on the corner point number, the 2D frame and the corner point.
In a possible implementation manner, before the display module 1303 determines and displays the 3D frame of the target object, it is further configured to:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
In a possible implementation manner, the display module 1303 is specifically configured to:
determining and displaying the 3D frame based on the orientation angle, the 2D frame and the corner point.
In one possible implementation, the corner points are located on a bottom side of the 2D frame.
In one possible implementation, the 2D frame marking operation includes at least one of a frame selection operation, a movement operation, and a rotation operation.
In one possible implementation manner, after the display module 1303 determines and displays the 3D frame of the target object, the display module is further configured to:
adjusting the 2D frame according to the 2D frame marking operation.
In a possible implementation, the corner point number is user input or pre-configured.
In one possible implementation, the orientation angle is user input or pre-configured.
The apparatus provided in the embodiment of the present application may be configured to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again in the embodiment of the present application.
Fig. 14 is a schematic structural diagram of another 3D frame marking apparatus according to an embodiment of the present application, where the 3D frame marking apparatus 1400 includes: a first determining module 1401, a third obtaining module 1402 and a second determining module 1403. The 3D frame marking device may be the processing device 11 itself, or a chip or an integrated circuit that realizes the functions of the processing device 11. It should be noted here that the division into the first determining module, the third obtaining module, and the second determining module is only a division of logical functions; physically, these modules may be integrated or independent.
The first determining module 1401 is configured to determine a 2D frame of a target object on a two-dimensional image containing the target object.
A third obtaining module 1402, configured to acquire a marked corner point, where the corner point is located on one side of the 2D frame;
a second determining module 1403, configured to determine the 3D frame of the target object based on the 2D frame and the corner.
In one possible implementation, before the second determining module 1403 determines the 3D frame of the target object, it is further configured to:
acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
In a possible implementation manner, the second determining module 1403 is specifically configured to:
determining the 3D frame based on the corner point number, the 2D frame and the corner point.
In one possible implementation, the determining the 3D frame by the second determining module 1403 based on the corner number, the 2D frame, and the corner includes:
determining the corresponding relation between the 2D frame and the 3D frame based on the corner point number and the corner point;
and determining the 3D frame according to the corresponding relation and the 2D frame.
In a possible implementation manner, the determining, by the second determining module 1403, a correspondence between the 2D frame and the 3D frame includes:
acquiring a corresponding rule of corner points of a prestored 2D frame and a prestored 3D frame of the object;
and determining the corresponding relation according to the corresponding rule, the corner number and the corner.
In one possible implementation, before the second determining module 1403 determines the 3D frame of the target object, it is further configured to:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
In a possible implementation manner, the second determining module 1403 is specifically configured to:
determining the 3D frame based on the orientation angle, the 2D frame and the corner point.
In one possible implementation, the second determining module 1403 determines the 3D frame based on the orientation angle, the 2D frame, and the corner point, including:
determining three vanishing points corresponding to the 3D frame according to the orientation angle;
and determining the 3D frame according to the three vanishing points, the 2D frame and the corner points.
In a possible implementation manner, the determining, by the second determining module 1403, three vanishing points corresponding to the 3D frame according to the orientation angle includes:
acquiring a projection matrix of an image acquisition device corresponding to the two-dimensional image;
and determining the three vanishing points according to the projection matrix and the rotation matrix of the orientation angle.
In a possible implementation manner, before the second determining module 1403 determines the 3D frame according to the three vanishing points, the 2D frame, and the corner point, the second determining module is further configured to:
acquiring corner numbers of the corners, wherein the corner numbers are used for indicating the positions of the corners relative to the target object;
determining the 3D frame according to the three vanishing points, the 2D frame and the corner points comprises:
determining the 3D frame according to the corner point number, the three vanishing points, the 2D frame and the corner point.
In one possible implementation, the corner points are located on a bottom side of the 2D frame.
In a possible implementation, the corner point number is user input or pre-configured.
In one possible implementation, the orientation angle is user input or pre-configured.
The apparatus provided in the embodiment of the present application may be configured to implement the technical solution of the method embodiment, and the implementation principle and the technical effect are similar, which are not described herein again in the embodiment of the present application.
Alternatively, fig. 15 schematically provides a possible basic hardware architecture of the 3D frame labeling apparatus described in the present application.
Referring to fig. 15, a 3D frame labeling apparatus 1500 includes at least one processor 1501 and a memory 1502. Further optionally, a communication interface 1503 and bus 1504 may also be included.
The 3D frame labeling device 1500 may be a computer or a server, which is not particularly limited in this application. In the 3D frame labeling apparatus 1500, the number of the processors 1501 may be one or more, and fig. 15 illustrates only one of the processors 1501. Alternatively, the processor 1501 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Digital Signal Processor (DSP). If the 3D frame marking device 1500 has a plurality of processors 1501, the plurality of processors 1501 may be different in type or may be the same. Alternatively, the plurality of processors 1501 of the 3D frame labeling device 1500 may also be integrated as a multi-core processor.
The memory 1502 stores computer instructions and data; the memory 1502 may store computer instructions and data required to implement the above-described 3D frame annotation methods provided herein, e.g., the memory 1502 stores instructions for implementing the steps of the above-described 3D frame annotation methods. The memory 1502 may be any one or any combination of the following storage media: nonvolatile memory (e.g., Read Only Memory (ROM), Solid State Disk (SSD), hard disk (HDD), optical disk), volatile memory.
Communication interface 1503 may provide for the input/output of information to/from the at least one processor. Any one or any combination of the following devices may also be included: a network interface (e.g., an ethernet interface), a wireless network card, etc. having a network access function.
Optionally, communication interface 1503 may also be used for data communication of 3D frame marking device 1500 with other computing devices or terminals.
Further alternatively, fig. 15 illustrates bus 1504 with a thick line. A bus 1504 may connect the processor 1501 with the memory 1502 and the communication interface 1503. Thus, via bus 1504, processor 1501 may access memory 1502 and may also interact with other computing devices or terminals using communication interface 1503.
In this application, the processor 1501 executes the computer instructions in the memory 1502 to perform the following steps:
acquiring a 2D frame marking operation, and determining a 2D frame of a target object on a two-dimensional image containing the target object according to the 2D frame marking operation;
acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation;
and determining and displaying a 3D frame of the target object based on the 2D frame and the corner points.
In one possible implementation, before the determining and displaying the 3D frame of the target object, the processor 1501 when executing the computer instructions further implements the following steps:
acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
In one possible implementation, the determining and displaying the 3D frame of the target object includes:
determining and displaying the 3D frame based on the corner point number, the 2D frame and the corner point.
In one possible implementation, before the determining and displaying the 3D frame of the target object, the processor 1501 when executing the computer instructions further implements the following steps:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
In one possible implementation, the determining and displaying the 3D frame of the target object includes:
determining and displaying the 3D frame based on the orientation angle, the 2D frame and the corner point.
In one possible implementation, the corner points are located on a bottom side of the 2D frame.
In one possible implementation, the 2D frame marking operation includes at least one of a frame selection operation, a movement operation, and a rotation operation.
In one possible implementation, after the determining and displaying the 3D frame of the target object, the processor 1501 when executing the computer instructions further implements the following steps:
and adjusting the 2D frame according to the 2D frame marking operation.
In a possible implementation, the corner point number is user input or pre-configured.
In one possible implementation, the orientation angle is user input or pre-configured.
In addition, in terms of logical function division, as exemplarily shown in fig. 15, the memory 1502 may include a first obtaining module 1301, a second obtaining module 1302, and a display module 1303. "Include" here merely means that the instructions stored in the memory, when executed, can implement the functions of the first obtaining module 1301, the second obtaining module 1302, and the display module 1303, respectively; it does not imply a particular physical structure.
The 3D frame marking device may be implemented by software as shown in fig. 15, or may be implemented by hardware as a hardware module or a circuit unit.
Alternatively, fig. 16 schematically provides another possible basic hardware architecture of the 3D frame labeling apparatus described in the present application.
Referring to fig. 16, the 3D frame labeling apparatus 1600 includes at least one processor 1601 and a memory 1602. Further optionally, a communication interface 1603 and bus 1604 may also be included.
The 3D frame labeling device 1600 may be a computer or a server, which is not particularly limited in this application. The 3D frame labeling device 1600 may have one or more processors 1601; fig. 16 illustrates only one processor 1601. Optionally, the processor 1601 may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Digital Signal Processor (DSP). If the 3D frame labeling device 1600 has a plurality of processors 1601, they may be of the same type or of different types. Optionally, the plurality of processors 1601 of the 3D frame labeling device 1600 may also be integrated into a multi-core processor.
The memory 1602 stores computer instructions and data required to implement the 3D frame labeling method described above, for example, instructions for implementing the steps of that method. The memory 1602 may be any one, or any combination, of the following storage media: nonvolatile memory (e.g., Read-Only Memory (ROM), Solid State Disk (SSD), hard disk (HDD), optical disc) and volatile memory.
The communication interface 1603 provides information input/output for the at least one processor 1601. The communication interface 1603 may also include any one, or any combination, of the following devices having a network access function: a network interface (e.g., an Ethernet interface), a wireless network card, and the like.
Optionally, the communication interface 1603 may also be used for data communication between the 3D frame labeling device 1600 and other computing devices or terminals.
Further optionally, the bus 1604 is shown in fig. 16 as a thick line. The bus 1604 may connect the processor 1601 with the memory 1602 and the communication interface 1603, so that, via the bus 1604, the processor 1601 can access the memory 1602 and can also interact with other computing devices or terminals through the communication interface 1603.
In the present application, the processor 1601 executes computer instructions in the memory 1602 to perform the following steps:
determining a 2D frame of a target object on a two-dimensional image containing the target object;
acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame;
and determining a 3D frame of the target object based on the 2D frame and the corner points.
In one possible implementation, before the determining the 3D frame of the target object, the processor 1601 executes the computer instructions to further implement the following steps:
and acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
In one possible implementation, the determining the 3D frame of the target object includes:
and determining the 3D frame based on the corner point number, the 2D frame and the corner point.
In one possible implementation, the determining the 3D frame based on the corner number, the 2D frame, and the corner includes:
determining the corresponding relation between the 2D frame and the 3D frame based on the corner point number and the corner point;
and determining the 3D frame according to the corresponding relation and the 2D frame.
In a possible implementation manner, the determining the correspondence between the 2D frame and the 3D frame includes:
acquiring a pre-stored rule of correspondence between corner points of a 2D frame and corner points of a 3D frame of an object;
and determining the corresponding relation according to the corresponding rule, the corner number and the corner.
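As an illustration of such a pre-stored correspondence rule, the sketch below assumes (for this sketch only) that the four bottom corners of the 3D frame are numbered 0 to 3 and that each number maps to one bottom corner and to the vertical 2D-frame edge constraining the neighbouring corner; the table and function names are not from the disclosure.

```python
# Sketch of a pre-stored correspondence rule; the numbering convention is an assumption.
# Corner number -> (index of the corresponding 3D bottom corner,
#                   vertical edge of the 2D frame touched by the neighbouring bottom corner)
CORNER_RULE = {
    0: (0, "left"),
    1: (1, "right"),
    2: (2, "right"),
    3: (3, "left"),
}

def correspondence(corner_number: int, corner_uv: tuple) -> dict:
    """Map the marked corner and its number to a 2D-3D correspondence."""
    corner_3d_index, touching_edge = CORNER_RULE[corner_number]
    return {
        "corner_3d_index": corner_3d_index,  # which 3D corner the marked point stands for
        "touching_edge": touching_edge,      # which 2D-frame edge constrains the next corner
        "corner_uv": corner_uv,              # image coordinates of the marked corner
    }

# Usage with the marked corner from the earlier sketch
print(correspondence(0, (520.0, 700.0)))
```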
In one possible implementation, before the determining the 3D frame of the target object, the processor 1601 executes the computer instructions to further implement the following steps:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
In one possible implementation, the determining the 3D frame of the target object includes:
determining the 3D frame based on the orientation angle, the 2D frame and the corner point.
In one possible implementation, the determining the 3D frame based on the orientation angle, the 2D frame, and the corner point includes:
determining three vanishing points corresponding to the 3D frame according to the orientation angle;
and determining the 3D frame according to the three vanishing points, the 2D frame and the corner points.
In a possible implementation manner, the determining three vanishing points corresponding to the 3D frame according to the orientation angle includes:
acquiring a projection matrix of an image acquisition device corresponding to the two-dimensional image;
and determining the three vanishing points according to the projection matrix and the orientation angle.
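A minimal sketch of this step is given below, assuming a 3x4 projection matrix P for the image acquisition device and an orientation angle defined as a yaw about the camera's vertical axis; the axis convention and example numbers are assumptions of the sketch, not fixed by the disclosure.

```python
# Sketch: three vanishing points as projections of the 3D frame's axis directions.
# Axis convention (yaw about the camera y-axis, y pointing down) is an assumption.
import numpy as np

def vanishing_points(P: np.ndarray, orientation_angle: float):
    c, s = np.cos(orientation_angle), np.sin(orientation_angle)
    # Length, width and height directions of the 3D frame written as points at
    # infinity (homogeneous coordinate 0), rotated by the orientation angle.
    directions = np.array([
        [ c, 0.0,  s, 0.0],
        [-s, 0.0,  c, 0.0],
        [0.0, 1.0, 0.0, 0.0],
    ])
    vps = (P @ directions.T).T  # shape (3, 3), homogeneous image points
    points = []
    for vp in vps:
        if abs(vp[2]) < 1e-9:
            points.append(None)          # direction parallel to the image plane: vanishing point at infinity
        else:
            points.append(vp[:2] / vp[2])
    return points

# Assumed pinhole projection matrix: focal length 1000 px, principal point (640, 360)
P = np.array([[1000.0, 0.0, 640.0, 0.0],
              [0.0, 1000.0, 360.0, 0.0],
              [0.0, 0.0, 1.0, 0.0]])
print(vanishing_points(P, orientation_angle=0.3))
```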
In one possible implementation, before determining the 3D frame according to the three vanishing points, the 2D frame and the corner point, the processor 1601 executes the computer instructions to further implement the following steps:
acquiring corner numbers of the corners, wherein the corner numbers are used for indicating the positions of the corners relative to the target object;
Determining the 3D frame according to the three vanishing points, the 2D frame and the corner points comprises:
and determining the 3D frame according to the corner point number, the three vanishing points, the 2D frame and the corner point.
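One of the geometric constraints involved can be sketched as follows, assuming (for illustration only) that the marked corner is the nearest bottom corner of the 3D frame and that the remaining visible bottom corners lie on the vertical edges of the 2D frame; the helper names and example values are assumptions.

```python
# Sketch of a single constraint: cast a line from the marked corner towards a
# horizontal vanishing point and intersect it with a vertical edge of the 2D frame.
import numpy as np

def line_through(p, q):
    """Homogeneous line through two image points."""
    return np.cross(np.append(np.asarray(p, float), 1.0), np.append(np.asarray(q, float), 1.0))

def intersect(l1, l2):
    """Intersection of two homogeneous lines, returned as a Euclidean point."""
    p = np.cross(l1, l2)
    return p[:2] / p[2]

def bottom_corner_on_edge(corner_uv, vanishing_point_uv, edge_x):
    """Bottom corner found where the ray from the marked corner towards a
    vanishing point meets the vertical 2D-frame edge x = edge_x."""
    ray = line_through(corner_uv, vanishing_point_uv)
    edge = np.array([1.0, 0.0, -edge_x])  # homogeneous form of the line x = edge_x
    return intersect(ray, edge)

# Assumed example values (illustrative only)
print(bottom_corner_on_edge(corner_uv=(520.0, 700.0),
                            vanishing_point_uv=(3872.8, 360.0),
                            edge_x=650.0))
```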
In one possible implementation, the corner points are located on a bottom side of the 2D frame.
In a possible implementation, the corner point number is user input or pre-configured.
In one possible implementation, the orientation angle is user input or pre-configured.
In addition, in terms of logical function division, as exemplarily shown in fig. 16, the memory 1602 may include a first determining module 1401, a third obtaining module 1402, and a second determining module 1403. "Include" here merely means that the instructions stored in the memory, when executed, can implement the functions of the first determining module 1401, the third obtaining module 1402, and the second determining module 1403, respectively; it does not imply a particular physical structure.
The 3D frame marking device may be implemented by software as shown in fig. 16, or may be implemented by hardware as a hardware module or a circuit unit.
In addition, an embodiment of the present application provides a neural network training method, including: training a neural network using the 3D frame of the target object determined by the above 3D frame labeling method and the two-dimensional image containing the target object.
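As one purely illustrative realization of this training step, the sketch below assumes a PyTorch-style model that regresses the projected corners of the 3D frame from the image, and a dataset of (image, 3D frame) pairs produced by the labeling method; the model, dataset format and loss are assumptions, not part of the disclosure.

```python
# Illustrative training loop; the model, dataset format and loss are assumptions.
import torch
from torch import nn
from torch.utils.data import DataLoader

def train(model: nn.Module, dataset, epochs: int = 10, lr: float = 1e-4) -> nn.Module:
    loader = DataLoader(dataset, batch_size=8, shuffle=True)
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    criterion = nn.SmoothL1Loss()
    model.train()
    for _ in range(epochs):
        for image, box3d_corners in loader:   # labels come from the 3D frame labeling method above
            optimizer.zero_grad()
            pred = model(image)               # e.g. predicted (batch, 8, 2) projected 3D-frame corners
            loss = criterion(pred, box3d_corners)
            loss.backward()
            optimizer.step()
    return model
```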
The present application provides a computer-readable storage medium comprising instructions which, when run on a computer, cause the computer to perform the above-described 3D frame labeling method.
The present application further provides a computer program product comprising instructions which, when run on a computer, cause the computer to perform the above-described 3D frame labeling method.
The present application further provides a movable platform. The movable platform may be a smart device or a vehicle, for example an unmanned aerial vehicle, an unmanned vehicle, or a robot, and includes the above-described 3D frame labeling device.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is only a logical division, and other divisions may be used in practice; a plurality of units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.

Claims (50)

1. A 3D frame labeling method, comprising:
acquiring a 2D frame marking operation, and determining a 2D frame of a target object on a two-dimensional image containing the target object according to the 2D frame marking operation;
acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation;
and determining and displaying a 3D frame of the target object based on the 2D frame and the corner points.
2. The method of claim 1, further comprising, prior to said determining and displaying a 3D frame of said target object:
and acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
3. The method of claim 2, wherein the determining and displaying the 3D frame of the target object comprises:
and determining and displaying the 3D frame based on the corner point number, the 2D frame and the corner point.
4. The method of claim 1, further comprising, prior to said determining and displaying a 3D frame of said target object:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
5. The method of claim 4, wherein the determining and displaying the 3D frame of the target object comprises:
and determining and displaying the 3D frame based on the orientation angle, the 2D frame and the corner point.
6. The method according to any of claims 1 to 5, wherein the corner points are located on the bottom side of the 2D frame.
7. The method of any of claims 1 to 6, wherein the 2D frame marking operation comprises at least one of a frame selection operation, a movement operation, and a rotation operation.
8. The method of claim 7, further comprising, after said determining and displaying the 3D frame of the target object:
and adjusting the 2D frame according to the 2D frame marking operation.
9. A method according to claim 2 or 3, characterized in that the corner point number is user input or pre-configured.
10. The method of claim 4 or 5, wherein the orientation angle is user input or pre-configured.
11. A 3D frame labeling method, comprising:
determining a 2D frame of a target object on a two-dimensional image containing the target object;
acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame;
and determining a 3D frame of the target object based on the 2D frame and the corner points.
12. The method of claim 11, prior to said determining the 3D frame of the target object, further comprising:
and acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
13. The method of claim 12, wherein the determining the 3D frame of the target object comprises:
and determining the 3D frame based on the corner point number, the 2D frame and the corner point.
14. The method according to claim 13, wherein the determining the 3D frame based on the corner number, the 2D frame and the corner comprises:
determining the corresponding relation between the 2D frame and the 3D frame based on the corner point number and the corner point;
and determining the 3D frame according to the corresponding relation and the 2D frame.
15. The method of claim 14, wherein the determining the correspondence between the 2D frame and the 3D frame comprises:
acquiring a pre-stored rule of correspondence between corner points of a 2D frame and corner points of a 3D frame of an object;
and determining the corresponding relation according to the corresponding rule, the corner number and the corner.
16. The method of claim 11, prior to said determining the 3D frame of the target object, further comprising:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
17. The method of claim 16, wherein the determining the 3D frame of the target object comprises:
determining the 3D frame based on the orientation angle, the 2D frame and the corner point.
18. The method of claim 17, wherein determining the 3D frame based on the orientation angle, the 2D frame, and the corner point comprises:
determining three vanishing points corresponding to the 3D frame according to the orientation angle;
and determining the 3D frame according to the three vanishing points, the 2D frame and the corner points.
19. The method of claim 18, wherein determining three vanishing points corresponding to the 3D frame according to the orientation angle comprises:
acquiring a projection matrix of an image acquisition device corresponding to the two-dimensional image;
and determining the three vanishing points according to the projection matrix and the orientation angle.
20. The method according to claim 18 or 19, wherein before said determining the 3D frame from the three vanishing points, the 2D frame and the corner point, further comprises:
acquiring corner numbers of the corners, wherein the corner numbers are used for indicating the positions of the corners relative to the target object;
determining the 3D frame according to the three vanishing points, the 2D frame and the corner points comprises:
and determining the 3D frame according to the corner point number, the three vanishing points, the 2D frame and the corner point.
21. The method according to any of the claims 11 to 20, wherein the corner points are located on the bottom side of the 2D frame.
22. Method according to any of the claims 12-15, wherein the corner point number is user input or pre-configured.
23. The method of any of claims 16 to 20, wherein the orientation angle is user input or pre-configured.
24. A 3D frame labeling apparatus comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, the processor when executing the computer instructions performing the steps of:
acquiring a 2D frame marking operation, and determining a 2D frame of a target object on a two-dimensional image containing the target object according to the 2D frame marking operation;
acquiring a corner marking operation, wherein the corner is positioned on one side of the 2D frame, and marking the corner on the 2D frame according to the corner marking operation;
and determining and displaying a 3D frame of the target object based on the 2D frame and the corner points.
25. The apparatus of claim 24, wherein prior to said determining and displaying the 3D frame of the target object, the processor when executing the computer instructions further performs the steps of:
and acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
26. The apparatus of claim 25, wherein the determining and displaying the 3D frame of the target object comprises:
and determining and displaying the 3D frame based on the corner point number, the 2D frame and the corner point.
27. The apparatus of claim 24, wherein prior to said determining and displaying the 3D frame of the target object, the processor when executing the computer instructions further performs the steps of:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
28. The apparatus of claim 27, wherein the determining and displaying the 3D frame of the target object comprises:
and determining and displaying the 3D frame based on the orientation angle, the 2D frame and the corner point.
29. The apparatus according to any of the claims 24 to 28, wherein the corner points are located on the bottom side of the 2D frame.
30. The apparatus of any of claims 24 to 29, wherein the 2D frame marking operation comprises at least one of a frame selection operation, a movement operation, and a rotation operation.
31. The apparatus of claim 30, wherein after said determining and displaying the 3D frame of the target object, the processor when executing the computer instructions further performs the steps of:
and adjusting the 2D frame according to the 2D frame marking operation.
32. An arrangement according to claim 25 or 26, characterized in that said corner point number is user input or pre-configured.
33. The apparatus of claim 27 or 28, wherein the orientation angle is user input or pre-configured.
34. A 3D frame labeling apparatus comprising a memory, a processor, and computer instructions stored in the memory and executable on the processor, the processor when executing the computer instructions performing the steps of:
determining a 2D frame of a target object on a two-dimensional image containing the target object;
acquiring a marked corner point, wherein the corner point is positioned on one edge of the 2D frame;
and determining a 3D frame of the target object based on the 2D frame and the corner points.
35. The apparatus of claim 34, wherein prior to said determining the 3D frame of the target object, the processor when executing the computer instructions further performs the steps of:
and acquiring the corner number of the corner, wherein the corner number is used for indicating the position of the corner relative to the target object.
36. The apparatus of claim 35, wherein the determining the 3D frame of the target object comprises:
and determining the 3D frame based on the corner point number, the 2D frame and the corner point.
37. The apparatus of claim 36, wherein the determining the 3D frame based on the corner number, the 2D frame, and the corner comprises:
determining the corresponding relation between the 2D frame and the 3D frame based on the corner point number and the corner point;
and determining the 3D frame according to the corresponding relation and the 2D frame.
38. The apparatus of claim 37, wherein the determining the correspondence between the 2D frame and the 3D frame comprises:
acquiring a pre-stored rule of correspondence between corner points of a 2D frame and corner points of a 3D frame of an object;
and determining the corresponding relation according to the corresponding rule, the corner number and the corner.
39. The apparatus of claim 34, wherein prior to said determining the 3D frame of the target object, the processor when executing the computer instructions further performs the steps of:
acquiring an orientation angle of the target object in a 3D space, wherein the orientation angle is used for indicating the orientation of the target object.
40. The apparatus of claim 39, wherein the determining the 3D frame of the target object comprises:
determining the 3D frame based on the orientation angle, the 2D frame and the corner point.
41. The apparatus of claim 40, wherein the determining the 3D frame based on the orientation angle, the 2D frame, and the corner point comprises:
determining three vanishing points corresponding to the 3D frame according to the orientation angle;
and determining the 3D frame according to the three vanishing points, the 2D frame and the corner points.
42. The apparatus of claim 41, wherein the determining three vanishing points corresponding to the 3D frame according to the orientation angle comprises:
acquiring a projection matrix of an image acquisition device corresponding to the two-dimensional image;
and determining the three vanishing points according to the projection matrix and the orientation angle.
43. The apparatus according to claim 41 or 42, wherein before said determining the 3D frame from the three vanishing points, the 2D frame and the corner point, the processor when executing the computer instructions further performs the steps of:
acquiring corner numbers of the corners, wherein the corner numbers are used for indicating the positions of the corners relative to the target object;
determining the 3D frame according to the three vanishing points, the 2D frame and the corner points comprises:
and determining the 3D frame according to the corner point number, the three vanishing points, the 2D frame and the corner point.
44. The apparatus according to any of the claims 34 to 43, wherein the corner points are located on the bottom side of the 2D frame.
45. An arrangement according to any of claims 35-38, characterized in that said corner point number is user input or pre-configured.
46. The apparatus of any of claims 39 to 43, wherein the orientation angle is user input or pre-configured.
47. A neural network training method, comprising:
training of a neural network is performed using the 3D frame of the target object determined by the 3D frame labeling method of any one of claims 1 to 10, and a two-dimensional image containing the target object.
48. A neural network training method, comprising:
training of a neural network is performed using the 3D frame of the target object determined by the 3D frame labeling method of any one of claims 11 to 23, and a two-dimensional image containing the target object.
49. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the 3D frame annotation method of any one of claims 1 to 10.
50. A computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the 3D frame annotation method of any one of claims 11 to 23.
CN202080033702.3A 2020-07-21 2020-07-21 3D frame marking method, device and computer-readable storage medium Pending CN113795847A (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/103263 WO2022016368A1 (en) 2020-07-21 2020-07-21 3d frame labeling method and device and computer-readable storage medium

Publications (1)

Publication Number Publication Date
CN113795847A (en) 2021-12-14

Family

ID=79181475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080033702.3A Pending CN113795847A (en) 2020-07-21 2020-07-21 3D frame marking method, device and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN113795847A (en)
WO (1) WO2022016368A1 (en)

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10762713B2 (en) * 2017-09-18 2020-09-01 Shoppar Inc. Method for developing augmented reality experiences in low computer power systems and devices
CN109829447B (en) * 2019-03-06 2021-04-30 百度在线网络技术(北京)有限公司 Method and device for determining a three-dimensional frame of a vehicle
CN110390258A (en) * 2019-06-05 2019-10-29 东南大学 Image object three-dimensional information mask method
CN111126161A (en) * 2019-11-28 2020-05-08 北京联合大学 3D vehicle detection method based on key point regression
CN111079619B (en) * 2019-12-10 2023-04-18 北京百度网讯科技有限公司 Method and apparatus for detecting target object in image
CN111310667B (en) * 2020-02-18 2023-09-01 北京小马慧行科技有限公司 Method, device, storage medium and processor for determining whether annotation is accurate

Also Published As

Publication number Publication date
WO2022016368A1 (en) 2022-01-27

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination