WO2024001804A1 - Three-dimensional object detection method, computer device, storage medium, and vehicle

Info

Publication number: WO2024001804A1
Application number: PCT/CN2023/100354
Authority: WO
Inventors: 李林; 翟玉强
Original assignee: 安徽蔚来智驾科技有限公司
Priority date: 2022-06-28
Filing date: 2023-06-15
Publication date: 2024-01-04
Also published as: CN115205846A

Abstract

The present invention relates to the technical field of visual inspection. Specifically provided are a three-dimensional object detection method, a computer device, a storage medium, and a vehicle, which aim to solve the problem of improving the accuracy of object detection. To this end, the method in the present invention comprises: performing object detection on a two-dimensional image by means of a three-dimensional object detection model, so as to acquire three-dimensional information of an object to be detected in the two-dimensional image. The three-dimensional object detection model is obtained by means of performing training in the following manner: performing object detection on a two-dimensional image sample by means of a three-dimensional object detection model to be trained, so as to acquire two-dimensional detection information and three-dimensional prediction information of an object sample in the two-dimensional image sample; projecting the three-dimensional prediction information, so as to obtain two-dimensional projection information; and according to the two-dimensional detection information and the two-dimensional projection information and by using a two-dimensional information consistency loss function, performing model training on the three-dimensional object detection model to be trained, so as to obtain a trained three-dimensional object detection model. In this way, both a close-range object and a long-range object can be detected.

Description

Three-dimensional target detection method, computer device, storage medium and vehicle

This application claims priority to Chinese patent application CN115205846A, filed on June 28, 2022 and entitled "Three-dimensional target detection method, computer device, storage medium and vehicle". The entire content of the above Chinese patent application is incorporated into this application by reference.

Technical field

The invention relates to the field of visual detection technology, and specifically provides a three-dimensional target detection method, a computer device, a storage medium and a vehicle.

Background

In order to improve the accuracy of three-dimensional target detection on two-dimensional images, joint calibration of a lidar and a camera is usually used to obtain three-dimensional information, such as the position of a three-dimensional target, and this information is used as the label of the two-dimensional image sample containing that target. The two-dimensional image samples and their labels are then used to train a three-dimensional target detection model, and the trained model is used to perform three-dimensional target detection on two-dimensional images. However, the detection range of a lidar is usually relatively short, so it can only obtain three-dimensional information, such as position, for close-range targets. As a result, the above method can accurately detect three-dimensional targets only at close range, and cannot accurately detect three-dimensional targets at long range.

Accordingly, a new technical solution is needed in this field to solve the above problems.

Summary of the invention

In order to overcome the above-mentioned defects, the present invention is proposed to provide a three-dimensional target detection method, a computer device, a storage medium and a vehicle that solve, or at least partially solve, the technical problem of how to accurately detect both short-range and long-range three-dimensional targets so as to improve the accuracy of target detection.

In a first aspect, a three-dimensional target detection method is provided. The method includes:

Perform target detection on a two-dimensional image through a three-dimensional target detection model, and obtain three-dimensional information of the target to be detected in the two-dimensional image;

Wherein, the three-dimensional target detection model is trained in the following way:

Perform target detection on a two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample;

Project the three-dimensional prediction information to obtain two-dimensional projection information;

According to the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained, and a trained three-dimensional target detection model is obtained.

In one technical solution of the above three-dimensional target detection method, the step of "according to the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained, and a trained three-dimensional target detection model is obtained" specifically includes:

Determine whether each of the target samples has three-dimensional actual information according to the sample labels of the two-dimensional image samples;

If the current target sample has three-dimensional actual information, then based on the three-dimensional actual information of the current target sample and the three-dimensional prediction information, the three-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained;

If the current target sample does not have three-dimensional actual information, then based on the two-dimensional detection information of the current target sample and the two-dimensional projection information, a two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained.

In a technical solution of the above three-dimensional target detection method, after the step of "according to the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained, and a trained three-dimensional target detection model is obtained", the method further includes training the trained three-dimensional target detection model in the following manner to correct the trained three-dimensional target detection model:

Determine whether the sample label of the two-dimensional image sample contains the three-dimensional actual information of the target sample;

If included, then use the three-dimensional information consistency loss function to perform model training on the trained three-dimensional target detection model based on the three-dimensional actual information and the three-dimensional predicted information to obtain the final three-dimensional target detection model;

If it is not included, the trained three-dimensional target detection model will not be trained.

In one technical solution of the above three-dimensional target detection method, the step of "target detection is performed on two-dimensional image samples through the three-dimensional target detection model to be trained, and the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample are obtained" specifically includes:

Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection frame of the target sample;

According to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame, the two-dimensional detection information and the three-dimensional prediction information of the target sample are respectively determined.

In a technical solution of the above three-dimensional target detection method, the method further includes:

Using a square loss function to establish the two-dimensional information consistency loss function;

And/or, a square loss function is used to establish the three-dimensional information consistency loss function.

In a technical solution of the above three-dimensional target detection method, the three-dimensional predicted information and the three-dimensional actual information include at least the three-dimensional coordinates, size and direction angle of the target sample.

In one technical solution of the above three-dimensional target detection method, the method further includes acquiring the two-dimensional image sample through a monocular camera.

In a second aspect, a computer device is provided. The computer device includes a processor and a storage device. The storage device is adapted to store a plurality of program codes, and the program codes are adapted to be loaded and run by the processor to execute the three-dimensional target detection method described in any one of the technical solutions of the above three-dimensional target detection method.

In a third aspect, a computer-readable storage medium is provided, which stores a plurality of program codes, and the program codes are adapted to be loaded and run by a processor to execute the three-dimensional target detection method described in any one of the technical solutions of the above three-dimensional target detection method.

In a fourth aspect, a vehicle is provided, which vehicle includes the computer device described in the above computer device technical solution.

One or more of the above technical solutions of the present invention have at least one or more of the following beneficial effects:

In the technical solution for implementing the present invention, target detection can be performed on a two-dimensional image through a three-dimensional target detection model, and the three-dimensional information of the target to be detected in the two-dimensional image can be obtained. The three-dimensional target detection model is trained in the following way: perform target detection on two-dimensional image samples through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample; project the three-dimensional prediction information to obtain the two-dimensional projection information; according to the two-dimensional detection information and the two-dimensional projection information, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtain the trained three-dimensional target detection model.

Through the above implementation, even if the actual three-dimensional information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and the two-dimensional projection information of the target sample, so that the trained three-dimensional target detection model can accurately detect the three-dimensional information of a target from a two-dimensional image. This overcomes the defect of the existing technology that long-range three-dimensional targets cannot be accurately detected because the actual three-dimensional information of long-range targets cannot be obtained for model training.

Brief description of the drawings

The disclosure of the present invention will become more understandable with reference to the accompanying drawings. Those skilled in the art can easily understand that these drawings are for illustrative purposes only and are not intended to limit the scope of the present invention. In the drawings:

Figure 1 is a schematic flowchart of the main steps of a method for obtaining a three-dimensional target detection model according to an embodiment of the present invention;

Figure 2 is a schematic flowchart of the main steps of a method for model training of a three-dimensional target detection model to be trained according to an embodiment of the present invention;

Figure 3 is a schematic flowchart of the main steps of a method for model training of a three-dimensional target detection model to be trained according to another embodiment of the present invention.

Detailed description

Some embodiments of the invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of the present invention.

In the description of the present invention, "processor" may include hardware, software, or a combination of both. The processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor can be implemented in software, hardware, or a combination of both. Non-transitory computer-readable storage media include any suitable media that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, etc.

In an embodiment of a three-dimensional target detection method according to the present invention, target detection can be performed on a two-dimensional image through a three-dimensional target detection model to obtain three-dimensional information of the target to be detected in the two-dimensional image. The two-dimensional image may be an image obtained through a monocular camera, that is, the two-dimensional image is a monocular image. According to the embodiment of the present invention, by using the three-dimensional target detection model to perform target detection on a monocular image, the three-dimensional information of the target to be detected in the monocular image can be obtained. The three-dimensional information of the target to be detected at least includes the three-dimensional coordinates, size and direction angle of the target to be detected.

The three-dimensional target detection model can be a network model built using neural networks to detect the three-dimensional information of a target from a two-dimensional image. Referring to Figure 1, in the embodiment of the present invention, after the initial three-dimensional target detection model (the three-dimensional target detection model to be trained) is constructed, it can be trained through the following steps S101 to S103, so that the trained three-dimensional target detection model can be used to perform target detection on a two-dimensional image and obtain the three-dimensional information of the target to be detected in the two-dimensional image.

Step S101: Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample.

The two-dimensional image sample can also be an image obtained through a monocular camera, that is, the two-dimensional image sample is also a monocular image. The two-dimensional detection information at least includes the two-dimensional coordinates of the target sample, and the three-dimensional prediction information at least includes the three-dimensional coordinates, size and direction angle of the target sample.

In some implementations, the two-dimensional detection information and three-dimensional prediction information of the target sample can be obtained through the following steps S1011 to S1012.

Step S1011: Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection frame of the target sample. The two-dimensional detection frame refers to the bounding box of the target sample on the two-dimensional image sample.

Step S1012: Determine the two-dimensional detection information and the three-dimensional prediction information of the target sample respectively based on the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame.

In this embodiment, the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame can be used as the two-dimensional detection information and the three-dimensional prediction information of the target sample, respectively.

Step S102: Project the three-dimensional prediction information to obtain two-dimensional projection information.

In the embodiment of the present invention, the three-dimensional prediction information can be converted from the three-dimensional coordinate system to the two-dimensional image coordinate system, thereby projecting the three-dimensional prediction information into two dimensions and obtaining the two-dimensional projection information. The three-dimensional coordinate system can be the world coordinate system. Specifically, in this embodiment, the coordinate system transformation relationship between the world coordinate system and the two-dimensional image coordinate system can be determined first, and the three-dimensional prediction information is then transformed through this relationship. It should be noted that a conventional method in the field of vision technology can be used to determine the transformation relationship between the world coordinate system and the two-dimensional image coordinate system; for example, it can be determined through the principle of pinhole imaging.
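The projection in step S102 can be illustrated with a minimal sketch based on the pinhole imaging principle. The sketch is not taken from the application: it assumes the three-dimensional prediction is already expressed in the camera frame (x right, y down, z forward), that the size is ordered as (length, height, width), that the direction angle is a rotation about the vertical axis, and that the intrinsic matrix K is known. The function names box3d_corners and project_box_to_2d are hypothetical.

```python
import numpy as np

def box3d_corners(center, size, yaw):
    """Return the 8 corners, shape (8, 3), of a 3D box in camera coordinates.

    Assumed convention (not from the source): x right, y down, z forward,
    size = (length, height, width), yaw = rotation about the y axis.
    """
    length, height, width = size
    # Corner offsets in the box frame, centred on the box centre.
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * length / 2.0
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * height / 2.0
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * width / 2.0
    corners = np.stack([x, y, z], axis=1)
    c, s = np.cos(yaw), np.sin(yaw)
    rot_y = np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
    return corners @ rot_y.T + np.asarray(center, dtype=float)

def project_box_to_2d(center, size, yaw, K):
    """Project a 3D box through intrinsics K; return (x_min, y_min, x_max, y_max)."""
    corners = box3d_corners(center, size, yaw)
    uvw = corners @ K.T              # pinhole model: rows are (u*z, v*z, z)
    uv = uvw[:, :2] / uvw[:, 2:3]    # perspective divide by depth
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])

# Illustrative values only: a 4 m x 2 m x 2 m box, 20 m in front of the camera.
K = np.array([[1000.0, 0.0, 640.0],
              [0.0, 1000.0, 360.0],
              [0.0, 0.0, 1.0]])
print(project_box_to_2d((0.0, 0.0, 20.0), (4.0, 2.0, 2.0), 0.0, K))
```

If the prediction is expressed in a world coordinate system instead, the corners would first be transformed by the camera extrinsics before the intrinsic projection; that step is omitted here for brevity.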

Step S103: Based on the two-dimensional detection information and the two-dimensional projection information, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtain the trained three-dimensional target detection model.

The two-dimensional detection information can represent the true value of the two-dimensional information of the target sample on the two-dimensional image sample, and the two-dimensional projection information is obtained by projecting the three-dimensional prediction information. Therefore, the two-dimensional projection information can represent the predicted value of the two-dimensional information of the target sample on the two-dimensional image sample.

Training the three-dimensional target detection model to be trained with the two-dimensional information consistency loss function makes the two-dimensional projection information (the predicted value of the two-dimensional information) continuously approach the two-dimensional detection information (the true value of the two-dimensional information). The closer the two-dimensional projection information is to the two-dimensional detection information, the more accurate the three-dimensional prediction information of the target sample that the model obtains by performing target detection on the two-dimensional image sample.

In some implementations, a two-dimensional information consistency loss function can be established through a square loss function. For example, the two-dimensional information consistency loss function can be as shown in the following formula (1).

$$L_1 = (y_1 - \hat{y}_1)^2 \qquad (1)$$

The meanings of each parameter in formula (1) are: $L_1$ represents the loss value of the two-dimensional information consistency loss function, $y_1$ represents the two-dimensional detection information, and $\hat{y}_1$ represents the two-dimensional projection information.
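As a rough illustration of a square loss between the two-dimensional detection information and the two-dimensional projection information, the sketch below treats both as bounding boxes of the form (x_min, y_min, x_max, y_max) and sums the squared differences over the four coordinates. The box encoding, the sum reduction and the function name consistency_loss_2d are assumptions made for this example; the application only states that a square loss function is used.

```python
import numpy as np

def consistency_loss_2d(y_det, y_proj):
    """Square loss between 2D detection info and 2D projection info.

    Assumption: both inputs are boxes (x_min, y_min, x_max, y_max) in pixels,
    and the per-coordinate squared errors are summed.
    """
    y_det = np.asarray(y_det, dtype=float)
    y_proj = np.asarray(y_proj, dtype=float)
    return float(np.sum((y_det - y_proj) ** 2))

# Detected box vs. the projection of the predicted 3D box (illustrative values).
loss = consistency_loss_2d([100, 50, 200, 150], [102, 50, 197, 151])
print(loss)  # 14.0  (2^2 + 0^2 + 3^2 + 1^2)
```

In an actual training setup the same expression would be written with a differentiable framework so that the gradient flows back through the projection into the predicted three-dimensional quantities.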

Through the above steps S101 to S103, even if the actual three-dimensional information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and two-dimensional projection information of the target sample. This enables the trained three-dimensional target detection model to accurately detect the three-dimensional information of the target from the two-dimensional image.

The above step S103 will be further described below.

When training a three-dimensional target detection model to be trained, a large batch of two-dimensional image samples is usually used, where each two-dimensional image sample contains at least one target sample. The sample labels of the two-dimensional image samples may be labeled with the actual three-dimensional information of all target samples, or with the actual three-dimensional information of only some of the target samples. In order to further improve the accuracy and efficiency of model training, target samples labeled with three-dimensional actual information can be trained on using that three-dimensional actual information, while target samples without labeled three-dimensional actual information can be trained on using their two-dimensional detection information and two-dimensional projection information. Specifically, referring to Figure 2, in some implementations of the above step S103, the three-dimensional target detection model to be trained can be trained through the following steps S1031 to S1033.

Step S1031: Determine whether each target sample has three-dimensional actual information according to the sample label of the two-dimensional image sample.

If the current target sample has three-dimensional actual information, go to step S1032;

If the current target sample does not have three-dimensional actual information, go to step S1033.

Step S1032: Based on the three-dimensional actual information and three-dimensional prediction information of the current target sample, use the three-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained. Training the model with the three-dimensional information consistency loss function makes the three-dimensional prediction information continuously approach the three-dimensional actual information. The closer the three-dimensional prediction information is to the three-dimensional actual information, the more accurate the three-dimensional prediction information of the target sample that the model obtains by performing target detection on the two-dimensional image sample.

In some implementations, the three-dimensional information consistency loss function can be established through a square loss function. For example, the three-dimensional information consistency loss function can be as shown in the following formula (2).

$$L_2 = (y_2 - \hat{y}_2)^2 \qquad (2)$$

The meanings of each parameter in formula (2) are: $L_2$ represents the loss value of the three-dimensional information consistency loss function, $y_2$ represents the three-dimensional actual information, and $\hat{y}_2$ represents the three-dimensional prediction information.

Step S1033: Based on the two-dimensional detection information and two-dimensional projection information of the current target sample, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained. It should be noted that the specific method of step S1033 is similar to the method described in step S103 in the foregoing method embodiment, and will not be described again.

Through the above steps S1031 to S1033, target samples labeled with three-dimensional actual information and target samples not labeled with three-dimensional actual information can be used for model training of the three-dimensional target detection model to be trained, which significantly improves the accuracy and efficiency of model training.
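The per-target branching of steps S1031 to S1033 can be sketched as follows. This is a simplified illustration under stated assumptions, not the applicant's implementation: each target sample is represented as a dictionary with hypothetical keys pred_3d, proj_2d, det_2d and label_3d, a missing three-dimensional label is encoded as None, and both consistency losses are summed square losses.

```python
import numpy as np

def square_loss(target, prediction):
    """Sum of squared differences between target and prediction."""
    diff = np.asarray(target, dtype=float) - np.asarray(prediction, dtype=float)
    return float(np.sum(diff ** 2))

def target_loss(pred_3d, proj_2d, det_2d, label_3d=None):
    """Loss for one target sample, following steps S1031 to S1033."""
    if label_3d is not None:                    # S1031: 3D actual info is labeled
        return square_loss(label_3d, pred_3d)   # S1032: 3D consistency loss
    return square_loss(det_2d, proj_2d)         # S1033: 2D consistency loss

def batch_loss(samples):
    """Total loss over a list of target samples (dicts of target_loss arguments)."""
    return sum(target_loss(**sample) for sample in samples)

samples = [
    # Close-range target with a 3D label (hypothetical values).
    dict(pred_3d=[1, 2, 3], proj_2d=[0, 0, 10, 10], det_2d=[0, 0, 12, 10],
         label_3d=[1, 2, 4]),
    # Long-range target without a 3D label: falls back to the 2D branch.
    dict(pred_3d=[5, 5, 5], proj_2d=[0, 0, 10, 10], det_2d=[0, 0, 12, 10],
         label_3d=None),
]
print(batch_loss(samples))  # 5.0  (1.0 from the 3D branch + 4.0 from the 2D branch)
```

Both kinds of target samples therefore contribute to the same training objective, which is what allows long-range targets without lidar labels to take part in training.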

In addition, in other implementations of the above step S103, after the two-dimensional information consistency loss function has been used, based on the two-dimensional detection information and the two-dimensional projection information, to train the three-dimensional target detection model to be trained and the trained three-dimensional target detection model has been obtained, the trained model can be trained again using the three-dimensional actual information of the target sample, so as to correct it and further improve its target detection accuracy. Specifically, referring to Figure 3, in this embodiment, the trained three-dimensional target detection model can be trained through the following steps S104 to S106 to correct it.

Step S104: Determine whether the sample label of the two-dimensional image sample contains the actual three-dimensional information of the target sample; if it does, go to step S105; if it does not, go to step S106.

Step S105: Based on the three-dimensional actual information and the three-dimensional predicted information, use the three-dimensional information consistency loss function to perform model training on the trained three-dimensional target detection model to obtain the final three-dimensional target detection model.

It should be noted that the specific method of step S105 is similar to the method described in step S1032 in the foregoing method embodiment, and will not be described again.

Step S106: Do not perform further training on the trained three-dimensional target detection model.
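The correction stage of steps S104 to S106 can be sketched in the same spirit. The sketch assumes a sample whose label carries no three-dimensional actual information is represented by label_3d=None and simply contributes no loss; the function name finetune_loss and the numeric values are hypothetical.

```python
import numpy as np

def finetune_loss(pred_3d, label_3d=None):
    """Return the 3D consistency loss (S105), or None if the sample is skipped (S106)."""
    if label_3d is None:              # S104: label contains no 3D actual information
        return None                   # S106: no further training on this sample
    diff = np.asarray(label_3d, dtype=float) - np.asarray(pred_3d, dtype=float)
    return float(np.sum(diff ** 2))   # S105: square loss on the 3D information

losses = [
    finetune_loss([1.0, 2.0, 20.0], [1.0, 2.0, 22.0]),  # labeled target
    finetune_loss([0.0, 1.0, 80.0]),                    # unlabeled target
]
print(losses)  # [4.0, None]
```

Only the entries that are not None would be used to update the already trained model in this second stage.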

It should be pointed out that although the various steps are described in a specific order in the above embodiments, those skilled in the art can understand that in order to achieve the effects of the present invention, different steps do not have to be executed in such an order. They can be executed simultaneously (in parallel) or in other sequences, and these changes are within the scope of the present invention.

Those skilled in the art can understand that all or part of the processes in the methods of the above embodiments can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented. The computer program includes computer program code, which may be in the form of source code, object code, an executable file or some intermediate form. The computer-readable storage medium may include: any entity or device capable of carrying the computer program code, media, USB flash drive, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunications signals, software distribution media, etc. It should be noted that the content contained in the computer-readable storage medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction. For example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunications signals.

Furthermore, the present invention also provides a computer device. In one embodiment of the computer device according to the present invention, the computer device includes a processor and a storage device. The storage device can be configured to store a program for executing the three-dimensional target detection method of the above method embodiment, and the processor can be configured to execute the programs in the storage device, which include but are not limited to the program for executing the three-dimensional target detection method of the above method embodiment. For convenience of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present invention. The computer device may be a control device formed by various electronic devices.

Furthermore, the present invention also provides a computer-readable storage medium. In an embodiment of a computer-readable storage medium according to the present invention, the computer-readable storage medium can be configured to store a program for executing the three-dimensional target detection method of the above method embodiment, and the program can be loaded and run by a processor to implement the above three-dimensional target detection method. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, please refer to the method part of the embodiments of the present invention. The computer-readable storage medium may be a storage device formed by various electronic devices. Optionally, in the embodiment of the present invention, the computer-readable storage medium is a non-transitory computer-readable storage medium.

Furthermore, the present invention also provides a vehicle. In a vehicle embodiment according to the present invention, the vehicle may include a computer device as described in the above computer device embodiment. In this embodiment, the vehicle may be an autonomous vehicle, an unmanned vehicle, or other vehicles. In addition, according to the type of power source, the vehicle in this embodiment may be a fuel vehicle, an electric vehicle, a hybrid vehicle that mixes electric energy with fuel, or a vehicle that uses other new energy sources.

So far, the technical solution of the present invention has been described with reference to an embodiment shown in the drawings. However, those skilled in the art can easily understand that the protection scope of the present invention is obviously not limited to these specific embodiments. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to relevant technical features, and technical solutions after these modifications or substitutions will fall within the protection scope of the present invention.

Claims

A three-dimensional target detection method, characterized in that the method includes:

Perform target detection on a two-dimensional image through a three-dimensional target detection model, and obtain three-dimensional information of the target to be detected in the two-dimensional image;

Wherein, the three-dimensional target detection model is trained in the following way:

Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample;

Project the three-dimensional prediction information to obtain two-dimensional projection information;

According to the two-dimensional detection information and the two-dimensional projection information, the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained, and a trained three-dimensional target detection model is obtained.
The three-dimensional target detection method according to claim 1, characterized in that the step of "according to the two-dimensional detection information and the two-dimensional projection information, using a two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtaining a trained three-dimensional target detection model" specifically includes:

Determine whether each of the target samples has three-dimensional actual information according to the sample labels of the two-dimensional image samples;

If the current target sample has three-dimensional actual information, then based on the three-dimensional actual information of the current target sample and the three-dimensional prediction information, the three-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained;

If the current target sample does not have three-dimensional actual information, then based on the two-dimensional detection information of the current target sample and the two-dimensional projection information, a two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained.
The three-dimensional target detection method according to claim 1, characterized in that, after the step of "according to the two-dimensional detection information and the two-dimensional projection information, using a two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtaining a trained three-dimensional target detection model", the method further includes training the trained three-dimensional target detection model in the following manner, so as to modify the trained three-dimensional target detection model:

Determine whether the sample label of the two-dimensional image sample contains the three-dimensional actual information of the target sample;

If it is included, then use the three-dimensional information consistency loss function to perform model training on the trained three-dimensional target detection model based on the three-dimensional actual information and the three-dimensional prediction information, to obtain the final three-dimensional target detection model;

If it is not included, the trained three-dimensional target detection model is not trained further.
The three-dimensional target detection method according to claim 1, characterized in that the step of "performing target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtaining the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample" specifically includes:

Perform target detection on two-dimensional image samples through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection frame of the target sample;

According to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame, the two-dimensional detection information and the three-dimensional prediction information of the target sample are respectively determined.
The three-dimensional target detection method according to claim 2 or 3, characterized in that the method further includes:

Using a square loss function to establish the two-dimensional information consistency loss function;

and / or,

A square loss function is used to establish the three-dimensional information consistency loss function.
The three-dimensional target detection method according to claim 2 or 3, characterized in that both the three-dimensional prediction information and the three-dimensional actual information include at least the three-dimensional coordinates, size and direction angle of the target sample.
The three-dimensional target detection method according to any one of claims 1 to 4, characterized in that the method further includes acquiring the two-dimensional image sample through a monocular camera.
A computer device, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to execute the three-dimensional target detection method according to any one of claims 1 to 7.
A computer-readable storage medium in which a plurality of program codes are stored, characterized in that the program codes are adapted to be loaded and run by a processor to execute the three-dimensional target detection method according to any one of claims 1 to 7.
A vehicle, characterized in that the vehicle includes the computer device of claim 8.
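
As a non-limiting illustration of the training procedure recited in the claims above, the following Python sketch shows one possible way to project three-dimensional prediction information into the image, and to select between the three-dimensional and the two-dimensional information consistency loss according to whether a sample label carries three-dimensional actual information. The pinhole camera model, the intrinsic parameters `fx`, `fy`, `cx`, `cy`, the box parameterization, the use of the enclosing rectangle of the projected corners as the two-dimensional projection information, and all function names are assumptions made for illustration only; they are not taken from this application.

```python
import math
from typing import List, Optional, Sequence, Tuple

# Assumed parameterizations (hypothetical, for illustration only):
# Box2D = (u_min, v_min, u_max, v_max) in pixels.
# Box3D = (x, y, z, length, height, width, yaw) in camera coordinates,
#         with x to the right, y downward and z forward.
Box2D = Tuple[float, float, float, float]
Box3D = Tuple[float, float, float, float, float, float, float]


def box3d_corners(box: Box3D) -> List[Tuple[float, float, float]]:
    """Return the eight corners of a 3D box rotated by yaw about the vertical axis."""
    x, y, z, length, height, width, yaw = box
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx in (-length / 2, length / 2):
        for dy in (-height / 2, height / 2):
            for dz in (-width / 2, width / 2):
                # Rotate the corner offset about the y axis, then translate to the center.
                corners.append((x + c * dx + s * dz, y + dy, z - s * dx + c * dz))
    return corners


def project_box3d(box: Box3D, fx: float, fy: float, cx: float, cy: float) -> Box2D:
    """Project the corners with a pinhole model and return their enclosing 2D rectangle."""
    us, vs = [], []
    for X, Y, Z in box3d_corners(box):
        if Z <= 0:
            raise ValueError("box corner lies behind the camera")
        us.append(fx * X / Z + cx)
        vs.append(fy * Y / Z + cy)
    return (min(us), min(vs), max(us), max(vs))


def square_loss(pred: Sequence[float], target: Sequence[float]) -> float:
    """Sum of squared differences between corresponding components."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def sample_loss(
    pred_3d: Box3D,
    det_2d: Box2D,
    gt_3d: Optional[Box3D],
    intrinsics: Tuple[float, float, float, float],
) -> float:
    """Choose the consistency loss for one target sample.

    If the label carries 3D actual information (gt_3d is not None), compare it
    with the 3D prediction; otherwise compare the projection of the 3D
    prediction with the 2D detection information.
    """
    if gt_3d is not None:
        return square_loss(pred_3d, gt_3d)
    return square_loss(project_box3d(pred_3d, *intrinsics), det_2d)


if __name__ == "__main__":
    K = (1000.0, 1000.0, 640.0, 360.0)           # assumed fx, fy, cx, cy
    pred = (0.0, 0.0, 10.0, 2.0, 2.0, 2.0, 0.0)  # hypothetical 3D prediction
    det = project_box3d(pred, *K)                # stand-in for a 2D detection
    print(sample_loss(pred, det, None, K))       # 2D branch: no 3D label available
    print(sample_loss(pred, det, (0.0, 0.0, 11.0, 2.0, 2.0, 2.0, 0.0), K))  # 3D branch
```

In this sketch the squared difference is applied directly to the direction angle, which ignores angle wrap-around; a practical implementation would likely handle that case and would compute the loss inside an automatic-differentiation framework so that its gradient can update the model parameters.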