WO2024001804A1 - Three-dimensional object detection method, computer device, storage medium, and vehicle - Google Patents

Publication number
WO2024001804A1
WO2024001804A1 PCT/CN2023/100354 CN2023100354W WO2024001804A1 WO 2024001804 A1 WO2024001804 A1 WO 2024001804A1 CN 2023100354 W CN2023100354 W CN 2023100354W WO 2024001804 A1 WO2024001804 A1 WO 2024001804A1
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
information
target detection
trained
target
Prior art date
Application number
PCT/CN2023/100354
Other languages
French (fr)
Chinese (zh)
Inventor
李林
翟玉强
Original Assignee
安徽蔚来智驾科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 安徽蔚来智驾科技有限公司
Publication of WO2024001804A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07: Target detection

Definitions

  • the invention relates to the field of visual detection technology, and specifically provides a three-dimensional target detection method, computer equipment, storage media and vehicles.
  • the joint calibration method of lidar and camera is usually used to obtain 3D information such as the position of the 3D target, and this information is used as the label of the 2D image sample containing the above 3D target. Then, the two-dimensional image samples and their labels are used to perform model training on the three-dimensional target detection model, and the trained three-dimensional target detection model is used to perform three-dimensional target detection on the two-dimensional image.
  • the detection range of lidar is usually relatively short, and it can only obtain three-dimensional information such as the position of close-range targets. As a result, the above method can accurately detect three-dimensional targets only at close range, not at long range.
  • the present invention is proposed to provide a three-dimensional target detection method, computer device, storage medium and vehicle that solve, or at least partially solve, the technical problem of accurately detecting both short-range and long-range three-dimensional targets so as to improve the accuracy of target detection.
  • a three-dimensional target detection method includes:
  • the three-dimensional target detection model is trained in the following way:
  • target detection is performed on two-dimensional image samples through the three-dimensional target detection model to be trained, to obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample;
  • the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained, and a trained three-dimensional target detection model is obtained.
  • the step of "using the two-dimensional information consistency loss function, based on the two-dimensional detection information and the two-dimensional projection information, to perform model training on the three-dimensional target detection model to be trained and obtain the trained three-dimensional target detection model" specifically includes:
  • if the current target sample has three-dimensional actual information, the three-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained;
  • if the current target sample has no three-dimensional actual information, the two-dimensional information consistency loss function is used to perform model training on the three-dimensional target detection model to be trained.
  • the method further includes training the trained three-dimensional target detection model in the following manner to modify the trained three-dimensional target detection model:
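  • if the sample label does not contain the three-dimensional actual information of the target sample, the trained three-dimensional target detection model is not trained further.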
  • target detection is performed on two-dimensional image samples through the three-dimensional target detection model to be trained, and the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample are obtained.
  • the specific steps include:
  • target detection is performed on the two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain the two-dimensional detection frame of the target sample;
  • the two-dimensional detection information and the three-dimensional prediction information of the target sample are respectively determined according to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame.
  • the method further includes:
  • a square loss function is used to establish the three-dimensional information consistency loss function.
  • the three-dimensional predicted information and the three-dimensional actual information include at least the three-dimensional coordinates, size and direction angle of the target sample.
  • the method further includes acquiring the two-dimensional image sample through a monocular camera.
  • in a second aspect, a computer device is provided, which includes a processor and a storage device.
  • the storage device is adapted to store a plurality of program codes.
  • the program codes are adapted to be loaded and run by the processor to execute the three-dimensional target detection method described in any of the above technical solutions of the three-dimensional target detection method.
  • a computer-readable storage medium which stores a plurality of program codes, and the program codes are suitable for being loaded and run by a processor to execute the technical solution of the above three-dimensional target detection method.
  • a vehicle, which includes the computer device described in the above computer device technical solution.
  • target detection can be performed on a two-dimensional image through a three-dimensional target detection model, and the three-dimensional information of the target to be detected in the two-dimensional image can be obtained.
  • the three-dimensional target detection model is trained in the following way: perform target detection on two-dimensional image samples through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample; project the three-dimensional prediction information to obtain the two-dimensional projection information; according to the two-dimensional detection information and the two-dimensional projection information, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtain the trained three-dimensional target detection model.
  • the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and the two-dimensional projection information of the target sample, so that the trained three-dimensional target detection model can accurately detect the three-dimensional information of the target from the two-dimensional image. This overcomes the defect in the existing technology that, because the actual three-dimensional information of long-distance targets cannot be obtained for model training, long-distance three-dimensional targets cannot be accurately detected.
  • Figure 1 is a schematic flowchart of the main steps of a method for obtaining a three-dimensional target detection model according to an embodiment of the present invention
  • Figure 2 is a schematic flowchart of the main steps of a method for model training of a three-dimensional target detection model to be trained according to an embodiment of the present invention
  • Figure 3 is a schematic flowchart of the main steps of a method for model training of a three-dimensional target detection model to be trained according to another embodiment of the present invention.
  • processor may include hardware, software, or a combination of both.
  • the processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor.
  • the processor has data and/or signal processing functions.
  • the processor can be implemented in software, hardware, or a combination of both.
  • Non-transitory computer-readable storage media include any suitable media that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, etc.
  • the three-dimensional target detection method can perform target detection on a two-dimensional image through a three-dimensional target detection model, and obtain three-dimensional information of the target to be detected in the two-dimensional image.
  • the two-dimensional image may be an image obtained through a monocular camera, that is, the two-dimensional image is a monocular image.
  • by using the three-dimensional target detection model to perform target detection on a monocular image, the three-dimensional information of the target to be detected in the monocular image can be obtained.
  • the three-dimensional information of the target to be detected at least includes the three-dimensional coordinates, size and direction angle of the target to be detected.
  • the three-dimensional target detection model can be a network model built using neural networks to detect the three-dimensional information of the target from the two-dimensional image.
  • the above three-dimensional target detection model to be trained can be trained through the following steps S101 to S103, so that the trained three-dimensional target detection model can be used to detect the target in the two-dimensional image and obtain the three-dimensional information of the target to be detected in the two-dimensional image.
  • Step S101: Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample.
  • the two-dimensional image sample can also be an image obtained through a monocular camera, that is, the two-dimensional image sample is also a monocular image.
  • the two-dimensional detection information at least includes the two-dimensional coordinates of the target sample
  • the three-dimensional prediction information at least includes the three-dimensional coordinates, size and direction angle of the target sample.
  • the two-dimensional detection information and three-dimensional prediction information of the target sample can be obtained through the following steps S1011 to S1012.
  • Step S1011: Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, and obtain the two-dimensional detection frame of the target sample.
  • the two-dimensional detection frame refers to the bounding box of the target sample on the two-dimensional image sample.
  • Step S1012: Determine the two-dimensional detection information and the three-dimensional prediction information of the target sample respectively, based on the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame.
  • the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame can be used as the two-dimensional detection information and the three-dimensional prediction information of the target sample, respectively.
  • Step S102: Project the three-dimensional prediction information to obtain two-dimensional projection information.
  • the coordinate system of the three-dimensional prediction information can be converted from the three-dimensional coordinate system to the two-dimensional image coordinate system, thereby realizing the two-dimensional projection of the three-dimensional prediction information and obtaining the two-dimensional projection information, where the three-dimensional coordinate system can be the world coordinate system.
  • the coordinate system transformation relationship between the world coordinate system and the two-dimensional image coordinate system can be determined first, and then the coordinate system transformation is performed on the three-dimensional prediction information through the coordinate system transformation relationship.
  • a conventional method in the field of vision technology can be used to determine the coordinate system transformation relationship between the world coordinate system and the two-dimensional image coordinate system; for example, the transformation relationship can be determined through the principle of pinhole imaging.
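The projection in step S102 can be illustrated with a minimal sketch. It is not the patent's implementation: it assumes the three-dimensional prediction is already expressed in the camera frame (that is, the world frame coincides with the camera frame), an ideal pinhole camera without distortion, and a box parameterised by centre, size and a direction angle about the vertical axis. The function names and the intrinsic parameters `fx`, `fy`, `cx`, `cy` are hypothetical.

```python
import math
from itertools import product


def box_corners_3d(center, size, yaw):
    """Return the 8 corners of a 3D box in camera coordinates.

    Assumed convention (not from the source): x right, y down, z forward;
    size = (length, height, width); yaw rotates about the y axis.
    """
    cx, cy, cz = center
    length, height, width = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for sx, sy, sz in product((-0.5, 0.5), repeat=3):
        x, y, z = sx * length, sy * height, sz * width
        # Rotate about the vertical (y) axis, then translate to the centre.
        corners.append((cx + c * x + s * z, cy + y, cz - s * x + c * z))
    return corners


def project_point(point, fx, fy, cx, cy):
    """Pinhole projection of one camera-frame point to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must be in front of the camera (z > 0)")
    return (fx * x / z + cx, fy * y / z + cy)


def project_box(center, size, yaw, fx, fy, cx, cy):
    """Project a 3D box; return its enclosing 2D box (x_min, y_min, x_max, y_max)."""
    pts = [project_point(p, fx, fy, cx, cy)
           for p in box_corners_3d(center, size, yaw)]
    us = [u for u, _ in pts]
    vs = [v for _, v in pts]
    return (min(us), min(vs), max(us), max(vs))
```

Taking the enclosing rectangle of the projected corners is one possible way to obtain two-dimensional projection information that is directly comparable with a two-dimensional detection frame; the source does not specify which quantities are compared.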
  • Step S103: Based on the two-dimensional detection information and the two-dimensional projection information, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtain the trained three-dimensional target detection model.
  • the two-dimensional detection information can represent the true value of the two-dimensional information of the target sample on the two-dimensional image sample, and the two-dimensional projection information is obtained by projecting the three-dimensional prediction information. Therefore, the two-dimensional projection information can represent the predicted value of the two-dimensional information of the target sample on the two-dimensional image sample.
  • model training of the three-dimensional target detection model to be trained through the two-dimensional information consistency loss function makes the two-dimensional projection information (the predicted value of the two-dimensional information) continuously approach the two-dimensional detection information (the true value of the two-dimensional information). The closer the two-dimensional projection information is to the two-dimensional detection information, the more accurate the three-dimensional prediction information of the target sample that the three-dimensional target detection model to be trained obtains from the two-dimensional image sample.
  • a two-dimensional information consistency loss function can be established through a square loss function.
  • the two-dimensional information consistency loss function can be as shown in the following formula (1).
  • L_1 represents the loss value of the two-dimensional information consistency loss function
  • y_1 represents the two-dimensional detection information
  • the three-dimensional target detection model can be trained by geometrically constraining the two-dimensional detection information and two-dimensional projection information of the target sample. This enables the trained three-dimensional target detection model to accurately detect the three-dimensional information of the target from the two-dimensional image.
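The two-dimensional information consistency loss of step S103 is described as a square loss between the two-dimensional detection information and the two-dimensional projection information. The sketch below shows one plausible form under that description. The exact formula of the source is not reproduced here; representing both inputs as `(x_min, y_min, x_max, y_max)` tuples and summing over the four coordinates are assumptions, and the function name is hypothetical.

```python
def consistency_loss_2d(detected_box, projected_box):
    """Sum of squared differences between two 2D boxes.

    detected_box:  2D detection information, treated as the true value.
    projected_box: 2D projection of the 3D prediction, treated as the predicted value.
    Both are assumed to be (x_min, y_min, x_max, y_max) tuples in pixels.
    """
    if len(detected_box) != len(projected_box):
        raise ValueError("boxes must have the same number of coordinates")
    return sum((d - p) ** 2 for d, p in zip(detected_box, projected_box))
```

Because the loss depends on the three-dimensional prediction only through its projection, it can be evaluated for targets that have no three-dimensional label, which is the property the method relies on for long-range targets.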
  • step S103 will be further described below.
  • each two-dimensional image sample contains at least one target sample.
  • the sample labels of two-dimensional image samples may be labeled with the actual three-dimensional information of all target samples, or may be labeled with the actual three-dimensional information of only a part of the target samples.
  • the three-dimensional actual information can be used for model training.
  • the two-dimensional detection information of the target sample can be combined with the two-dimensional projection information for model training.
  • the three-dimensional target detection model to be trained can be trained through the following steps S1031 to S1033.
  • Step S1031: Determine whether each target sample has three-dimensional actual information according to the sample label of the two-dimensional image sample.
  • If the current target sample has three-dimensional actual information, go to step S1032; if it does not, go to step S1033.
  • Step S1032: Based on the three-dimensional actual information and three-dimensional prediction information of the current target sample, use the three-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained.
  • model training of the three-dimensional target detection model to be trained through the three-dimensional information consistency loss function makes the three-dimensional prediction information continuously approach the three-dimensional actual information. The closer the three-dimensional prediction information is to the three-dimensional actual information, the more accurate the three-dimensional prediction information of the target sample that the three-dimensional target detection model to be trained obtains from the two-dimensional image sample.
  • the three-dimensional information consistency loss function can be established through a square loss function.
  • the three-dimensional information consistency loss function can be shown in the following formula (2).
  • L_2 represents the loss value of the three-dimensional information consistency loss function
  • y_2 represents the three-dimensional actual information
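The three-dimensional information consistency loss of step S1032 can be sketched in the same spirit. This is only an illustration of a square loss over the three-dimensional coordinates, size and direction angle: the dictionary layout, the equal weighting of the three terms and the wrapping of the angle difference are assumptions that the source does not state.

```python
import math


def angle_diff(a, b):
    """Signed difference a - b in radians, wrapped to [-pi, pi)."""
    return (a - b + math.pi) % (2 * math.pi) - math.pi


def consistency_loss_3d(pred, actual):
    """Square loss between 3D prediction information and 3D actual information.

    pred / actual: dicts with 'center' (x, y, z), 'size' (l, h, w) and
    'yaw' (direction angle in radians); this layout is hypothetical.
    """
    loss = sum((p - a) ** 2 for p, a in zip(pred["center"], actual["center"]))
    loss += sum((p - a) ** 2 for p, a in zip(pred["size"], actual["size"]))
    # Wrap the angle so that, for example, +pi and -pi count as the same direction.
    loss += angle_diff(pred["yaw"], actual["yaw"]) ** 2
    return loss
```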
  • Step S1033: Based on the two-dimensional detection information and two-dimensional projection information of the current target sample, use the two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained. It should be noted that the specific method of step S1033 is similar to the method described in step S103 in the foregoing method embodiment, and will not be described again.
  • target samples labeled with three-dimensional actual information and target samples not labeled with three-dimensional actual information can be used for model training of the three-dimensional target detection model to be trained, which significantly improves the accuracy and efficiency of model training.
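The branching in steps S1031 to S1033 amounts to choosing a loss per target sample according to whether its label carries three-dimensional actual information. The following sketch assumes each sample is a dictionary of flat tuples and that a missing or `None` value for `actual_3d` means the target has no three-dimensional label; the key names and helper functions are hypothetical.

```python
def squared_error(a, b):
    """Sum of squared element-wise differences between two equal-length tuples."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def sample_loss(sample):
    """Pick the loss for one target sample (steps S1031 to S1033).

    sample: dict with 'pred_3d', 'det_2d', 'proj_2d' and, optionally, 'actual_3d'.
    Returns a (kind, value) pair, where kind is '3d' or '2d'.
    """
    actual = sample.get("actual_3d")
    if actual is not None:
        # S1032: a 3D label exists, so supervise the 3D prediction directly.
        return "3d", squared_error(sample["pred_3d"], actual)
    # S1033: no 3D label, so fall back to the 2D consistency constraint.
    return "2d", squared_error(sample["det_2d"], sample["proj_2d"])


def batch_loss(samples):
    """Total loss over a mix of labelled and unlabelled target samples."""
    return sum(sample_loss(s)[1] for s in samples)
```

With this arrangement a single image can contribute both kinds of terms, for example a nearby vehicle with a lidar-derived label and a distant one with only a two-dimensional detection frame.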
  • after the two-dimensional information consistency loss function has been used to perform model training on the three-dimensional target detection model to be trained and the trained three-dimensional target detection model has been obtained, the trained three-dimensional target detection model can be trained again using the three-dimensional actual information of the target sample, to correct the trained model and further improve its target detection accuracy.
  • the trained three-dimensional target detection model can be trained through the following steps S104 to S106 to correct the trained three-dimensional target detection model.
  • Step S104: Determine whether the sample label of the two-dimensional image sample contains the three-dimensional actual information of the target sample; if it does, go to step S105; if it does not, go to step S106.
  • Step S105: Based on the three-dimensional actual information and the three-dimensional prediction information, use the three-dimensional information consistency loss function to perform model training on the trained three-dimensional target detection model to obtain the final three-dimensional target detection model.
  • step S105 is similar to the method described in step S1032 in the foregoing method embodiment, and will not be described again.
  • Step S106: Do not train the trained three-dimensional target detection model further.
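The correction stage in steps S104 to S106 can be sketched as a filter followed by an optional update. The sketch assumes that a sample without three-dimensional actual information is marked by a missing or `None` value for `actual_3d`, and that `train_step` is a caller-supplied function that returns the updated model; both are hypothetical and stand in for whatever optimiser and model representation are actually used.

```python
def refinement_samples(samples):
    """S104: keep only samples whose label contains 3D actual information."""
    return [s for s in samples if s.get("actual_3d") is not None]


def refine(model, samples, train_step):
    """Second training stage for an already trained model.

    train_step(model, sample) -> model is a hypothetical callable that applies
    one update with the 3D information consistency loss (S105). Samples without
    a 3D label trigger no update, leaving the model unchanged (S106).
    """
    for sample in refinement_samples(samples):
        model = train_step(model, sample)
    return model
```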
  • the present invention can implement all or part of the process in the method of the above-mentioned embodiment, and can also be completed by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium; when the computer program is executed by a processor, the steps of each of the above method embodiments can be implemented.
  • the computer program includes computer program code, which may be in the form of source code, object code, executable file or some intermediate form.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, media, USB flash drive, mobile hard disk, magnetic disk, optical disk, computer memory, read-only memory, random access memory, electrical carrier wave signals, telecommunications signals, and software distribution media, etc.
  • computer-readable storage media do not include electrical carrier signals and telecommunications signals.
  • the present invention also provides a computer device.
  • the computer equipment includes a processor and a storage device.
  • the storage device can be configured to store a program for executing the three-dimensional target detection method of the above method embodiment.
  • the processor can be configured to execute the programs in the storage device, which include but are not limited to a program for executing the three-dimensional target detection method of the above method embodiment.
  • the computer device may be a control device formed by various electronic devices.
  • the present invention also provides a computer-readable storage medium.
  • the computer-readable storage medium can be configured to store a program for executing the three-dimensional target detection method of the above method embodiment.
  • the program can be loaded and run by a processor to implement the above three-dimensional target detection method.
  • the computer-readable storage medium may be a storage device formed by various electronic devices.
  • the computer-readable storage medium is a non-transitory computer-readable storage medium.
  • the present invention also provides a vehicle.
  • the vehicle may include a computer device as described in the above computer device embodiment.
  • the vehicle may be an autonomous vehicle, an unmanned vehicle, or other vehicles.
  • the vehicle in this embodiment may be a fuel vehicle, an electric vehicle, a hybrid vehicle that mixes electric energy with fuel, or a vehicle that uses other new energy sources.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to the technical field of visual inspection. Specifically provided are a three-dimensional object detection method, a computer device, a storage medium, and a vehicle, which aim to solve the problem of improving the accuracy of object detection. To this end, the method in the present invention comprises: performing object detection on a two-dimensional image by means of a three-dimensional object detection model, so as to acquire three-dimensional information of an object to be detected in the two-dimensional image. The three-dimensional object detection model is obtained by means of performing training in the following manner: performing object detection on a two-dimensional image sample by means of a three-dimensional object detection model to be trained, so as to acquire two-dimensional detection information and three-dimensional prediction information of an object sample in the two-dimensional image sample; projecting the three-dimensional prediction information, so as to obtain two-dimensional projection information; and according to the two-dimensional detection information and the two-dimensional projection information and by using a two-dimensional information consistency loss function, performing model training on the three-dimensional object detection model to be trained, so as to obtain a trained three-dimensional object detection model. In this way, both a close-range object and a long-range object can be detected.

Description

Three-dimensional target detection method, computer device, storage medium and vehicle
This application claims priority to Chinese patent application CN115205846A, filed on June 28, 2022 and entitled "Three-dimensional target detection method, computer device, storage medium and vehicle", the entire content of which is incorporated herein by reference.
Technical field
The invention relates to the field of visual detection technology, and specifically provides a three-dimensional target detection method, a computer device, a storage medium and a vehicle.
Background
To improve the accuracy of three-dimensional target detection on two-dimensional images, joint calibration of a lidar and a camera is usually used to obtain three-dimensional information such as the position of a three-dimensional target, and this information is used as the label of the two-dimensional image sample containing that target. The two-dimensional image samples and their labels are then used to train a three-dimensional target detection model, and the trained model is used to perform three-dimensional target detection on two-dimensional images. However, the detection range of lidar is usually relatively short, so it can only obtain three-dimensional information such as the position of close-range targets. As a result, the above method can accurately detect three-dimensional targets only at close range, not at long range.
Accordingly, a new technical solution is needed in this field to solve the above problem.
Summary of the invention
To overcome the above defect, the present invention is proposed to provide a three-dimensional target detection method, a computer device, a storage medium and a vehicle that solve, or at least partially solve, the technical problem of accurately detecting both short-range and long-range three-dimensional targets so as to improve the accuracy of target detection.
In a first aspect, a three-dimensional target detection method is provided. The method includes:
performing target detection on a two-dimensional image through a three-dimensional target detection model, and obtaining three-dimensional information of the target to be detected in the two-dimensional image;
wherein the three-dimensional target detection model is trained in the following way:
performing target detection on a two-dimensional image sample through the three-dimensional target detection model to be trained, and obtaining the two-dimensional detection information and three-dimensional prediction information of the target sample in the two-dimensional image sample;
projecting the three-dimensional prediction information to obtain two-dimensional projection information;
according to the two-dimensional detection information and the two-dimensional projection information, using a two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtaining a trained three-dimensional target detection model.
In one technical solution of the above three-dimensional target detection method, the step of "according to the two-dimensional detection information and the two-dimensional projection information, using a two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtaining a trained three-dimensional target detection model" specifically includes:
determining, according to the sample label of the two-dimensional image sample, whether each target sample has three-dimensional actual information;
if the current target sample has three-dimensional actual information, using a three-dimensional information consistency loss function, based on the three-dimensional actual information of the current target sample and the three-dimensional prediction information, to perform model training on the three-dimensional target detection model to be trained;
if the current target sample has no three-dimensional actual information, using the two-dimensional information consistency loss function, based on the two-dimensional detection information of the current target sample and the two-dimensional projection information, to perform model training on the three-dimensional target detection model to be trained.
In one technical solution of the above three-dimensional target detection method, after the step of "according to the two-dimensional detection information and the two-dimensional projection information, using a two-dimensional information consistency loss function to perform model training on the three-dimensional target detection model to be trained, and obtaining a trained three-dimensional target detection model", the method further includes training the trained three-dimensional target detection model in the following way, so as to correct the trained three-dimensional target detection model:
determining whether the sample label of the two-dimensional image sample contains the three-dimensional actual information of the target sample;
if it does, using the three-dimensional information consistency loss function, based on the three-dimensional actual information and the three-dimensional prediction information, to perform model training on the trained three-dimensional target detection model, and obtaining the final three-dimensional target detection model;
if it does not, not training the trained three-dimensional target detection model.
In one technical solution of the above three-dimensional target detection method, the step of "performing target detection on a two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain two-dimensional detection information and three-dimensional prediction information of a target sample in the two-dimensional image sample" specifically includes:
performing target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain a two-dimensional detection box of the target sample;
determining the two-dimensional detection information and the three-dimensional prediction information of the target sample according to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection box, respectively.
In one technical solution of the above three-dimensional target detection method, the method further includes:
establishing the two-dimensional information consistency loss function by using a square loss function;
and/or establishing the three-dimensional information consistency loss function by using a square loss function.
In one technical solution of the above three-dimensional target detection method, the three-dimensional prediction information and the three-dimensional actual information each include at least the three-dimensional coordinates, size, and direction angle of the target sample.
In one technical solution of the above three-dimensional target detection method, the method further includes acquiring the two-dimensional image sample through a monocular camera.
In a second aspect, a computer device is provided. The computer device includes a processor and a storage device, the storage device being adapted to store a plurality of program codes, and the program codes being adapted to be loaded and run by the processor to execute the three-dimensional target detection method according to any one of the above technical solutions of the three-dimensional target detection method.
In a third aspect, a computer-readable storage medium is provided. The computer-readable storage medium stores a plurality of program codes, and the program codes are adapted to be loaded and run by a processor to execute the three-dimensional target detection method according to any one of the above technical solutions of the three-dimensional target detection method.
In a fourth aspect, a vehicle is provided. The vehicle includes the computer device according to the above technical solution of the computer device.
The above one or more technical solutions of the present invention have at least one or more of the following beneficial effects:
In implementing the technical solution of the present invention, target detection can be performed on a two-dimensional image through a three-dimensional target detection model to obtain three-dimensional information of a target to be detected in the two-dimensional image. The three-dimensional target detection model is trained in the following manner: performing target detection on a two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain two-dimensional detection information and three-dimensional prediction information of a target sample in the two-dimensional image sample; projecting the three-dimensional prediction information to obtain two-dimensional projection information; and performing model training on the three-dimensional target detection model to be trained by using a two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information, to obtain a trained three-dimensional target detection model.
With the above implementation, even if the three-dimensional actual information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can still be trained by imposing a geometric constraint between the two-dimensional detection information and the two-dimensional projection information of the target sample, so that the trained three-dimensional target detection model can accurately detect the three-dimensional information of a target from a two-dimensional image. This overcomes the defect in the prior art that long-distance three-dimensional targets cannot be detected accurately because the three-dimensional actual information of long-distance targets cannot be obtained for model training.
Description of the Drawings
The disclosure of the present invention will become easier to understand with reference to the accompanying drawings. Those skilled in the art will readily appreciate that these drawings are for illustrative purposes only and are not intended to limit the scope of protection of the present invention. In the drawings:
Figure 1 is a schematic flowchart of the main steps of a method for obtaining a three-dimensional target detection model according to an embodiment of the present invention;
Figure 2 is a schematic flowchart of the main steps of a method for performing model training on a three-dimensional target detection model to be trained according to an embodiment of the present invention;
Figure 3 is a schematic flowchart of the main steps of a method for performing model training on a three-dimensional target detection model to be trained according to another embodiment of the present invention.
Detailed Description
Some embodiments of the present invention are described below with reference to the accompanying drawings. Those skilled in the art should understand that these embodiments are only used to explain the technical principles of the present invention and are not intended to limit the scope of protection of the present invention.
In the description of the present invention, a "processor" may include hardware, software, or a combination of both. The processor may be a central processing unit, a microprocessor, a graphics processor, a digital signal processor, or any other suitable processor. The processor has data and/or signal processing functions. The processor may be implemented in software, in hardware, or in a combination of the two. A non-transitory computer-readable storage medium includes any suitable medium that can store program code, such as a magnetic disk, a hard disk, an optical disc, flash memory, read-only memory, random access memory, and so on.
In an embodiment of a three-dimensional target detection method according to the present invention, the method may perform target detection on a two-dimensional image through a three-dimensional target detection model to obtain three-dimensional information of a target to be detected in the two-dimensional image. The two-dimensional image may be an image acquired by a monocular camera, that is, a monocular image. In this embodiment, performing target detection on a single monocular image through the three-dimensional target detection model is sufficient to obtain the three-dimensional information of the target to be detected in that image. The three-dimensional information of the target to be detected includes at least the three-dimensional coordinates, size, and direction angle (direction cosine) of the target to be detected.
The three-dimensional target detection model may be a network model, built with neural networks, for detecting the three-dimensional information of a target from a two-dimensional image. Referring to Figure 1, in this embodiment, after an initial three-dimensional target detection model (the three-dimensional target detection model to be trained) has been constructed, it can be trained through the following steps S101 to S103, so that the trained three-dimensional target detection model can then be used to perform target detection on a two-dimensional image and obtain the three-dimensional information of the target to be detected in that image.
Step S101: Perform target detection on a two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain two-dimensional detection information and three-dimensional prediction information of a target sample in the two-dimensional image sample.
The two-dimensional image sample may likewise be an image acquired by a monocular camera, that is, the two-dimensional image sample is also a monocular image. The two-dimensional detection information includes at least the two-dimensional coordinates of the target sample, and the three-dimensional prediction information includes at least the three-dimensional coordinates, size, and direction angle of the target sample.
In some implementations, the two-dimensional detection information and the three-dimensional prediction information of the target sample can be obtained through the following steps S1011 to S1012.
Step S1011: Perform target detection on the two-dimensional image sample through the three-dimensional target detection model to be trained, to obtain a two-dimensional detection box of the target sample. The two-dimensional detection box is the bounding box of the target sample on the two-dimensional image sample.
Step S1012: Determine the two-dimensional detection information and the three-dimensional prediction information of the target sample according to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection box, respectively.
In this implementation, the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection box can be used directly as the two-dimensional detection information and the three-dimensional prediction information of the target sample, respectively.
Step S102: Project the three-dimensional prediction information to obtain two-dimensional projection information.
In this embodiment, a coordinate system transformation can be applied to the three-dimensional prediction information to convert it from a three-dimensional coordinate system to the two-dimensional image coordinate system, thereby projecting the three-dimensional prediction information into two dimensions and obtaining the two-dimensional projection information. The three-dimensional coordinate system may be the world coordinate system. Specifically, the transformation relationship between the world coordinate system and the two-dimensional image coordinate system can be determined first, and the three-dimensional prediction information can then be transformed through that relationship. It should be noted that any conventional method in the field of computer vision can be used to determine the transformation relationship between the world coordinate system and the two-dimensional image coordinate system; for example, it can be determined according to the pinhole imaging principle.
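As an illustration of step S102, the following is a minimal sketch of a pinhole projection, not the implementation described in the source. It assumes the predicted box is already expressed in camera coordinates (so the world-to-camera extrinsics are the identity), that the direction angle is a yaw about the vertical axis, and that the two-dimensional projection information is the axis-aligned box enclosing the eight projected corners. The function names and the intrinsic parameters `fx`, `fy`, `cu`, `cv` are hypothetical.

```python
import math
from itertools import product


def box_corners(center, size, yaw):
    """Return the 8 corners of a 3D box in camera coordinates (assumed convention).

    center = (x, y, z); size = (length, width, height); yaw rotates about the y axis.
    """
    cx, cy, cz = center
    length, width, height = size
    c, s = math.cos(yaw), math.sin(yaw)
    corners = []
    for dx, dy, dz in product((-length / 2, length / 2),
                              (-height / 2, height / 2),
                              (-width / 2, width / 2)):
        # Rotate the corner offset about the y axis, then translate to the box center.
        corners.append((cx + c * dx + s * dz, cy + dy, cz - s * dx + c * dz))
    return corners


def project_point(point, fx, fy, cu, cv):
    """Project one camera-frame point to pixel coordinates with the pinhole model."""
    x, y, z = point
    if z <= 0:
        raise ValueError("point must lie in front of the camera (z > 0)")
    return (fx * x / z + cu, fy * y / z + cv)


def projected_box(center, size, yaw, fx, fy, cu, cv):
    """Return (u_min, v_min, u_max, v_max) enclosing the projected 3D box corners."""
    pixels = [project_point(p, fx, fy, cu, cv) for p in box_corners(center, size, yaw)]
    us = [u for u, _ in pixels]
    vs = [v for _, v in pixels]
    return (min(us), min(vs), max(us), max(vs))
```

A box obtained this way can be compared directly with the two-dimensional detection box, which is what makes the consistency constraint in step S103 possible. If the prediction is expressed in a world coordinate system, a rotation and translation into the camera frame would have to be applied before `project_point`.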
Step S103: Perform model training on the three-dimensional target detection model to be trained by using a two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information, to obtain a trained three-dimensional target detection model.
The two-dimensional detection information can represent the true value of the two-dimensional information of the target sample on the two-dimensional image sample. The two-dimensional projection information is obtained by projecting the three-dimensional prediction information, so it can represent the predicted value of the two-dimensional information of the target sample on the two-dimensional image sample.
Training the three-dimensional target detection model to be trained with the two-dimensional information consistency loss function drives the two-dimensional projection information (the predicted value) progressively closer to the two-dimensional detection information (the true value). The closer the two-dimensional projection information is to the two-dimensional detection information, the more accurate the three-dimensional prediction information that the model produces for the target sample when it performs target detection on the two-dimensional image sample.
In some implementations, the two-dimensional information consistency loss function can be established through a square loss function; for example, the two-dimensional information consistency loss function can be as shown in formula (1) below.
$$L_1 = (y_1 - \hat{y}_1)^2 \tag{1}$$
The parameters in formula (1) have the following meanings: $L_1$ denotes the loss value of the two-dimensional information consistency loss function, $y_1$ denotes the two-dimensional detection information, and $\hat{y}_1$ denotes the two-dimensional projection information.
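To make the two-dimensional consistency loss concrete, here is a small sketch of a square loss between a detected box and a projected box. The source only states that a square loss is used; summing the squared differences over the four box coordinates, and the `(u_min, v_min, u_max, v_max)` box layout, are assumptions made for illustration, and `squared_loss` is a hypothetical helper name.

```python
def squared_loss(target, prediction):
    """Sum of squared differences between two equally long coordinate sequences."""
    if len(target) != len(prediction):
        raise ValueError("target and prediction must have the same length")
    return sum((t - p) ** 2 for t, p in zip(target, prediction))


# Two-dimensional detection information (treated as the true value) and
# two-dimensional projection information (treated as the predicted value).
detected_box = (100.0, 120.0, 220.0, 260.0)
projected_box_2d = (98.0, 121.0, 225.0, 258.0)

print(squared_loss(detected_box, projected_box_2d))  # 34.0
```

In an actual training loop the same expression would be evaluated on differentiable tensors so that its gradient can flow back through the projection into the three-dimensional prediction.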
Through the above steps S101 to S103, even if the three-dimensional actual information of the target sample in the two-dimensional image sample cannot be obtained, the three-dimensional target detection model can still be trained by imposing a geometric constraint between the two-dimensional detection information and the two-dimensional projection information of the target sample, so that the trained three-dimensional target detection model can accurately detect the three-dimensional information of a target from a two-dimensional image.
Step S103 is described further below.
Model training of the three-dimensional target detection model to be trained usually uses a large batch of two-dimensional image samples, each of which contains at least one target sample. The sample label of a two-dimensional image sample may be annotated with the three-dimensional actual information of all target samples, or with the three-dimensional actual information of only some of them. To further improve the accuracy and efficiency of model training, target samples annotated with three-dimensional actual information can be trained on using that information, while target samples without such annotation can be trained on using their two-dimensional detection information and two-dimensional projection information. Specifically, referring to Figure 2, in some implementations of step S103, the three-dimensional target detection model to be trained can be trained through the following steps S1031 to S1033.
Step S1031: Determine, according to the sample label of the two-dimensional image sample, whether each target sample has three-dimensional actual information.
If the current target sample has three-dimensional actual information, go to step S1032;
if the current target sample does not have three-dimensional actual information, go to step S1033.
Step S1032: Perform model training on the three-dimensional target detection model to be trained by using a three-dimensional information consistency loss function according to the three-dimensional actual information and the three-dimensional prediction information of the current target sample. Training with the three-dimensional information consistency loss function drives the three-dimensional prediction information progressively closer to the three-dimensional actual information. The closer the three-dimensional prediction information is to the three-dimensional actual information, the more accurate the three-dimensional prediction information that the model produces for the target sample when it performs target detection on the two-dimensional image sample.
In some implementations, the three-dimensional information consistency loss function can be established through a square loss function; for example, the three-dimensional information consistency loss function can be as shown in formula (2) below.
$$L_2 = (y_2 - \hat{y}_2)^2 \tag{2}$$
The parameters in formula (2) have the following meanings: $L_2$ denotes the loss value of the three-dimensional information consistency loss function, $y_2$ denotes the three-dimensional actual information, and $\hat{y}_2$ denotes the three-dimensional prediction information.
Step S1033: Perform model training on the three-dimensional target detection model to be trained by using the two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information of the current target sample. It should be noted that the specific method of step S1033 is similar to that of step S103 in the foregoing method embodiment and is not repeated here.
Through the above steps S1031 to S1033, target samples annotated with three-dimensional actual information and target samples without such annotation can both be used to train the three-dimensional target detection model to be trained, which improves the accuracy and efficiency of model training.
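The branching in steps S1031 to S1033 can be sketched as a per-target loss selection. This is an illustrative sketch under stated assumptions rather than the source's implementation: the `TargetSample` container, its field names, the 7-value encoding of the three-dimensional information (coordinates, size, direction angle), and the summed square loss are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class TargetSample:
    """Hypothetical container for one target sample in a 2D image sample."""
    detection_2d: Sequence[float]   # 2D detection information
    projection_2d: Sequence[float]  # projection of the 3D prediction information
    prediction_3d: Sequence[float]  # assumed layout: (x, y, z, l, w, h, angle)
    actual_3d: Optional[Sequence[float]] = None  # None when the label has no 3D info


def squared_loss(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Sum of squared differences (assumed form of the square loss)."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction))


def target_loss(sample: TargetSample) -> Tuple[str, float]:
    """Pick the loss for one target sample and return (branch name, loss value)."""
    # S1031: check whether the label carries 3D actual information.
    if sample.actual_3d is not None:
        # S1032: 3D information consistency loss.
        return "3d", squared_loss(sample.actual_3d, sample.prediction_3d)
    # S1033: fall back to the 2D information consistency loss.
    return "2d", squared_loss(sample.detection_2d, sample.projection_2d)
```

With this structure, nearby targets that have three-dimensional annotations and distant targets that only have two-dimensional boxes can share one training batch, each contributing the loss that its label supports.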
In addition, in other implementations of step S103, after the three-dimensional target detection model to be trained has been trained with the two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information, and a trained three-dimensional target detection model has been obtained, the trained model can be trained again using the three-dimensional actual information of the target sample. This corrects the trained three-dimensional target detection model and further improves its detection accuracy. Specifically, referring to Figure 3, in this implementation the trained three-dimensional target detection model can be trained through the following steps S104 to S106 in order to correct it.
Step S104: Determine whether the sample label of the two-dimensional image sample contains the three-dimensional actual information of the target sample; if it does, go to step S105; if it does not, go to step S106.
Step S105: Perform model training on the trained three-dimensional target detection model by using the three-dimensional information consistency loss function according to the three-dimensional actual information and the three-dimensional prediction information, to obtain the final three-dimensional target detection model.
It should be noted that the specific method of step S105 is similar to that of step S1032 in the foregoing method embodiment and is not repeated here.
Step S106: Do not train the trained three-dimensional target detection model.
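Steps S104 to S106 amount to a second, correction stage that only uses samples whose labels carry three-dimensional actual information. The sketch below illustrates that filtering under assumptions: samples are represented as plain dictionaries with hypothetical keys `actual_3d` and `prediction_3d`, and the square loss is summed over the components of the three-dimensional information.

```python
def squared_loss(target, prediction):
    """Sum of squared differences (assumed form of the square loss)."""
    return sum((t - p) ** 2 for t, p in zip(target, prediction))


def correction_losses(samples):
    """Return the 3D consistency losses used in the correction stage.

    S104: check whether each label contains 3D actual information.
    S105: labeled samples contribute a 3D information consistency loss.
    S106: unlabeled samples are skipped, so they trigger no further training.
    """
    losses = []
    skipped = 0
    for sample in samples:
        actual = sample.get("actual_3d")
        if actual is None:
            skipped += 1
            continue
        losses.append(squared_loss(actual, sample["prediction_3d"]))
    return losses, skipped
```

If every sample is skipped, the returned list is empty and the model obtained from the two-dimensional consistency stage is kept unchanged.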
It should be pointed out that although the steps in the above embodiments are described in a specific order, those skilled in the art will understand that, to achieve the effects of the present invention, the steps need not be executed in that order. They can be executed simultaneously (in parallel) or in another order, and all such variations fall within the scope of protection of the present invention.
Those skilled in the art will understand that all or part of the processes in the methods of the above embodiments can also be completed by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium, and when executed by a processor it can implement the steps of each of the above method embodiments. The computer program includes computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable storage medium may include any entity or device capable of carrying the computer program code, such as a medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, read-only memory, random access memory, an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content covered by the computer-readable storage medium can be appropriately expanded or reduced according to the requirements of legislation and patent practice in a given jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunications signals.
Further, the present invention also provides a computer device. In an embodiment of a computer device according to the present invention, the computer device includes a processor and a storage device. The storage device can be configured to store a program for executing the three-dimensional target detection method of the above method embodiments, and the processor can be configured to execute programs in the storage device, including but not limited to the program for executing the three-dimensional target detection method of the above method embodiments. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The computer device may be a control apparatus formed of various electronic devices.
Further, the present invention also provides a computer-readable storage medium. In an embodiment of a computer-readable storage medium according to the present invention, the computer-readable storage medium can be configured to store a program for executing the three-dimensional target detection method of the above method embodiments, and the program can be loaded and run by a processor to implement the above three-dimensional target detection method. For ease of explanation, only the parts related to the embodiments of the present invention are shown; for specific technical details that are not disclosed, refer to the method part of the embodiments of the present invention. The computer-readable storage medium may be a storage apparatus formed of various electronic devices. Optionally, the computer-readable storage medium in the embodiments of the present invention is a non-transitory computer-readable storage medium.
Further, the present invention also provides a vehicle. In an embodiment of a vehicle according to the present invention, the vehicle may include the computer device described in the above computer device embodiment. In this embodiment, the vehicle may be an autonomous vehicle, an unmanned vehicle, or another type of vehicle. In addition, classified by power source, the vehicle in this embodiment may be a fuel vehicle, an electric vehicle, a hybrid vehicle combining electric power and fuel, or a vehicle using another new energy source.
So far, the technical solution of the present invention has been described with reference to the implementations shown in the accompanying drawings. However, those skilled in the art will readily understand that the scope of protection of the present invention is not limited to these specific implementations. Without departing from the principles of the present invention, those skilled in the art can make equivalent changes or substitutions to the relevant technical features, and the technical solutions resulting from such changes or substitutions will fall within the scope of protection of the present invention.

Claims (10)

  1. A three-dimensional target detection method, characterized in that the method comprises:
    performing target detection on a two-dimensional image through a three-dimensional target detection model, to obtain three-dimensional information of a target to be detected in the two-dimensional image;
    wherein the three-dimensional target detection model is trained in the following manner:
    performing target detection on a two-dimensional image sample through a three-dimensional target detection model to be trained, to obtain two-dimensional detection information and three-dimensional prediction information of a target sample in the two-dimensional image sample;
    projecting the three-dimensional prediction information to obtain two-dimensional projection information;
    performing model training on the three-dimensional target detection model to be trained by using a two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information, to obtain a trained three-dimensional target detection model.
  2. The three-dimensional target detection method according to claim 1, characterized in that the step of "performing model training on the three-dimensional target detection model to be trained by using a two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information, to obtain a trained three-dimensional target detection model" specifically comprises:
    determining, according to the sample label of the two-dimensional image sample, whether each target sample has three-dimensional actual information;
    if the current target sample has three-dimensional actual information, performing model training on the three-dimensional target detection model to be trained by using a three-dimensional information consistency loss function according to the three-dimensional actual information and the three-dimensional prediction information of the current target sample;
    if the current target sample does not have three-dimensional actual information, performing model training on the three-dimensional target detection model to be trained by using the two-dimensional information consistency loss function according to the two-dimensional detection information and the two-dimensional projection information of the current target sample.
  3. The three-dimensional target detection method according to claim 1, characterized in that, after the step of "according to the two-dimensional detection information and the two-dimensional projection information, performing model training on the three-dimensional target detection model to be trained by using a two-dimensional information consistency loss function, to obtain a trained three-dimensional target detection model", the method further comprises training the trained three-dimensional target detection model in the following manner, so as to correct the trained three-dimensional target detection model:
    determining whether the sample label of the two-dimensional image sample contains three-dimensional actual information of the target sample;
    if so, performing model training on the trained three-dimensional target detection model by using a three-dimensional information consistency loss function, according to the three-dimensional actual information and the three-dimensional prediction information, to obtain a final three-dimensional target detection model;
    if not, not training the trained three-dimensional target detection model.
  4. The three-dimensional target detection method according to claim 1, characterized in that the step of "performing target detection on a two-dimensional image sample by means of a three-dimensional target detection model to be trained, to obtain two-dimensional detection information and three-dimensional prediction information of a target sample in the two-dimensional image sample" specifically comprises:
    performing target detection on the two-dimensional image sample by means of the three-dimensional target detection model to be trained, to obtain a two-dimensional detection frame of the target sample;
    determining the two-dimensional detection information and the three-dimensional prediction information of the target sample, respectively, according to the two-dimensional detection information and the three-dimensional prediction information of the two-dimensional detection frame.
  5. The three-dimensional target detection method according to claim 2 or 3, characterized in that the method further comprises:
    establishing the two-dimensional information consistency loss function by using a squared loss function;
    and/or
    establishing the three-dimensional information consistency loss function by using a squared loss function.
  6. The three-dimensional target detection method according to claim 2 or 3, characterized in that the three-dimensional prediction information and the three-dimensional actual information each include at least the three-dimensional coordinates, size and direction angle of the target sample.
  7. The three-dimensional target detection method according to any one of claims 1 to 4, characterized in that the method further comprises acquiring the two-dimensional image sample by means of a monocular camera.
  8. A computer device, comprising a processor and a storage device, the storage device being adapted to store a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by the processor to execute the three-dimensional target detection method according to any one of claims 1 to 7.
  9. A computer-readable storage medium storing a plurality of program codes, characterized in that the program codes are adapted to be loaded and run by a processor to execute the three-dimensional target detection method according to any one of claims 1 to 7.
  10. A vehicle, characterized in that the vehicle comprises the computer device according to claim 8.
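
The loss selection described in claims 1, 2, 5 and 6 can be illustrated with a minimal sketch. It is not the applicant's implementation: the claims do not specify the projection, so the sketch assumes a pinhole camera with a hypothetical intrinsic matrix K, a camera-frame box whose direction angle is a rotation about the vertical axis, and a two-dimensional projection taken as the axis-aligned rectangle enclosing the projected corners. The names Box3D, box_corners, project_to_2d, squared_loss and sample_loss are illustrative only. The sketch computes loss values in NumPy; actual training would need an automatic-differentiation framework.

```python
from dataclasses import dataclass
from typing import Optional

import numpy as np


@dataclass
class Box3D:
    """3D information listed in claim 6: coordinates, size, direction angle."""
    center: np.ndarray  # (x, y, z) in the camera frame (assumed convention)
    size: np.ndarray    # (length, height, width) (assumed ordering)
    yaw: float          # rotation about the camera's vertical (y) axis, radians

    def as_vector(self) -> np.ndarray:
        """Flatten to a 7-vector (x, y, z, l, h, w, yaw)."""
        return np.concatenate([self.center, self.size, [self.yaw]])


def box_corners(box: Box3D) -> np.ndarray:
    """Return the 8 box corners as an (8, 3) array in the camera frame."""
    l, h, w = box.size
    # Corner offsets around the box centre before rotation.
    x = np.array([1, 1, -1, -1, 1, 1, -1, -1]) * l / 2
    y = np.array([1, 1, 1, 1, -1, -1, -1, -1]) * h / 2
    z = np.array([1, -1, -1, 1, 1, -1, -1, 1]) * w / 2
    c, s = np.cos(box.yaw), np.sin(box.yaw)
    rot_y = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return (rot_y @ np.vstack([x, y, z])).T + box.center


def project_to_2d(box: Box3D, K: np.ndarray) -> np.ndarray:
    """Project the box with intrinsics K; return (x_min, y_min, x_max, y_max)."""
    pts = (K @ box_corners(box).T).T   # (8, 3) homogeneous pixel coordinates
    uv = pts[:, :2] / pts[:, 2:3]      # perspective divide; assumes depth > 0
    return np.concatenate([uv.min(axis=0), uv.max(axis=0)])


def squared_loss(a: np.ndarray, b: np.ndarray) -> float:
    """Squared loss (claim 5): sum of squared element-wise differences."""
    return float(np.sum((np.asarray(a) - np.asarray(b)) ** 2))


def sample_loss(pred: Box3D, det_2d: np.ndarray, K: np.ndarray,
                gt_3d: Optional[Box3D] = None) -> float:
    """Choose the loss per target sample as in claim 2."""
    if gt_3d is not None:
        # 3D label available: 3D information consistency loss.
        # Note: this naive form ignores angle wrap-around in yaw.
        return squared_loss(pred.as_vector(), gt_3d.as_vector())
    # No 3D label: compare the projected prediction with the 2D detection.
    return squared_loss(project_to_2d(pred, K), det_2d)
```

Under these assumptions, a sample without a three-dimensional label still contributes a training signal, because its predicted box is projected into the image and compared with the two-dimensional detection; a sample with a label is compared directly in three dimensions.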
PCT/CN2023/100354 2022-06-28 2023-06-15 Three-dimensional object detection method, computer device, storage medium, and vehicle WO2024001804A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210749012.9A CN115205846A (en) 2022-06-28 2022-06-28 Three-dimensional target detection method, computer device, storage medium, and vehicle
CN202210749012.9 2022-06-28

Publications (1)

Publication Number Publication Date
WO2024001804A1 (en) 2024-01-04

Family

ID=83577421

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/100354 WO2024001804A1 (en) 2022-06-28 2023-06-15 Three-dimensional object detection method, computer device, storage medium, and vehicle

Country Status (2)

Country Link
CN (1) CN115205846A (en)
WO (1) WO2024001804A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115205846A (en) * 2022-06-28 2022-10-18 安徽蔚来智驾科技有限公司 Three-dimensional target detection method, computer device, storage medium, and vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079619A (en) * 2019-12-10 2020-04-28 北京百度网讯科技有限公司 Method and apparatus for detecting target object in image
CN111563415A (en) * 2020-04-08 2020-08-21 华南理工大学 Binocular vision-based three-dimensional target detection system and method
CN114359892A (en) * 2021-12-09 2022-04-15 北京大学深圳研究生院 Three-dimensional target detection method and device and computer readable storage medium
CN115205846A (en) * 2022-06-28 2022-10-18 安徽蔚来智驾科技有限公司 Three-dimensional target detection method, computer device, storage medium, and vehicle

Also Published As

Publication number Publication date
CN115205846A (en) 2022-10-18

Similar Documents

Publication Publication Date Title
CN111325796B (en) Method and apparatus for determining pose of vision equipment
CN109188457B (en) Object detection frame generation method, device, equipment, storage medium and vehicle
US11482014B2 (en) 3D auto-labeling with structural and physical constraints
JP2022514974A (en) Object detection methods, devices, electronic devices, and computer programs
US20220076082A1 (en) Cross-modal sensor data alignment
US11475628B2 (en) Monocular 3D vehicle modeling and auto-labeling using semantic keypoints
CN111091023B (en) Vehicle detection method and device and electronic equipment
WO2024001804A1 (en) Three-dimensional object detection method, computer device, storage medium, and vehicle
CN113888458A (en) Method and system for object detection
WO2022126522A1 (en) Object recognition method, apparatus, movable platform, and storage medium
CN112753038A (en) Method and device for identifying lane change trend of vehicle
US20230368407A1 (en) Drivable area detection method, computer device, storage medium, and vehicle
WO2021189420A1 (en) Data processing method and device
CN109598199B (en) Lane line generation method and device
CN116052100A (en) Image sensing method, computer device, computer-readable storage medium, and vehicle
CN112991388B (en) Line segment feature tracking method based on optical flow tracking prediction and convex geometric distance
CN111383337B (en) Method and device for identifying objects
CN114386481A (en) Vehicle perception information fusion method, device, equipment and storage medium
Zhang Target-based calibration of 3D LiDAR and binocular camera on unmanned vehicles
Zhang et al. 3D car-detection based on a Mobile Deep Sensor Fusion Model and real-scene applications
CN112712062A (en) Monocular three-dimensional object detection method and device based on decoupling truncated object
CN117475397B (en) Target annotation data acquisition method, medium and device based on multi-mode sensor
JP2018097588A (en) Three-dimensional space specifying device, method, and program
WO2024045942A1 (en) Ambient information sensing method, apparatus, and system, computer device, and storage medium
CN116343143A (en) Target detection method, storage medium, road side equipment and automatic driving system

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23829985

Country of ref document: EP

Kind code of ref document: A1