CN117710588A - Three-dimensional target detection method based on visual ranging prior information - Google Patents

Three-dimensional target detection method based on visual ranging prior information

Info

Publication number
CN117710588A
CN117710588A (application CN202410025204.4A)
Authority
CN
China
Prior art keywords
target
camera
dimensional
point cloud
cloud data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410025204.4A
Other languages
Chinese (zh)
Inventor
吴军
靳龙
黄硕
连劲松
郭润夏
杜海龙
陈玖圣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation University of China
Original Assignee
Civil Aviation University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation University of China
Priority to CN202410025204.4A
Publication of CN117710588A
Legal status: Pending

Landscapes

  • Length Measuring Devices By Optical Means (AREA)

Abstract

A three-dimensional target detection method based on visual ranging prior information. The method comprises the steps of: establishing a three-dimensional target detection system; calibrating the internal and external parameters of a camera; time-synchronizing the camera and the laser radar; constructing a target image sample dataset; establishing a target detection model; acquiring preprocessed target point cloud data; calculating the depth of the target to be detected; and completing the three-dimensional target detection. The method has the following beneficial effects: given prior information about the target, it determines the depth of the target within the view cone, greatly narrows the target detection range, improves detection precision, and is suitable for three-dimensional target detection in complex scenes with multiple obstacles and occluded targets.

Description

Three-dimensional target detection method based on visual ranging prior information
Technical Field
The invention belongs to the technical field of three-dimensional target detection, and particularly relates to a three-dimensional target detection method based on visual ranging prior information.
Background
Currently, three-dimensional target detection is a popular research direction in the fields of computer vision and automatic driving, and its main methods are based either on laser point clouds or on multiple sensors. Laser point cloud-based methods mostly adopt point cloud deep learning; deep learning offers strong generalization capability and precision, but large-scale, diversified point cloud datasets are difficult to acquire, and labeling point cloud data is complex and time-consuming. Among multi-sensor methods, the most commonly used sensors are the camera and the laser radar: camera images carry rich semantic information, while laser radar point clouds provide the distance information of objects, so combining the two sensors overcomes their respective shortcomings. Joint target detection with laser radar and camera is therefore gradually becoming a research hotspot.
Existing three-dimensional target detection methods based on camera-laser radar fusion first use a mature two-dimensional detection model to detect an object in the image and determine its bounding box, then generate the corresponding view cone in the point cloud, and finally perform further three-dimensional target detection on the point cloud data within the view cone.
However, owing to the lack of depth information in the image, further detection and localization tasks within the view cone are often limited. Determining the target depth within the view cone is therefore critical to ultimately realizing three-dimensional target detection, yet no relevant literature addressing this problem has been found.
Disclosure of Invention
In order to solve the problems, the invention aims to provide a three-dimensional target detection method based on visual ranging prior information.
In order to achieve the above object, the three-dimensional object detection method based on visual ranging prior information provided by the invention comprises the following steps in sequence:
step 1) establishing a three-dimensional target detection system consisting of a laser radar, a camera and a computer: the computer is respectively connected with the laser radar and the camera, and the laser radar and the camera are fixed by using a fixing device;
step 2) establishing a projection model of a camera in the three-dimensional target detection system according to the pinhole imaging model; calibrating the internal parameters of the camera by means of a checkerboard calibration plate according to a homography matrix mapping principle and a nonlinear optimization principle, and calibrating the external parameters of the camera according to the corresponding relation between the 3D point cloud and the 2D image characteristics;
step 3) adding time stamp information to the point cloud data and the image when the laser radar and the camera are used for collecting the point cloud data and the image, and aligning the time stamps of the point cloud data and the image by adopting a time synchronization algorithm, thereby completing time synchronization of the camera and the laser radar;
step 4) acquiring a sufficient number of target images at different depths and from different angles with the camera, ensuring that the images are comprehensive and not repetitive; screening the target images to retain high-quality ones, then manually labeling them, marking the positions of targets with rectangular labeling boxes to obtain labeled target images; all labeled target images form a target image sample dataset, which is then divided proportionally into a training set and a testing set;
step 5) constructing an original two-dimensional target detection model by using a deep learning YOLOv7 algorithm model, and then respectively training, evaluating and optimizing the original two-dimensional target detection model by using the training set and the testing set obtained in the step 4), thereby constructing a target detection model; then inputting the original target image acquired by the camera into the target detection model, and outputting a two-dimensional detection frame and a category of the target by the target detection model;
step 6) combining the camera internal and external parameters obtained in step 2), preprocessing the original point cloud data acquired at the same moment as the original target image captured by the camera in step 5), the preprocessing including depth screening, camera field-of-view screening and ground point cloud removal, to obtain preprocessed target point cloud data;
step 7) obtaining a camera focal length according to the camera internal parameters obtained in the step 2), determining the image width of the target to be detected by utilizing the target detection model obtained in the step 5), and calculating the depth of the target to be detected according to a camera imaging principle by combining the camera focal length and the image width of the target to be detected and the prior length of the target to be detected;
and 8) utilizing the camera internal parameters and external parameters obtained in the step 2), the two-dimensional detection frame of the target obtained in the step 5) and the preprocessed target point cloud data obtained in the step 6), and combining the depth of the target to be detected obtained in the step 7), so as to complete three-dimensional target detection.
In the step 1), the laser radar is point cloud acquisition equipment, and a solid-state laser radar or a mechanical laser radar is adopted; the camera is image acquisition equipment, and a CMOS camera or a CCD camera is adopted; the computer is a data processing device.
In step 2), a projection model of a camera in the three-dimensional target detection system is built according to the pinhole imaging model; the calibration of the camera internal parameters is completed according to the homography matrix mapping principle and the nonlinear optimization principle by means of the checkerboard calibration plate, and the method for completing the calibration of the camera external parameters according to the corresponding relation between the 3D point cloud and the 2D image features is as follows:
the camera internal parameter calibration is realized by a calibration tool box in Matlab or a calibration function in OpenCV;
the method comprises the steps of respectively acquiring images and point cloud data of a plurality of groups of checkerboard calibration plates under different angles and distances by using a camera and a laser radar which are subjected to internal reference calibration, taking a laser radar coordinate system as a world coordinate system, extracting pixel coordinates and world coordinates of characteristic points in the checkerboard calibration plates in a computer, and calculating camera external parameters according to the corresponding relation of the characteristic points in the checkerboard calibration plates under the pixel coordinate system and the world coordinate system, wherein the method comprises the following steps: extracting pixel coordinates of feature points in the checkerboard calibration plate image by using an OpenCV library, extracting world coordinates of the feature points in the checkerboard calibration plate point cloud data by using a PCL library, and then establishing the following equation according to the relation between the pixel coordinates of the feature points and the world coordinates:
s [u_i v_i 1]^T = M [R T] [X_i Y_i Z_i 1]^T
wherein s is the scale factor of the camera, [u_i v_i 1]^T and [X_i Y_i Z_i 1]^T are the homogeneous coordinates of the i-th feature point of the checkerboard calibration plate in the pixel coordinate system and the world coordinate system respectively, and M is the internal reference matrix of the camera; the external reference matrix [R T] of the camera is calculated from the above formula.
In step 3), the time synchronization algorithm employs a nearest neighbor matching method.
In step 4), the labeling adopts the labelImg annotation tool; the ratio of the training set to the testing set is 8:2; targets include aircraft wings, aircraft tails, aircraft engines, and automobile parts whose physical length can be measured.
In step 5), the original two-dimensional target detection model is built by using the deep learning YOLOv7 algorithm model, and then the training set and the testing set obtained in step 4) are used for respectively training, evaluating and optimizing the original two-dimensional target detection model, so that the method for building the target detection model is as follows:
training the original two-dimensional target detection model with the training set obtained in step 4), iteratively optimizing the loss function and continuously adjusting the model parameters to improve detection performance; then evaluating the trained two-dimensional target detection model with the testing set, calculating the detection precision and average precision to assess the accuracy and effect of the model; optimizing the two-dimensional target detection model according to the evaluation result; and finally selecting the two-dimensional target detection model with the best detection performance as the final target detection model.
In step 6), the method of combining the camera internal and external parameters obtained in step 2) and preprocessing the original point cloud data acquired at the same moment as the original target image captured by the camera in step 5), the preprocessing including depth screening, camera field-of-view screening and ground point cloud removal, to obtain preprocessed target point cloud data, is as follows:
setting a depth threshold for the region of interest of the target in the original point cloud data of the target to be detected, and removing original point cloud data outside the depth threshold range to obtain depth-screened point cloud data; then projecting the depth-screened point cloud data onto the image plane of the camera using the camera internal and external parameters obtained in step 2) to obtain the coordinates of the corresponding pixel points, judging whether the coordinates of each pixel point lie within the camera field of view, and retaining the points whose projections lie within the camera field of view to obtain the point cloud data within the camera field of view; and finally fitting the ground point cloud with the random sample consensus (RANSAC) method and removing it from the point cloud data within the camera field of view to obtain the preprocessed target point cloud data.
In step 7), the method of obtaining the camera focal length from the camera internal parameters obtained in step 2), determining the image width of the target to be detected using the target detection model obtained in step 5), and then calculating the depth of the target to be detected according to the camera imaging principle by combining the camera focal length, the image width of the target to be detected and the prior length of the target to be detected, is as follows:
the internal reference matrix M of the camera obtained in step 2) is:
wherein dx and dy are pixel sizes of the camera, which can be obtained from factory parameters of the camera, and f/dx is f x The camera focal length f can be found by:
f=f x ·dx
acquiring an image of the target to be detected with the camera and inputting it into the target detection model obtained in step 5) to obtain the two-dimensional detection frame of the target to be detected; the pixel length of the two-dimensional detection frame in the horizontal direction is denoted the pixel width w, and the image width W of the target to be detected is then given by:
W = w · dx
measuring the actual length of the target to be detected with a laser range finder and taking it as the prior length L of the target to be detected; according to the imaging principle of the camera, the depth d of the target to be detected is solved by the following formula:
d = f · L / W
according to the relations between the camera focal length f, the image width W and the pixel size dx (f = f_x · dx and W = w · dx), the above formula can be further simplified to:
d = f_x · L / w
in step 8), the method for completing three-dimensional target detection by using the camera internal parameters and external parameters obtained in step 2), the two-dimensional detection frame of the target obtained in step 5) and the preprocessed target point cloud data obtained in step 6) and combining the target depth to be detected obtained in step 7) is as follows:
converting the two-dimensional detection frame of the target obtained in step 5) and the preprocessed target point cloud data obtained in step 6) into the camera coordinate system using the camera internal and external parameters obtained in step 2), and selecting a three-dimensional point cloud view cone with the two-dimensional detection frame of the target; according to the depth d of the target to be detected obtained in step 7), setting two depth thresholds in combination with the prior three-dimensional size of the target, and shrinking the three-dimensional view cone with the two depth thresholds; finally, clustering and segmenting the point cloud data within the shrunken three-dimensional view cone with a Euclidean clustering algorithm, and determining the target point cloud according to the prior length L of the target to be detected, thereby completing the three-dimensional target detection.
The three-dimensional target detection method based on visual ranging prior information provided by the invention has the following beneficial effects: given prior information about the target, the method determines the depth of the target within the view cone, greatly narrows the target detection range, improves detection precision, and is suitable for three-dimensional target detection in complex scenes with multiple obstacles and occluded targets.
Drawings
Fig. 1 is a flow chart of a three-dimensional target detection method based on visual ranging prior information.
Detailed Description
The three-dimensional object detection method based on the visual ranging prior information provided by the invention is described in detail below with reference to the accompanying drawings and specific embodiments. The drawings are for reference and description only and do not limit the scope of the invention.
As shown in fig. 1, the three-dimensional target detection method based on visual ranging prior information provided by the invention comprises the following steps in sequence:
step 1) establishing a three-dimensional target detection system consisting of a laser radar, a camera and a computer: the computer is respectively connected with the laser radar and the camera, and the laser radar and the camera are fixed by using a fixing device;
the laser radar is point cloud acquisition equipment, and adopts a solid laser radar or a mechanical laser radar; the camera is image acquisition equipment, and a CMOS camera or a CCD camera is adopted; the computer is a data processing device;
step 2) establishing a projection model of a camera in the three-dimensional target detection system according to the pinhole imaging model; calibrating the internal parameters of the camera by means of a checkerboard calibration plate according to a homography matrix mapping principle and a nonlinear optimization principle, and calibrating the external parameters of the camera according to the corresponding relation between the 3D point cloud and the 2D image characteristics;
the camera internal parameter calibration is realized by a calibration tool box in Matlab or a calibration function in OpenCV;
the method comprises the steps of respectively acquiring images and point cloud data of a plurality of groups of checkerboard calibration plates under different angles and distances by using a camera and a laser radar which are subjected to internal reference calibration, taking a laser radar coordinate system as a world coordinate system, extracting pixel coordinates and world coordinates of characteristic points in the checkerboard calibration plates in a computer, and calculating camera external parameters according to the corresponding relation of the characteristic points in the checkerboard calibration plates under the pixel coordinate system and the world coordinate system, wherein the method comprises the following steps: extracting pixel coordinates of feature points in the checkerboard calibration plate image by using an OpenCV library, extracting world coordinates of the feature points in the checkerboard calibration plate point cloud data by using a PCL library, and then establishing the following equation according to the relation between the pixel coordinates of the feature points and the world coordinates:
s [u_i v_i 1]^T = M [R T] [X_i Y_i Z_i 1]^T
wherein s is the scale factor of the camera, [u_i v_i 1]^T and [X_i Y_i Z_i 1]^T are the homogeneous coordinates of the i-th feature point of the checkerboard calibration plate in the pixel coordinate system and the world coordinate system respectively, and M is the internal reference matrix of the camera; the external reference matrix [R T] of the camera is calculated from the above formula.
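Under the assumption of at least six non-coplanar correspondences, the extrinsic matrix [R T] in the above equation can be recovered with a direct linear transform. The numpy sketch below is illustrative, not the patent's implementation (function names are assumptions; OpenCV's solvePnP offers an equivalent, more robust routine):

```python
import numpy as np

def calibrate_extrinsics(M, pixels, world):
    """Recover the extrinsics [R T] from the equation
    s [u v 1]^T = M [R T] [X Y Z 1]^T.
    M: 3x3 intrinsic matrix; pixels: Nx2; world: Nx3 (N >= 6, non-coplanar)."""
    A = []
    for (u, v), (X, Y, Z) in zip(pixels, world):
        A.append([X, Y, Z, 1, 0, 0, 0, 0, -u * X, -u * Y, -u * Z, -u])
        A.append([0, 0, 0, 0, X, Y, Z, 1, -v * X, -v * Y, -v * Z, -v])
    # Homogeneous least squares: the projection P = M [R T] (up to scale)
    # is the right singular vector of A with the smallest singular value.
    _, _, Vt = np.linalg.svd(np.asarray(A, float))
    P = Vt[-1].reshape(3, 4)
    B = np.linalg.inv(M) @ P                            # proportional to [R T]
    B /= np.mean(np.linalg.norm(B[:, :3], axis=1))      # rows of R have unit norm
    if np.linalg.det(B[:, :3]) < 0:                     # fix the sign ambiguity
        B = -B
    # Project the 3x3 part onto the closest proper rotation matrix.
    U, _, Wt = np.linalg.svd(B[:, :3])
    return U @ Wt, B[:, 3]
```

On noiseless synthetic correspondences this recovers R and T exactly; with real checkerboard data a nonlinear refinement (as the patent's calibration step implies) would follow.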
Step 3) adding time stamp information to the point cloud data and the image when the laser radar and the camera are used for collecting the point cloud data and the image, and aligning the time stamps of the point cloud data and the image by adopting a time synchronization algorithm, thereby completing time synchronization of the camera and the laser radar;
the time synchronization algorithm adopts a nearest neighbor matching method.
Step 4) acquiring a sufficient number of target images at different depths and from different angles with the camera, ensuring that the images are comprehensive and not repetitive; screening the target images to retain high-quality ones, then manually labeling them, marking the positions of targets with rectangular labeling boxes to obtain labeled target images; all labeled target images form a target image sample dataset, which is then divided proportionally into a training set and a testing set;
the labeling adopts a labellmg. Exe labeling method; the ratio of the training set to the testing set is 8:2; objects include aircraft wings, aircraft tails, aircraft engines, and objects of measurable physical length in automobiles.
Step 5) constructing an original two-dimensional target detection model by using a deep learning YOLOv7 algorithm model, and then respectively training, evaluating and optimizing the original two-dimensional target detection model by using the training set and the testing set obtained in the step 4), thereby constructing a target detection model; then inputting the original target image acquired by the camera into the target detection model, and outputting a two-dimensional detection frame and a category of the target by the target detection model;
training the original two-dimensional target detection model with the training set obtained in step 4), iteratively optimizing the loss function and continuously adjusting the model parameters to improve detection performance; then evaluating the trained two-dimensional target detection model with the testing set, calculating the detection precision and average precision to assess the accuracy and effect of the model; optimizing the two-dimensional target detection model according to the evaluation result; and finally selecting the two-dimensional target detection model with the best detection performance as the final target detection model.
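Computing the detection precision mentioned above requires deciding whether a predicted box matches a ground-truth box; the usual criterion (an assumption here, not stated in the patent) is intersection-over-union above a threshold such as 0.5. A minimal sketch:

```python
def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1]) +
             (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0
```

A detection is then counted as a true positive when, for example, iou(pred, gt) >= 0.5, and precision and average precision follow from those counts.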
Step 6) combining the camera internal and external parameters obtained in step 2), preprocessing the original point cloud data acquired at the same moment as the original target image captured by the camera in step 5), the preprocessing including depth screening, camera field of view (FOV) screening and ground point cloud removal, to obtain preprocessed target point cloud data;
setting a depth threshold for the region of interest of the target in the original point cloud data of the target to be detected, and removing original point cloud data outside the depth threshold range to obtain depth-screened point cloud data; then projecting the depth-screened point cloud data onto the image plane of the camera using the camera internal and external parameters obtained in step 2) to obtain the coordinates of the corresponding pixel points, judging whether the coordinates of each pixel point lie within the camera field of view, and retaining the points whose projections lie within the camera field of view to obtain the point cloud data within the camera field of view; and finally fitting the ground point cloud with the random sample consensus (RANSAC) method and removing it from the point cloud data within the camera field of view to obtain the preprocessed target point cloud data.
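The depth screening and field-of-view screening steps above can be sketched in numpy as follows. Function and parameter names are illustrative assumptions; the RANSAC ground-removal step is omitted, since library implementations (e.g. PCL's SACSegmentation or Open3D's segment_plane) are typically used for it:

```python
import numpy as np

def preprocess_cloud(points, M, R, T, img_w, img_h, d_min, d_max):
    """Depth-filter a lidar cloud, project it with intrinsics M and
    extrinsics (R, T), and keep only points that land inside the image.
    points: Nx3 in the lidar/world frame."""
    cam = points @ R.T + T                     # lidar/world -> camera frame
    keep = (cam[:, 2] > d_min) & (cam[:, 2] < d_max)   # depth screening
    cam = cam[keep]
    uv = cam @ M.T                             # pinhole projection
    uv = uv[:, :2] / uv[:, 2:3]
    inside = ((uv[:, 0] >= 0) & (uv[:, 0] < img_w) &   # FOV screening
              (uv[:, 1] >= 0) & (uv[:, 1] < img_h))
    return points[keep][inside]
```

Points behind the camera or beyond the region-of-interest depth window are dropped first, so only valid positive-depth points are projected.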
Step 7) obtaining a camera focal length according to the camera internal parameters obtained in the step 2), determining the image width of the target to be detected by utilizing the target detection model obtained in the step 5), and calculating the depth of the target to be detected according to a camera imaging principle by combining the camera focal length and the image width of the target to be detected and the prior length of the target to be detected;
the internal reference matrix M of the camera obtained in step 2) is:
wherein dx and dy are pixel sizes of the camera, which can be obtained from factory parameters of the camera, and f/dx is f x The camera focal length f can be found by:
f=f x ·dx
acquiring an image of the target to be detected with the camera and inputting it into the target detection model obtained in step 5) to obtain the two-dimensional detection frame of the target to be detected; the pixel length of the two-dimensional detection frame in the horizontal direction is denoted the pixel width w, and the image width W of the target to be detected is then given by:
W = w · dx
measuring the actual length of the target to be detected with a laser range finder and taking it as the prior length L of the target to be detected; according to the imaging principle of the camera, the depth d of the target to be detected is solved by the following formula:
d = f · L / W
according to the relations between the camera focal length f, the image width W and the pixel size dx (f = f_x · dx and W = w · dx), the above formula can be further simplified to:
d = f_x · L / w
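The two ranging formulas can be checked numerically. The sketch below (function names and sample numbers are illustrative) shows that the physical-units form d = f·L/W and the simplified pixel form d = f_x·L/w agree:

```python
def target_depth(fx, L, w):
    """Depth from the simplified formula d = f_x * L / w,
    with f_x in pixels, prior length L in meters, pixel width w in pixels."""
    return fx * L / w

def target_depth_physical(f, L, W):
    """Equivalent form d = f * L / W using the physical focal length f
    and the physical image width W = w * dx (both in meters)."""
    return f * L / W
```

For example, with f_x = 800 px, a prior length L = 2 m and a detected pixel width w = 100 px, both forms give a depth of 16 m (taking dx = 5 µm, so f = 4 mm and W = 0.5 mm).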
step 8) converting the two-dimensional detection frame of the target obtained in step 5) and the preprocessed target point cloud data obtained in step 6) into the camera coordinate system using the camera internal and external parameters obtained in step 2), and selecting a three-dimensional point cloud view cone with the two-dimensional detection frame of the target; according to the depth d of the target to be detected obtained in step 7), setting two depth thresholds in combination with the prior three-dimensional size of the target, and shrinking the three-dimensional view cone with the two depth thresholds; finally, clustering and segmenting the point cloud data within the shrunken three-dimensional view cone with a Euclidean clustering algorithm, and determining the target point cloud according to the prior length L of the target to be detected, thereby completing the three-dimensional target detection.
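The final clustering step can be sketched as follows. The patent calls for a Euclidean clustering algorithm (PCL's EuclideanClusterExtraction is the usual implementation); the naive O(n²) version below and the rule for selecting the cluster by prior length L are illustrative sketches under assumed parameter names, not the patent's implementation:

```python
import numpy as np

def euclidean_cluster(points, radius=0.5):
    """Naive Euclidean clustering: points closer than `radius`
    (transitively) share a cluster label. Returns an int label per point."""
    n = len(points)
    labels = -np.ones(n, dtype=int)
    cur = 0
    for seed in range(n):
        if labels[seed] != -1:
            continue
        labels[seed] = cur
        stack = [seed]
        while stack:
            i = stack.pop()
            near = np.where((labels == -1) &
                            (np.linalg.norm(points - points[i], axis=1) < radius))[0]
            labels[near] = cur
            stack.extend(near.tolist())
        cur += 1
    return labels

def pick_target(points, labels, prior_length, tol=0.3):
    """Keep the cluster whose spatial extent best matches the prior length L."""
    best, best_err = None, tol
    for c in range(labels.max() + 1):
        cl = points[labels == c]
        extent = np.linalg.norm(cl.max(axis=0) - cl.min(axis=0))
        if abs(extent - prior_length) < best_err:
            best, best_err = cl, abs(extent - prior_length)
    return best
```

In the method's pipeline this would run only on the frustum points already shrunk by the two depth thresholds around d, which keeps the clustering cheap even with the quadratic sketch above.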

Claims (9)

1. A three-dimensional target detection method based on visual ranging prior information, characterized by comprising the following steps performed in sequence:
step 1), a three-dimensional target detection system consisting of a laser radar, a camera and a computer is established; the computer is respectively connected with the laser radar and the camera, and the laser radar and the camera are fixed by using a fixing device;
step 2) establishing a projection model of a camera in the three-dimensional target detection system according to the pinhole imaging model; calibrating the internal parameters of the camera by means of a checkerboard calibration plate according to a homography matrix mapping principle and a nonlinear optimization principle, and calibrating the external parameters of the camera according to the corresponding relation between the 3D point cloud and the 2D image characteristics;
step 3) adding time stamp information to the point cloud data and the image when the laser radar and the camera are used for collecting the point cloud data and the image, and aligning the time stamps of the point cloud data and the image by adopting a time synchronization algorithm, thereby completing time synchronization of the camera and the laser radar;
step 4) acquiring a sufficient number of target images at different depths and from different angles with the camera, ensuring that the images are comprehensive and not repetitive; screening the target images to retain high-quality ones, then manually labeling them, marking the positions of targets with rectangular labeling boxes to obtain labeled target images; all labeled target images form a target image sample dataset, which is then divided proportionally into a training set and a testing set;
step 5) constructing an original two-dimensional target detection model by using a deep learning YOLOv7 algorithm model, and then respectively training, evaluating and optimizing the original two-dimensional target detection model by using the training set and the testing set obtained in the step 4), thereby constructing a target detection model; then inputting the original target image acquired by the camera into the target detection model, and outputting a two-dimensional detection frame and a category of the target by the target detection model;
step 6) combining the camera internal and external parameters obtained in step 2), preprocessing the original point cloud data acquired at the same moment as the original target image captured by the camera in step 5), the preprocessing including depth screening, camera field-of-view screening and ground point cloud removal, to obtain preprocessed target point cloud data;
step 7) obtaining a camera focal length according to the camera internal parameters obtained in the step 2), determining the image width of the target to be detected by utilizing the target detection model obtained in the step 5), and calculating the depth of the target to be detected according to a camera imaging principle by combining the camera focal length and the image width of the target to be detected and the prior length of the target to be detected;
and 8) utilizing the camera internal parameters and external parameters obtained in the step 2), the two-dimensional detection frame of the target obtained in the step 5) and the preprocessed target point cloud data obtained in the step 6), and combining the depth of the target to be detected obtained in the step 7), so as to complete three-dimensional target detection.
2. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in the step 1), the laser radar is point cloud acquisition equipment, and a solid-state laser radar or a mechanical laser radar is adopted; the camera is image acquisition equipment, and a CMOS camera or a CCD camera is adopted; the computer is a data processing device.
3. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 2), a projection model of a camera in the three-dimensional target detection system is built according to the pinhole imaging model; the calibration of the camera internal parameters is completed according to the homography matrix mapping principle and the nonlinear optimization principle by means of the checkerboard calibration plate, and the method for completing the calibration of the camera external parameters according to the corresponding relation between the 3D point cloud and the 2D image features is as follows:
the camera internal parameter calibration is realized by a calibration tool box in Matlab or a calibration function in OpenCV;
the method comprises the steps of respectively acquiring images and point cloud data of a plurality of groups of checkerboard calibration plates under different angles and distances by using a camera and a laser radar which are subjected to internal reference calibration, taking a laser radar coordinate system as a world coordinate system, extracting pixel coordinates and world coordinates of characteristic points in the checkerboard calibration plates in a computer, and calculating camera external parameters according to the corresponding relation of the characteristic points in the checkerboard calibration plates under the pixel coordinate system and the world coordinate system, wherein the method comprises the following steps: extracting pixel coordinates of feature points in the checkerboard calibration plate image by using an OpenCV library, extracting world coordinates of the feature points in the checkerboard calibration plate point cloud data by using a PCL library, and then establishing the following equation according to the relation between the pixel coordinates of the feature points and the world coordinates:
s[u_i v_i 1]^T = M[R T][X_i Y_i Z_i 1]^T

wherein s is the scale factor of the camera, [u_i v_i 1]^T and [X_i Y_i Z_i 1]^T are the homogeneous coordinates of the checkerboard calibration plate feature points in the pixel coordinate system and the world coordinate system respectively, and M is the internal reference matrix of the camera; the external reference matrix [R T] of the camera is calculated from the above formula.
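The extrinsic solve in the claim can be illustrated with a Direct Linear Transform over the 3D-2D feature-point pairs, given a known intrinsic matrix M. This is a minimal NumPy sketch, not the claimed implementation; the function name and synthetic data are assumptions, and it presumes the pooled feature points (gathered from board poses at several angles and distances, as the claim describes) are not all coplanar:

```python
import numpy as np

def estimate_extrinsics(M, pts_world, pts_pixel):
    """Solve s*[u, v, 1]^T = M [R | T] [X, Y, Z, 1]^T for [R | T] by a
    Direct Linear Transform, given the intrinsic matrix M and Nx3 / Nx2
    arrays of world points and their pixel projections (N >= 6)."""
    # Normalize pixels: x = M^-1 [u, v, 1]^T removes the intrinsics
    uv1 = np.hstack([pts_pixel, np.ones((len(pts_pixel), 1))])
    xy1 = (np.linalg.inv(M) @ uv1.T).T
    A = []
    for (X, Y, Z), (x, y, _) in zip(pts_world, xy1):
        h = [X, Y, Z, 1.0]
        A.append(h + [0.0, 0.0, 0.0, 0.0] + [-x * v for v in h])
        A.append([0.0, 0.0, 0.0, 0.0] + h + [-y * v for v in h])
    # The null vector of A holds the 12 entries of [R | T] up to scale
    _, _, Vt = np.linalg.svd(np.asarray(A))
    P = Vt[-1].reshape(3, 4)
    # Fix the scale (rows of R are unit vectors) and the sign (depth > 0)
    lam = np.linalg.norm(P[2, :3])
    if P[2] @ np.append(pts_world[0], 1.0) < 0:
        lam = -lam
    P /= lam
    # Snap the 3x3 block to the nearest rotation matrix
    U, _, Vt3 = np.linalg.svd(P[:, :3])
    return U @ Vt3, P[:, 3]
```

With exact correspondences the recovered [R T] matches the ground truth to numerical precision; with noisy lidar-image pairs it would serve as the initial value for a nonlinear refinement.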
4. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 3), the time synchronization algorithm employs a nearest neighbor matching method.
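The nearest neighbor matching named in claim 4 can be sketched as pairing each camera frame with the lidar sweep of closest timestamp, dropping pairs whose offset exceeds a tolerance. The function name and the tolerance parameter `max_dt` are illustrative assumptions:

```python
import bisect

def nearest_neighbor_sync(cam_stamps, lidar_stamps, max_dt=0.05):
    """Pair each camera timestamp with the closest lidar timestamp
    (both lists in seconds, lidar_stamps sorted ascending); discard
    pairs farther apart than max_dt."""
    pairs = []
    for t in cam_stamps:
        i = bisect.bisect_left(lidar_stamps, t)
        # The closest stamp is either just before or just after t
        candidates = [j for j in (i - 1, i) if 0 <= j < len(lidar_stamps)]
        j = min(candidates, key=lambda j: abs(lidar_stamps[j] - t))
        if abs(lidar_stamps[j] - t) <= max_dt:
            pairs.append((t, lidar_stamps[j]))
    return pairs
```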
5. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 4), the labeling adopts the labelImg labeling tool; the ratio of the training set to the test set is 8:2; the targets include aircraft wings, aircraft tails, aircraft engines, and objects of measurable physical length in automobiles.
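The 8:2 split of labeled samples described above can be sketched as a seeded shuffle-and-cut; the function name and seed are illustrative assumptions:

```python
import random

def split_dataset(samples, train_ratio=0.8, seed=0):
    """Shuffle the labeled image samples reproducibly and split them
    into a training set and a test set at the given ratio (8:2 here)."""
    items = list(samples)
    random.Random(seed).shuffle(items)
    cut = int(len(items) * train_ratio)
    return items[:cut], items[cut:]
```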
6. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 5), the original two-dimensional target detection model is built using the deep-learning YOLOv7 algorithm model; the training set and test set obtained in step 4) are then used to train, evaluate and optimize the original two-dimensional target detection model, and the target detection model is established as follows:
the original two-dimensional target detection model is trained with the training set obtained in step 4), and the model parameters are continuously adjusted through iterative optimization of the loss function to improve detection performance; the trained two-dimensional target detection model is then evaluated with the test set, and the detection precision and average precision are calculated to assess the accuracy and effect of the model; the two-dimensional target detection model is optimized according to the evaluation result, and the two-dimensional target detection model with the best detection performance is finally selected as the final target detection model.
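The "average precision" evaluation named in the claim can be illustrated with the standard precision-recall integration used for detector benchmarks; this is a generic sketch, not the claimed implementation, and the inputs (per-detection confidence scores, true/false-positive flags, ground-truth count) are assumptions:

```python
import numpy as np

def average_precision(scores, is_tp, n_gt):
    """AP of one class: sort detections by confidence, accumulate
    precision/recall, then integrate the monotone precision envelope."""
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_tp, dtype=float)[order]
    fp = 1.0 - tp
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / n_gt
    precision = tp_cum / (tp_cum + fp_cum)
    # Pad, enforce a non-increasing precision envelope, integrate over recall
    mrec = np.concatenate([[0.0], recall, [1.0]])
    mpre = np.concatenate([[0.0], precision, [0.0]])
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    idx = np.where(mrec[1:] != mrec[:-1])[0]
    return float(np.sum((mrec[idx + 1] - mrec[idx]) * mpre[idx + 1]))
```

Averaging this value over all target classes gives the mean average precision commonly reported for YOLO-family detectors.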
7. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 6), the original point cloud data corresponding to the original target image acquired by the camera in step 5) are preprocessed, including depth screening, camera field-of-view screening and ground point cloud rejection, in combination with the camera internal parameters and external parameters obtained in step 2), to obtain the preprocessed target point cloud data, as follows:
a depth threshold of the region of interest of the target is set for the original point cloud data of the target to be detected, and the original point cloud data outside the depth threshold range are removed to obtain the depth-screened point cloud data; the depth-screened point cloud data are then projected onto the image plane of the camera using the camera internal parameters and external parameters obtained in step 2) to obtain the coordinates of the corresponding pixel points, whether each pixel coordinate lies within the camera field of view is judged, and the points whose projections lie within the camera field of view are retained to obtain the point cloud data within the camera field of view; finally, the ground point cloud is fitted by the random sample consensus method and removed from the point cloud data within the camera field of view to obtain the preprocessed target point cloud data.
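The three preprocessing stages can be sketched in NumPy under some assumptions not stated in the claim: the cloud is an Nx3 array, depth is screened on the camera-frame z coordinate, and the RANSAC plane with the most inliers is taken as the ground:

```python
import numpy as np

def preprocess_cloud(cloud, M, R, T, img_size, depth_range, rng=None):
    """Depth screening, camera field-of-view screening, then RANSAC
    ground removal, over an Nx3 lidar cloud in the world frame."""
    rng = rng or np.random.default_rng(0)
    # 1) transform to the camera frame and screen by depth (camera z)
    cam = (R @ cloud.T).T + T
    keep = (cam[:, 2] > depth_range[0]) & (cam[:, 2] < depth_range[1])
    cloud, cam = cloud[keep], cam[keep]
    # 2) project with the intrinsics; keep points landing inside the image
    uv = (M @ cam.T).T
    uv = uv[:, :2] / uv[:, 2:3]
    w, h = img_size
    infov = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    cloud = cloud[infov]
    # 3) RANSAC: fit the dominant plane and discard its inliers as ground
    best = np.zeros(len(cloud), dtype=bool)
    for _ in range(200):
        p0, p1, p2 = cloud[rng.choice(len(cloud), 3, replace=False)]
        n = np.cross(p1 - p0, p2 - p0)
        if np.linalg.norm(n) < 1e-9:
            continue  # degenerate (collinear) sample
        n /= np.linalg.norm(n)
        inliers = np.abs((cloud - p0) @ n) < 0.05
        if inliers.sum() > best.sum():
            best = inliers
    return cloud[~best]
```

A production pipeline would use the PCL routines the claim names (pass-through filter, `SACSegmentation`); this sketch only shows the geometry of each stage.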
8. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 7), the focal length of the camera is calculated according to the camera obtained in step 2), then the image width of the target to be detected is determined by using the target detection model obtained in step 5), and then the depth of the target to be detected is calculated according to the camera imaging principle by combining the focal length of the camera and the image width of the target to be detected with the prior length of the target to be detected, wherein the method comprises the following steps:
the internal reference matrix M of the camera obtained in step 2) is:

M = [ f/dx  0  u_0 ]
    [ 0  f/dy  v_0 ]
    [ 0  0  1 ]

wherein dx and dy are the pixel sizes of the camera, which can be obtained from the factory parameters of the camera; denoting f/dx as f_x, the camera focal length f can be found by:

f = f_x · dx
an image of the target to be detected is acquired with the camera and input into the target detection model obtained in step 5) to obtain the two-dimensional detection frame of the target to be detected; the pixel length of the two-dimensional detection frame in the horizontal direction is denoted as the pixel width w, and the image width W of the target to be detected is then expressed as:

W = w · dx
the actual length of the target to be detected is measured with a laser range finder and taken as the prior length L of the target to be detected; according to the camera imaging principle, the depth d of the target to be detected is found by:

d = f · L / W

according to the relation between the camera focal length f, the image width W and the pixel size dx, the above formula can be further simplified to:

d = f_x · L / w
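The ranging relation of this claim reduces to a one-line similar-triangles computation; this sketch makes the simplification explicit (the pixel size dx cancels), with illustrative parameter names:

```python
def target_depth(f_x, dx, w, L):
    """Depth from the prior target length by similar triangles:
    d = f * L / W, with f = f_x * dx and W = w * dx, so dx cancels
    and the result equals f_x * L / w."""
    f = f_x * dx   # physical focal length
    W = w * dx     # physical width of the target's image on the sensor
    return f * L / W
```

For example, with f_x = 1000 pixels, a detection box w = 200 pixels wide, and a prior length L = 2 m, the target depth is 10 m regardless of the sensor's pixel size.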
9. The three-dimensional object detection method based on visual ranging prior information according to claim 1, wherein: in step 8), three-dimensional target detection is completed using the camera internal parameters and external parameters obtained in step 2), the two-dimensional detection frame of the target obtained in step 5) and the preprocessed target point cloud data obtained in step 6), in combination with the depth of the target to be detected obtained in step 7), as follows:
the two-dimensional detection frame of the target obtained in step 5) and the preprocessed target point cloud data obtained in step 6) are converted into the camera coordinate system using the camera internal parameters and external parameters obtained in step 2), and a three-dimensional point cloud view cone is selected by the two-dimensional detection frame of the target; two depth thresholds are set according to the depth d of the target to be detected obtained in step 7) in combination with the prior three-dimensional size of the target, and the three-dimensional view cone is shrunk by the two depth thresholds; finally, the point cloud data within the shrunk three-dimensional view cone are clustered and segmented by the Euclidean clustering algorithm, and the target point cloud is determined according to the prior length L of the target to be detected, thereby completing the three-dimensional target detection.
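The final step — shrinking the view cone by two depth thresholds, Euclidean clustering, and selecting the cluster that matches the prior length L — can be sketched as follows. Assumptions not fixed by the claim: camera-frame Nx3 points, a simple region-growing clustering with a fixed neighbor radius, and "matches L" interpreted as the cluster whose largest axis-aligned extent is closest to L:

```python
import numpy as np
from collections import deque

def frustum_cluster(points, d, t_near, t_far, prior_L, radius=0.3):
    """Keep the depth band [d - t_near, d + t_far] of the view-cone
    points, Euclidean-cluster the remainder, and return the cluster
    whose largest extent best matches the prior length L."""
    band = points[(points[:, 2] > d - t_near) & (points[:, 2] < d + t_far)]
    labels = -np.ones(len(band), dtype=int)
    cid = 0
    for seed in range(len(band)):       # region-growing Euclidean clustering
        if labels[seed] != -1:
            continue
        labels[seed] = cid
        queue = deque([seed])
        while queue:
            i = queue.popleft()
            near = np.linalg.norm(band - band[i], axis=1) < radius
            for j in np.where(near & (labels == -1))[0]:
                labels[j] = cid
                queue.append(j)
        cid += 1
    best, best_err = 0, float("inf")    # pick the cluster sized like L
    for c in range(cid):
        pts = band[labels == c]
        err = abs((pts.max(0) - pts.min(0)).max() - prior_L)
        if err < best_err:
            best, best_err = c, err
    return band[labels == best]
```

PCL's `EuclideanClusterExtraction` with a k-d tree would replace the brute-force neighbor search in practice; the selection-by-prior-length logic is the part specific to this claim.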
CN202410025204.4A 2024-01-08 2024-01-08 Three-dimensional target detection method based on visual ranging priori information Pending CN117710588A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410025204.4A CN117710588A (en) 2024-01-08 2024-01-08 Three-dimensional target detection method based on visual ranging priori information


Publications (1)

Publication Number Publication Date
CN117710588A true CN117710588A (en) 2024-03-15

Family

ID=90151725

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410025204.4A Pending CN117710588A (en) 2024-01-08 2024-01-08 Three-dimensional target detection method based on visual ranging priori information

Country Status (1)

Country Link
CN (1) CN117710588A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118011421A (en) * 2024-04-10 2024-05-10 中国科学院西安光学精密机械研究所 Theodolite image automatic focusing method and system based on laser radar depth estimation


Similar Documents

Publication Publication Date Title
CN106651752B (en) Three-dimensional point cloud data registration method and splicing method
CN112818988B (en) Automatic identification reading method and system for pointer instrument
CN111340797A (en) Laser radar and binocular camera data fusion detection method and system
CN112669393A (en) Laser radar and camera combined calibration method
CN104574393B (en) A kind of three-dimensional pavement crack pattern picture generates system and method
CN106978774B (en) A kind of road surface pit slot automatic testing method
CN107560592B (en) Precise distance measurement method for photoelectric tracker linkage target
CN107392929B (en) Intelligent target detection and size measurement method based on human eye vision model
CN110334678A (en) A kind of pedestrian detection method of view-based access control model fusion
CN112067233B (en) Six-degree-of-freedom motion capture method for wind tunnel model
CN107358628B (en) Linear array image processing method based on target
CN117710588A (en) Three-dimensional target detection method based on visual ranging priori information
CN106996748A (en) A kind of wheel footpath measuring method based on binocular vision
CN102073863A (en) Method for acquiring characteristic size of remote video monitored target on basis of depth fingerprint
CN111640152A (en) Fish growth monitoring method and system
CN111996883B (en) Method for detecting width of road surface
CN110766669A (en) Pipeline measuring method based on multi-view vision
CN113205604A (en) Feasible region detection method based on camera and laser radar
CN114758222B (en) Concrete pipeline damage identification and volume quantification method based on PointNet ++ neural network
CN111179335A (en) Standing tree measuring method based on binocular vision
CN110702343B (en) Deflection measurement system and method based on stereoscopic vision
CN112365545A (en) Calibration method of laser radar and visible light camera based on large-plane composite target
CN106709432B (en) Human head detection counting method based on binocular stereo vision
CN115909025A (en) Terrain vision autonomous detection and identification method for small celestial body surface sampling point
CN114332243A (en) Rocket booster separation attitude measurement method based on perspective projection model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination