CN114495064A - Monocular depth estimation-based vehicle surrounding obstacle early warning method - Google Patents

Monocular depth estimation-based vehicle surrounding obstacle early warning method Download PDF

Info

Publication number
CN114495064A
Authority
CN
China
Prior art keywords
depth
training
obstacle
pixel
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210104631.2A
Other languages
Chinese (zh)
Inventor
辛海同
蔡登
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU filed Critical Zhejiang University ZJU
Priority to CN202210104631.2A priority Critical patent/CN114495064A/en
Publication of CN114495064A publication Critical patent/CN114495064A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/2431Multiple classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/246Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/20Analysis of motion
    • G06T7/277Analysis of motion involving stochastic approaches, e.g. using Kalman filters
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/80Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T7/85Stereo camera calibration
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30248Vehicle exterior or interior
    • G06T2207/30252Vehicle exterior; Vicinity of vehicle
    • G06T2207/30261Obstacle

Abstract

The invention discloses a vehicle surrounding obstacle early warning method based on monocular depth estimation, which comprises the following steps: (1) acquiring image data, and generating the pixel labels, depth map labels and 3D target labels required for training to form a training data set; (2) establishing a 3D target detection model based on monocular depth estimation; (3) training the 3D target detection model with the training data set; (4) during obstacle early warning, detecting obstacles in consecutive frames with the trained and optimized 3D target detection model; (5) constructing a tracking model and tracking corresponding obstacles across consecutive frames using the Hungarian maximum matching algorithm; (6) establishing a Kalman filtering model of obstacle spatial position and velocity, obtaining the spatial position information of the tracked obstacles through the filtering algorithm, and judging whether a collision danger exists. The method improves the precision of vehicle obstacle early warning while saving cost.

Description

Monocular depth estimation-based vehicle surrounding obstacle early warning method
Technical Field
The invention belongs to the field of 3D target detection in computer vision, and particularly relates to a vehicle surrounding obstacle early warning method based on monocular depth estimation.
Background
Vehicle intelligence has made driver assistance an indispensable feature of mid- and high-end vehicle models. To ensure driving safety, detecting obstacles around the vehicle and warning the driver in time has become one of the core functions of driver assistance. An obstacle early warning system judges whether a safety accident is imminent by calculating the relative distance and speed between an obstacle and the vehicle, and reminds the driver in time to avoid danger.
To obtain the state information of obstacles ahead of the vehicle, Kunsoo Huh et al., in the 1999 American Control Conference paper "An Experimental Investigation of a CW/CA System for Automobiles Using Hardware-in-the-Loop Simulation", derived a discrete millimeter-wave radar measurement equation by sampling discretization and used a second-order Kalman filter to optimally estimate the system state, including the target vehicle distance and relative speed; however, the detection precision of this method is not high. To improve the detection accuracy of millimeter-wave radar, Anselm Haselhoff et al., in the 2007 IEEE Intelligent Transportation Systems Conference paper "Radar-Vision Fusion with an Application to Car-Following Using an Improved AdaBoost Detection Algorithm", proposed fusing millimeter-wave radar and vision to detect obstacles. The method first uses the millimeter-wave radar to produce 3D-space candidates for obstacles, then derives regions of interest on the image from this information, and finally verifies the radar detection result with an AdaBoost classifier. However, the method depends heavily on the radar detection result: if a threat target is missing from the radar candidate regions, it cannot be recovered in subsequent operations. To address this problem, Tao Wang et al., in the 2011 Sensors article "Integrating Millimeter Wave Radar with a Monocular Vision Sensor for On-Road Obstacle Detection Applications", proposed a three-level fusion strategy for monocular images and millimeter-wave radar: first calibrate the coordinate systems of the radar and the camera, then lock onto the radar detection area for obstacle detection, and then verify the detected target with the corresponding image region. In addition, Xin Liu et al., in the 2011 IEEE International Conference on Vehicular Electronics and Safety article "On-road vehicle detection fusing radar and vision", proposed a cross-validation method to detect obstacles. The method detects the image with a dedicated shadow segmentation method, then verifies the result by matching against the millimeter-wave radar detections of the same frame, and re-verifies unmatched radar objects with visual data.
Beyond monocular camera fusion, Shunguang Wu et al., in "Collision Sensing by Stereo Vision and Radar Sensor Fusion", published in IEEE Transactions on Intelligent Transportation Systems in 2009, proposed fusing a depth (stereo) camera with millimeter-wave radar to detect obstacles. The method first fits the closest point of a threatening obstacle's contour in the depth view, then fuses it with the millimeter-wave radar detection, and finally tracks the fused contour closest point under a rigid body constraint to obtain the spatial position and motion state of the threatening obstacle.
Compared with millimeter-wave radar, lidar offers longer detection range and higher precision. To acquire the state of surrounding obstacles in real time, Alex H. Lang et al., in "PointPillars: Fast Encoders for Object Detection From Point Clouds", published at the international top conference IEEE Conference on Computer Vision and Pattern Recognition in 2019, proposed encoding point cloud features into a bird's-eye-view pseudo image so that 3D object detection can be performed with convolutions. To improve precision while keeping real-time performance unchanged, Zetong Yang et al., in "3DSSD: Point-based 3D Single Stage Object Detector", removed the FP (feature propagation) module of the general point cloud feature learning pipeline by fusing F-FPS (feature-based farthest point sampling) with D-FPS (distance-based farthest point sampling) for downsampling. By reducing the computation of feature extraction while maintaining accuracy, it achieves quite good results on the 3D target detection task. However, using lidar for the obstacle warning function in driver assistance is expensive and impractical in real deployments.
In summary, considering cost, lidar-based sensor fusion solutions are not well suited to mass-produced vehicles with driver assistance functions, while, owing to its sensor limitations, millimeter-wave radar alone cannot achieve a good obstacle detection effect.
Disclosure of Invention
The invention provides a vehicle surrounding obstacle early warning method based on monocular depth estimation, which not only can save the actual production cost, but also can achieve the precision required by the actual application of the vehicle obstacle early warning function.
A vehicle surrounding obstacle early warning method based on monocular depth estimation comprises the following steps:
(1) acquiring image data, wherein the image data comprises camera calibration parameters and point cloud data in the same frame as the image; generating a pixel label, a depth map label and a 3D target label required by training in image data to form a training data set;
(2) establishing a 3D target detection model based on monocular depth estimation;
(3) training and testing the 3D target detection model by using a training data set to finally obtain a 3D target detection model after training optimization;
(4) in the process of early warning the obstacles, detecting the obstacles of continuous frames by using a 3D target detection model obtained by training optimization;
(5) constructing a tracking model, and tracking the corresponding obstacles in the continuous frames by using a Hungarian maximum matching algorithm;
(6) and establishing a Kalman filtering model related to the space position and the speed of the obstacle, finally obtaining space position information of the tracked obstacle through a filtering algorithm, and judging whether the collision danger exists or not by taking the space position information as a distance reference.
The invention discloses a method for early warning obstacles around a vehicle, and relates to a 3D target detection method based on monocular depth estimation. By using the monocular camera as the sensor, the cost is saved, and with the development of the monocular depth estimation method, the error of the depth obtained by using the monocular estimation method is extremely small in a short distance range, so that the space position information of the detected obstacle has extremely high confidence.
In the step (1), calculating a depth z value of a pixel point corresponding to point cloud data in a camera coordinate system by using camera calibration parameters and point cloud data in the same frame as an image, and taking the z value as a true value of pixel depth; and setting the default value of the depth value of the pixel point which is not matched with the point cloud as 0 so as to obtain the depth map label of the monocular image.
In the step (2), in the 3D target detection model, DenseNet121 is used as the backbone for image feature extraction, and a BTS depth estimation model predicts the depth value of each pixel point on the basis of the extracted image features; meanwhile, a pixel-of-interest proposal module generates a set of pixels of interest from the same features; finally, a simplified single-stage 3D detection head takes the pseudo laser points generated from the pixels of interest as input and outputs the regressed 3D spatial position, size and category of the obstacle.
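As an illustration of this pseudo laser point generation, the following is a minimal sketch of the standard pinhole back-projection from a pixel of interest and its predicted depth to a 3D point in camera coordinates (the function name and array layout are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def pixels_to_pseudo_points(uv, depth, K):
    """Back-project pixels (u, v) with predicted depth z into 3D camera
    coordinates: x = (u - cx) * z / fx, y = (v - cy) * z / fy.

    uv    : (N, 2) pixel coordinates of the pixels of interest
    depth : (N,)   predicted depth per pixel (metres)
    K     : (3, 3) camera intrinsic matrix
    """
    fx, fy = K[0, 0], K[1, 1]
    cx, cy = K[0, 2], K[1, 2]
    x = (uv[:, 0] - cx) * depth / fx
    y = (uv[:, 1] - cy) * depth / fy
    return np.stack([x, y, depth], axis=1)  # (N, 3) pseudo laser points
```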
In the step (3), the process of training the 3D target detection model by using the training data set is as follows:
(3-1) randomly shuffling the training data set, then applying random horizontal flipping with probability 0.5 simultaneously to the images, pixel labels, 3D labels and depth map labels for data enhancement;
(3-2) feeding the training data set into the 3D target detection network in mini-batches of a preset BatchSize, predicting the depth value of each pixel point through the network depth regression head corresponding to the BTS depth estimation model, and generating the pixels of interest most likely to belong to obstacles through the region-of-interest module corresponding to the pixel-of-interest proposal module;
(3-3) taking the pixels of interest and their depth values as input and converting them into corresponding spatial coordinate points through the camera calibration parameters; feeding the generated spatial coordinate points into the 3D regression head corresponding to the 3D target detection head to regress the spatial position and size of obstacles and predict their category; during training of the 3D target detection head, minimizing the Euclidean distance between the predicted depth and the true depth at pixels with depth ground truth and between the predicted and true pixel-of-interest categories, while also minimizing the Euclidean distance between the predicted and true spatial position, size and category of obstacles;
and (3-4) repeating the step (3-1) to the step (3-3), and finishing training after the preset training times are reached.
In the step (3-2), the objective function for training the network depth regression head is a scale-invariant loss in log space, formulated as:

$$D(g) = \frac{1}{T}\sum_{i} g_i^2 - \frac{\lambda}{T^2}\left(\sum_{i} g_i\right)^2$$

where T denotes the number of pixel points with ground-truth depth, λ is a hyperparameter whose value is set to 0.5, and $g_i$ denotes the distance in log space between the predicted depth and the ground truth, computed as:

$$g_i = \log \hat{d}_i - \log d_i$$

where $\hat{d}_i$ and $d_i$ denote the estimated depth value and the ground-truth depth value, respectively. Since the scene contains many pixels with ground-truth depth, the final loss function of the network depth regression head is defined as:

$$L_{depth} = \alpha \sqrt{D(g)}$$

where α, the loss weight control amount, is set to 10 during training.
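A minimal PyTorch sketch of this scale-invariant log-space loss, assuming a boolean mask that selects the T pixels with ground-truth depth (names are illustrative):

```python
import torch

def silog_loss(pred, gt, mask, lam=0.5, alpha=10.0):
    """Scale-invariant loss in log space.

    pred, gt : predicted and ground-truth depth maps
    mask     : boolean map selecting pixels that have ground-truth depth
    D(g) = mean(g^2) - lam * mean(g)^2, with g = log(pred) - log(gt)
    loss = alpha * sqrt(D(g))
    """
    g = torch.log(pred[mask]) - torch.log(gt[mask])
    d = (g ** 2).mean() - lam * g.mean() ** 2
    return alpha * torch.sqrt(d)
```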
The region-of-interest module sets a pixel-class cross-entropy loss function to constrain the network during training, defined as:

$$L_{pix} = -\left[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\right]$$

where y denotes the pixel class, taking the values 0 and 1 for background pixel points and obstacle pixel points respectively, and $\hat{y}$ denotes the predicted pixel-class probability.
In step (3-3), the training objective function of the 3D regression head includes a classification loss function and a regression loss function, formulated as:

$$L_c = -\sum_{i=1}^{K} y_i \log P_i$$

$$L_r = \mathrm{SmoothL1}_{\beta}(\mu_i - \hat{\mu}_i), \qquad \mathrm{SmoothL1}_{\beta}(x) = \begin{cases} 0.5\,x^2/\beta, & |x| < \beta \\ |x| - 0.5\,\beta, & \text{otherwise} \end{cases}$$

where $L_c$ is the classification loss, $P_i$ is the predicted probability of the i-th class, and K denotes the number of predicted classes (the method mainly involves two classes, cars and others, so K is set to 2), with $y_i$ denoting the class label; $L_r$ is the regression loss on the target spatial position, computed with the SmoothL1 loss, where β is a hyperparameter set to 0.1, and $\mu_i$ and $\hat{\mu}_i$ denote the true and predicted values.
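A sketch of the combined detection-head objective, assuming standard PyTorch losses stand in for the formulas above (cross entropy implements $L_c$ and smooth_l1_loss with beta = 0.1 implements $L_r$; names are illustrative):

```python
import torch
import torch.nn.functional as F

def detection_head_loss(logits, cls_target, box_pred, box_gt, beta=0.1):
    """Classification + regression loss of the 3D regression head.

    logits     : (N, K) class scores, K = 2 (car / other)
    cls_target : (N,)   ground-truth class indices
    box_pred   : (N, 7) predicted (x, y, z, h, w, l, theta)
    box_gt     : (N, 7) ground-truth boxes
    """
    l_c = F.cross_entropy(logits, cls_target)            # -sum y_i log P_i
    l_r = F.smooth_l1_loss(box_pred, box_gt, beta=beta)  # SmoothL1 with beta
    return l_c + l_r
```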
In the step (6), in the established Kalman filtering model, the observed quantities are x, y, z, h, w, l and θ, where x, y and z correspond to the spatial position of the obstacle, h, w and l to its size, and θ to its orientation; the predicted quantities are $x_p$, $y_p$, $z_p$, $h_p$, $w_p$, $l_p$, $\theta_p$, $v_x$, $v_y$ and $v_z$, i.e., the filtered spatial position, size, orientation and velocities of the obstacle along the three axes;
when the Kalman filtering model is established, the prediction noise and the observation noise are both assumed to follow a normal distribution. With the state vector $s = [x, y, z, h, w, l, \theta, v_x, v_y, v_z]^T$ and frame interval $\Delta t$, the matrices take the standard constant-velocity form:

$$F = \begin{bmatrix} I_7 & B \\ 0_{3\times7} & I_3 \end{bmatrix}, \qquad B = \begin{bmatrix} \Delta t\, I_3 \\ 0_{4\times3} \end{bmatrix}, \qquad H = \begin{bmatrix} I_7 & 0_{7\times3} \end{bmatrix}$$

with Q and K set as diagonal covariance matrices, where Q represents the prediction noise covariance matrix in the Kalman filtering model, K the observation noise covariance matrix, F the state transition matrix, and H the observation matrix.
Compared with the prior art, the invention has the following beneficial effects:
1. the method for early warning the obstacles around the vehicle based on monocular depth estimation can accurately detect the space positions of the obstacles in a relatively close range.
2. The invention uses the monocular camera to detect the space position of the surrounding obstacles, thereby reducing the cost of mass production of vehicles.
3. According to the invention, through the maximum matching algorithm and the Kalman filtering algorithm, more accurate obstacle position information can be obtained.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a network framework diagram of a 3D object detection model in the method of the present invention;
FIG. 3 shows the results of detecting vehicle obstacles with a monocular camera in different road scenes of the KITTI dataset.
Detailed Description
The invention will be described in further detail below with reference to the drawings and examples, which are intended to facilitate the understanding of the invention without limiting it in any way.
As shown in fig. 1, the obstacle early warning function of the present invention is mainly divided into two modules, an obstacle detection module (3D target detection model) and an obstacle tracking module, and on the basis of the detection result, the obstacle tracking module tracks the obstacle and finally obtains the ID and distance information of the obstacle.
The network framework of the 3D object detection model based on monocular depth estimation is shown in fig. 2, and the network is an end-to-end trained network model. When the model is trained, input data are images, camera calibration parameters corresponding to the images and point cloud data of the same frame as the images. The basic training steps are as follows:
1. The labels required for network model training are generated from the raw data, including pixel labels, image depth map labels, and 3D object labels. The pixel label indicates whether a pixel point is a pixel of interest; it is generated from the 2D box label, with pixel points inside a 2D box set to 1 (pixels of interest) and pixel points outside set to 0. The image depth map label is obtained from the laser point cloud data: the laser points are first transformed into the camera coordinate system using the camera extrinsic calibration parameters, then converted into pixel coordinates using the camera intrinsic parameters, giving the depth value of the pixel point corresponding to each laser point; the depth value of any pixel point not matched to a laser point is set to 0. A projection sketch is given below.
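A minimal sketch of this depth-label generation, assuming a KITTI-style (3, 4) lidar-to-camera extrinsic matrix Tr and (3, 3) intrinsic matrix K (names are illustrative):

```python
import numpy as np

def make_depth_label(points, Tr, K, h, w):
    """Project lidar points to pixels; unmatched pixels default to 0.

    points : (N, 3) lidar points
    Tr     : (3, 4) lidar-to-camera extrinsic matrix
    K      : (3, 3) camera intrinsic matrix
    h, w   : image height and width
    """
    pts_h = np.hstack([points, np.ones((len(points), 1))])
    cam = pts_h @ Tr.T                  # (N, 3) camera coordinates
    cam = cam[cam[:, 2] > 0]            # keep points in front of the camera
    uvz = cam @ K.T                     # (u*z, v*z, z)
    u = (uvz[:, 0] / uvz[:, 2]).astype(int)
    v = (uvz[:, 1] / uvz[:, 2]).astype(int)
    depth = np.zeros((h, w), dtype=np.float32)   # default value 0
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    depth[v[ok], u[ok]] = cam[ok, 2]    # z value as the pixel depth truth
    return depth
```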
2. Initialize the network model parameters, feed the image data into the feature extraction backbone to obtain feature maps downsampled by factors of 2, 4, 8 and 16, and then send these four feature maps to the depth estimation module and the pixel-of-interest proposal module respectively.
3. In the depth estimation module, depth values are estimated using the BTS depth estimation model from the paper "From Big to Small: Multi-Scale Local Planar Guidance for Monocular Depth Estimation". After the estimated depth value of each pixel is obtained, it is used on one hand with the labels to optimize the model, and on the other hand the depth information serves as input to the 3D target detection head for the next stage of computation.
4. In the pixel-of-interest proposal module, the feature map of each scale is first upsampled and concatenated to the original image size, yielding a feature map with multi-scale information. With this feature map as input, a prediction head predicts the class of each pixel, and the 4096 highest-scoring pixel points are taken as pixels of interest (a selection sketch follows). Finally, the pixels of interest and their features are sent to the 3D target detection head for the next stage of computation.
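The selection in this step amounts to a top-k over the per-pixel scores; a short PyTorch sketch under that reading (names are illustrative):

```python
import torch

def propose_pixels(scores, k=4096):
    """Select the k highest-scoring pixels as pixels of interest.

    scores : (H, W) predicted obstacle-pixel probabilities
    returns a (k, 2) tensor of (v, u) pixel coordinates
    """
    h, w = scores.shape
    _, idx = torch.topk(scores.flatten(), k)
    v = torch.div(idx, w, rounding_mode="floor")
    u = idx % w
    return torch.stack([v, u], dim=1)
```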
5. The 3D target detection head receives the pixels of interest, the depth map and the pixel features, converts the pixels of interest into points in the world coordinate system through the camera intrinsic and extrinsic calibration parameters, and obtains 256 candidate points and their features by feature-distance farthest-point downsampling (sketched below), grouping and feature learning. Finally, a detection head predicts the 3D boxes and their categories with the 256 candidate points and their features as input.
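Farthest point sampling, one ingredient of the candidate-point selection above, can be sketched as follows; the patent's feature-distance variant (F-FPS) additionally measures distance in feature space, while this minimal version uses Euclidean distance only:

```python
import numpy as np

def farthest_point_sampling(pts, m=256):
    """Iteratively pick the point farthest from the already-chosen set.

    pts : (N, D) points (3D coordinates, or coordinates plus features for F-FPS)
    m   : number of candidate points to keep
    """
    n = len(pts)
    chosen = np.zeros(m, dtype=int)      # start from point 0
    dist = np.full(n, np.inf)            # distance to nearest chosen point
    for i in range(1, m):
        d = np.linalg.norm(pts - pts[chosen[i - 1]], axis=1)
        dist = np.minimum(dist, d)
        chosen[i] = dist.argmax()        # farthest remaining point
    return pts[chosen]
```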
6. The training data set is traversed cyclically for several epochs, finally yielding a converged monocular obstacle detection network.
After the detection results are obtained, the obstacles need to be tracked: maximum matching of obstacles between consecutive frames is performed with the Hungarian algorithm (a sketch follows). Meanwhile, to obtain a stable tracking result, the prediction error is reduced by Kalman filtering.
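Frame-to-frame association by maximum matching can be realized with the Hungarian algorithm as implemented in scipy; a sketch using centroid distance as the cost, with an illustrative gating threshold:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_obstacles(prev_centers, curr_centers, max_dist=3.0):
    """Associate obstacles across consecutive frames.

    prev_centers, curr_centers : (M, 3) and (N, 3) obstacle positions
    returns a list of (prev_idx, curr_idx) matched pairs
    """
    cost = np.linalg.norm(
        prev_centers[:, None, :] - curr_centers[None, :, :], axis=2)
    rows, cols = linear_sum_assignment(cost)   # Hungarian algorithm
    return [(r, c) for r, c in zip(rows, cols) if cost[r, c] < max_dist]
```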
Fig. 3 shows the effect of the monocular 3D target detection on vehicle obstacles in three road scenes; it is evident that the method of the present invention comprehensively detects the vehicles around the ego vehicle and predicts their distances to the ego vehicle quite accurately.
In the embodiment, training and testing are performed on the large public KITTI dataset: the monocular object detection network is trained on the KITTI 3D Object Detection Evaluation 2017, whose data are divided into a training set and a validation set of 3712 and 3769 pictures respectively. Detection and tracking are experimentally verified on three scenes of the KITTI Object Tracking Evaluation dataset.
The evaluation criteria used in the present invention are Precision and Recall. The target detection algorithm is applied to detection and tracking on the three KITTI road scenes, with the results shown in Table 1.
TABLE 1 (precision and recall of detection and tracking on the three KITTI road scenes; the table itself is rendered as an image in the source document)
As can be seen from Table 1, the results fully demonstrate the effectiveness of the present invention in obstacle detection and tracking.
The embodiments described above are intended to illustrate the technical solutions and advantages of the present invention, and it should be understood that the above-mentioned embodiments are only specific embodiments of the present invention, and are not intended to limit the present invention, and any modifications, additions and equivalents made within the scope of the principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A vehicle surrounding obstacle early warning method based on monocular depth estimation is characterized by comprising the following steps:
(1) acquiring image data, wherein the image data comprises camera calibration parameters and point cloud data in the same frame as the image; generating a pixel label, a depth map label and a 3D target label required by training in image data to form a training data set;
(2) establishing a 3D target detection model based on monocular depth estimation;
(3) training and testing the 3D target detection model by using a training data set to finally obtain a 3D target detection model after training optimization;
(4) in the process of early warning the obstacles, detecting the obstacles of continuous frames by using a 3D target detection model obtained by training optimization;
(5) constructing a tracking model, and tracking the corresponding obstacles in the continuous frames by using a Hungarian maximum matching algorithm;
(6) and establishing a Kalman filtering model related to the space position and the speed of the obstacle, finally obtaining space position information of the tracked obstacle through a filtering algorithm, and judging whether the collision danger exists or not by taking the space position information as a distance reference.
2. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 1, wherein in the step (1), a depth z value of a pixel point corresponding to point cloud data in a camera coordinate system is calculated by using camera calibration parameters and point cloud data in the same frame as an image, and the z value is used as a pixel depth true value; and setting the default value of the depth value of the pixel point which is not matched with the point cloud as 0 so as to obtain the depth map label of the monocular image.
3. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 1, characterized in that in step (2), in the 3D target detection model, DenseNet121 is used as the backbone for image feature extraction, and a BTS depth estimation model predicts the depth value of each pixel point on the basis of the extracted image features; meanwhile, a pixel-of-interest proposal module generates a set of pixels of interest from the same features; finally, a simplified single-stage 3D detection head takes the pseudo laser points generated from the pixels of interest as input and outputs the regressed 3D spatial position, size and category of the obstacle.
4. The monocular depth estimation-based vehicle surrounding obstacle warning method according to claim 1, wherein in the step (3), the training of the 3D target detection model using the training data set is performed as follows:
(3-1) randomly shuffling the training data set, then applying random horizontal flipping with probability 0.5 simultaneously to the images, pixel labels, 3D labels and depth map labels for data enhancement;
(3-2) feeding the training data set into the 3D target detection network in mini-batches of a preset BatchSize, predicting the depth value of each pixel point through the network depth regression head corresponding to the BTS depth estimation model, and generating the pixels of interest most likely to belong to obstacles through the region-of-interest module corresponding to the pixel-of-interest proposal module;
(3-3) taking the pixels of interest and their depth values as input and converting them into corresponding spatial coordinate points through the camera calibration parameters; feeding the generated spatial coordinate points into the 3D regression head corresponding to the 3D target detection head to regress the spatial position and size of obstacles and predict their category; and, during training of the 3D target detection head, minimizing the Euclidean distance between the predicted depth and the true depth at pixels with depth ground truth and between the predicted and true pixel-of-interest categories, while also minimizing the Euclidean distance between the predicted and true spatial position, size and category of obstacles;
and (3-4) repeating the step (3-1) to the step (3-3), and finishing training after the preset training times are reached.
5. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 4, characterized in that in the step (3-2), the objective function trained by the network depth regression head is a scale-invariant loss in log space, formulated as:

$$D(g) = \frac{1}{T}\sum_{i} g_i^2 - \frac{\lambda}{T^2}\left(\sum_{i} g_i\right)^2$$

where T denotes the number of pixel points with ground-truth depth, λ is a hyperparameter whose value is set to 0.5, and $g_i$ denotes the distance in log space between the predicted depth and the ground truth, computed as:

$$g_i = \log \hat{d}_i - \log d_i$$

where $\hat{d}_i$ and $d_i$ denote the estimated depth value and the ground-truth depth value respectively; since the scene contains many pixels with ground-truth depth, the final loss function of the network depth regression head is defined as:

$$L_{depth} = \alpha \sqrt{D(g)}$$

where α, the loss weight control amount, is set to 10 during training.
6. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 4, characterized in that in the step (3-2), the region-of-interest module sets a pixel-class cross-entropy loss function to constrain the network during training, defined as:

$$L_{pix} = -\left[\, y \log \hat{y} + (1 - y)\log(1 - \hat{y}) \,\right]$$

where y denotes the pixel class, taking the values 0 and 1 for background pixel points and obstacle pixel points respectively, and $\hat{y}$ denotes the predicted pixel-class probability.
7. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 4, characterized in that in step (3-3), the training objective function of the 3D regression head includes a classification loss function and a regression loss function, formulated as:

$$L_c = -\sum_{i=1}^{K} y_i \log P_i$$

$$L_r = \mathrm{SmoothL1}_{\beta}(\mu_i - \hat{\mu}_i), \qquad \mathrm{SmoothL1}_{\beta}(x) = \begin{cases} 0.5\,x^2/\beta, & |x| < \beta \\ |x| - 0.5\,\beta, & \text{otherwise} \end{cases}$$

where $L_c$ is the classification loss, $P_i$ is the predicted probability of the i-th class, and K denotes the number of predicted classes (the method mainly involves two classes, cars and others, so K is set to 2), with $y_i$ denoting the class label; $L_r$ is the regression loss on the target spatial position, computed with the SmoothL1 loss, where β is a hyperparameter set to 0.1, and $\mu_i$ and $\hat{\mu}_i$ denote the true and predicted values.
8. The monocular depth estimation-based vehicle surrounding obstacle early warning method according to claim 1, characterized in that in the step (6), in the established Kalman filtering model, the observed quantities are x, y, z, h, w, l and θ, where x, y and z correspond to the spatial position of the obstacle, h, w and l to its size, and θ to its orientation; the predicted quantities are $x_p$, $y_p$, $z_p$, $h_p$, $w_p$, $l_p$, $\theta_p$, $v_x$, $v_y$ and $v_z$, i.e., the filtered spatial position, size, orientation and velocities of the obstacle along the three axes;
when the Kalman filtering model is established, the prediction noise and the observation noise are both assumed to follow a normal distribution. With the state vector $s = [x, y, z, h, w, l, \theta, v_x, v_y, v_z]^T$ and frame interval $\Delta t$, the matrices take the standard constant-velocity form:

$$F = \begin{bmatrix} I_7 & B \\ 0_{3\times7} & I_3 \end{bmatrix}, \qquad B = \begin{bmatrix} \Delta t\, I_3 \\ 0_{4\times3} \end{bmatrix}, \qquad H = \begin{bmatrix} I_7 & 0_{7\times3} \end{bmatrix}$$

with Q and K set as diagonal covariance matrices, where Q represents the prediction noise covariance matrix in the Kalman filtering model, K the observation noise covariance matrix, F the state transition matrix, and H the observation matrix.
CN202210104631.2A 2022-01-28 2022-01-28 Monocular depth estimation-based vehicle surrounding obstacle early warning method Pending CN114495064A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210104631.2A CN114495064A (en) 2022-01-28 2022-01-28 Monocular depth estimation-based vehicle surrounding obstacle early warning method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210104631.2A CN114495064A (en) 2022-01-28 2022-01-28 Monocular depth estimation-based vehicle surrounding obstacle early warning method

Publications (1)

Publication Number Publication Date
CN114495064A true CN114495064A (en) 2022-05-13

Family

ID=81477134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210104631.2A Pending CN114495064A (en) 2022-01-28 2022-01-28 Monocular depth estimation-based vehicle surrounding obstacle early warning method

Country Status (1)

Country Link
CN (1) CN114495064A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI817578B (en) * 2022-06-22 2023-10-01 鴻海精密工業股份有限公司 Assistance method for safety driving, electronic device and computer-readable storage medium
TWI817580B (en) * 2022-06-22 2023-10-01 鴻海精密工業股份有限公司 Assistance method for safety driving, electronic device and computer-readable storage medium
TWI817579B (en) * 2022-06-22 2023-10-01 鴻海精密工業股份有限公司 Assistance method for safety driving, electronic device and computer-readable storage medium
WO2024011557A1 (en) * 2022-07-15 2024-01-18 深圳市正浩创新科技股份有限公司 Map construction method and device and storage medium
WO2024045030A1 (en) * 2022-08-29 2024-03-07 中车株洲电力机车研究所有限公司 Deep neural network-based obstacle detection system and method for autonomous rail rapid transit


Similar Documents

Publication Publication Date Title
CN110942449B (en) Vehicle detection method based on laser and vision fusion
CN112417967B (en) Obstacle detection method, obstacle detection device, computer device, and storage medium
CN112396650B (en) Target ranging system and method based on fusion of image and laser radar
WO2020052540A1 (en) Object labeling method and apparatus, movement control method and apparatus, device, and storage medium
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN110738121A (en) front vehicle detection method and detection system
JP2021523443A (en) Association of lidar data and image data
Rawashdeh et al. Collaborative automated driving: A machine learning-based method to enhance the accuracy of shared information
CN114359181B (en) Intelligent traffic target fusion detection method and system based on image and point cloud
CN108645375B (en) Rapid vehicle distance measurement optimization method for vehicle-mounted binocular system
US20230213643A1 (en) Camera-radar sensor fusion using local attention mechanism
CN113850102B (en) Vehicle-mounted vision detection method and system based on millimeter wave radar assistance
Liu et al. Vehicle detection and ranging using two different focal length cameras
CN112906777A (en) Target detection method and device, electronic equipment and storage medium
CN112683228A (en) Monocular camera ranging method and device
CN114325634A (en) Method for extracting passable area in high-robustness field environment based on laser radar
CN117111055A (en) Vehicle state sensing method based on thunder fusion
Tsai et al. Accurate and fast obstacle detection method for automotive applications based on stereo vision
Fu et al. Camera-based semantic enhanced vehicle segmentation for planar lidar
KR100962329B1 (en) Road area detection method and system from a stereo camera image and the recording media storing the program performing the said method
CN112733678A (en) Ranging method, ranging device, computer equipment and storage medium
Liu et al. Research on security of key algorithms in intelligent driving system
CN111612818A (en) Novel binocular vision multi-target tracking method and system
CN114648639B (en) Target vehicle detection method, system and device
CN114384486A (en) Data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination