CN117576665B - Automatic driving-oriented single-camera three-dimensional target detection method and system - Google Patents

Automatic driving-oriented single-camera three-dimensional target detection method and system

Info

Publication number
CN117576665B
Authority
CN
China
Prior art keywords
depth
dimensional
uncertainty
target
predicted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410077692.3A
Other languages
Chinese (zh)
Other versions
CN117576665A (en)
Inventor
徐小龙
周鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications filed Critical Nanjing University of Posts and Telecommunications
Priority to CN202410077692.3A priority Critical patent/CN117576665B/en
Publication of CN117576665A publication Critical patent/CN117576665A/en
Application granted granted Critical
Publication of CN117576665B publication Critical patent/CN117576665B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/50 - Context or environment of the image
    • G06V20/56 - Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 - Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06T - IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 - Image analysis
    • G06T7/50 - Depth or shape recovery
    • G06T7/55 - Depth or shape recovery from multiple images
    • G06T7/593 - Depth or shape recovery from multiple images from stereo images
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 - Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 - Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/70 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 - Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 - Scenes; Scene-specific elements
    • G06V20/60 - Type of objects
    • G06V20/64 - Three-dimensional objects
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07 - Target detection
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T - CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 - Road transport of goods or passengers
    • Y02T10/10 - Internal combustion engine [ICE] based vehicles
    • Y02T10/40 - Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a single-camera three-dimensional target detection method and system for automatic driving. The method comprises the following steps: inputting the obtained monocular image into a feature extraction network and outputting a two-dimensional detection result; cropping RoI features from the two-dimensional detection result by a RoIAlign method; concatenating the normalized coordinate map of the monocular image with each cropped RoI feature map along the channel dimension to form the final RoI features; predicting three-dimensional detection information from the final RoI features; calculating the target depth with a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information; fusing the depth directly predicted in the three-dimensional detection information with the depth calculated by the geometric projection formula and obtaining the final depth through uncertainty-weighted fusion; and combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion to output the prediction information of the target.

Description

Automatic driving-oriented single-camera three-dimensional target detection method and system
Technical Field
The invention relates to a single-camera three-dimensional target detection method and system for automatic driving, and belongs to the technical field of three-dimensional target detection.
Background
Three-dimensional object detection has long been an important problem in automatic driving; its main task is to compute the three-dimensional position, the size and the yaw angle of vehicles.
In computer vision applications for automatic driving, three-dimensional target detection algorithms that recover the three-dimensional spatial information of vehicles are essential. Within this three-dimensional spatial information, depth estimation is the most important branch. However, accurately acquiring the depth of a target from a single camera is theoretically very difficult, and inaccurate depth prediction is the main cause of performance degradation. Existing single-camera three-dimensional target detection methods for automatic driving mainly comprise radar-based methods, methods based on pre-trained depth, and direct regression methods; the first two rely heavily on additional information, and their computation and labor costs are high. In recent years, computer vision researchers have proposed many direct-regression methods, which greatly reduce research cost and improve detection speed.
However, most of these methods are single-depth-estimation methods: during model training, the depth is estimated either directly by a neural network from the texture information of the vehicle, or from the height information through a geometric projection formula, so the image information cannot be comprehensively utilized.
Disclosure of Invention
The invention aims to provide a single-camera three-dimensional target detection method and system for automatic driving, so as to overcome the defect that most existing methods are single-depth-estimation methods that cannot comprehensively utilize image information and therefore predict depth inaccurately.
An automatic driving-oriented single-camera three-dimensional target detection method, comprising the following steps:
inputting the obtained monocular image into a feature extraction network, and outputting a two-dimensional detection result;
cutting out RoI features of the two-dimensional detection result by adopting a RoIAlign method;
Concatenating the normalized coordinate map of the monocular image with each cut-out RoI feature map along the channel dimension to form the final RoI features;
Predicting three-dimensional detection information according to the final RoI features;
Calculating the target depth with a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
Fusing the depth directly predicted in the three-dimensional detection information with the target depth calculated by the geometric projection formula, and obtaining the final depth through uncertainty-weighted fusion;
And combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion, and outputting the prediction information of the target.
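The geometric projection in the depth-calculation step above rests on the pinhole relation z = f · H_3D / H_2D: the depth is the metric object height divided by its pixel height and scaled by the focal length. A small numeric sketch (the roughly 720-pixel focal length is only an illustrative, KITTI-like value):

    def depth_from_height(focal_px: float, h3d_m: float, h2d_px: float) -> float:
        # Geometric projection: an object of metric height H_3D imaged with focal
        # length f (pixels) spans H_2D = f * H_3D / z pixels, hence z = f * H_3D / H_2D.
        return focal_px * h3d_m / h2d_px

    # e.g. a ~1.5 m tall car whose 2D box is 60 px tall under a ~720 px focal length:
    # depth_from_height(720.0, 1.5, 60.0) -> 18.0 (meters)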
Further, the two-dimensional detection result includes four parts:
Heatmap: predicting the class scores of the targets and the coarse coordinates of the 2D box centers;
Offset_2D: predicting the offset between the projected 3D bounding box center point and the 2D bounding box center coordinates after downsampling;
Size_2D: the height and width of the 2D frame, in pixels;
Residual_2D: the residual of the 2D bounding box center coordinates after downsampling.
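A short PyTorch sketch of how these four outputs could be realized as convolutional heads on the shared feature map; the channel count and number of classes are illustrative assumptions, since the text only fixes what each head predicts:

    import torch
    from torch import nn

    class Detect2DHead(nn.Module):
        def __init__(self, in_ch: int = 64, num_classes: int = 3):
            super().__init__()
            def head(out_ch: int) -> nn.Sequential:
                return nn.Sequential(nn.Conv2d(in_ch, in_ch, 3, padding=1),
                                     nn.ReLU(inplace=True),
                                     nn.Conv2d(in_ch, out_ch, 1))
            self.heatmap = head(num_classes)   # class scores / coarse 2D centers
            self.offset_2d = head(2)           # projected 3D center vs. 2D center offset
            self.size_2d = head(2)             # 2D box height and width (pixels)
            self.residual_2d = head(2)         # residual of the downsampled 2D center

        def forward(self, feat: torch.Tensor) -> dict:
            return {"heatmap": self.heatmap(feat).sigmoid(),
                    "offset_2d": self.offset_2d(feat),
                    "size_2d": self.size_2d(feat),
                    "residual_2d": self.residual_2d(feat)}

    # feat = torch.randn(1, 64, 96, 320)   # illustrative downsampled feature map
    # outputs = Detect2DHead()(feat)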
Further, the three-dimensional detection information includes:
Angle: the angle prediction output adopts a multi-bin strategy and is divided into 24 intervals, wherein the first 12 are classification outputs and the last 12 are regression outputs;
Direct_Depth: directly predicting the depth information of the target by using the feature extraction network and outputting two columns of information, wherein the first column is the depth value and the second column is the uncertainty;
Offset_3D: the residual of the projected 3D bounding box center point after downsampling;
Size_3D: the size of the 3D bounding box; what is actually predicted is the deviation of the size, and the predicted deviation is added to the average size of the targets in the dataset to obtain the predicted size;
Depth_bias: the deviation of the predicted depth, which compensates the depth prediction of truncated targets.
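The per-branch output widths implied by the list above can be summarized as follows; only the 12 + 12 angle layout and the two-column direct depth are stated explicitly, so the remaining widths are common choices and should be read as assumptions:

    # Output channels of each 3D prediction branch on the final RoI feature.
    ROI_BRANCH_CHANNELS = {
        "angle": 12 + 12,     # multi-bin: 12 bin-classification outputs + 12 regression outputs
        "direct_depth": 2,    # column 1: depth value, column 2: uncertainty
        "offset_3d": 2,       # residual of the projected 3D box center after downsampling (assumed 2-D)
        "size_3d": 3,         # deviation from the dataset mean size (assumed height, width, length)
        "depth_bias": 2,      # depth deviation (assumed mean and scale of its Laplace distribution)
    }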
Further, the loss function of the feature extraction network is:
Loss_all = ∑ w_i · Loss_i
where the weight of the two-dimensional detection portion is initially set to w_i = 1 and that of the three-dimensional detection portion to w_i = 0; Loss_all is expressed as the overall loss; and Loss_i is denoted as the loss of each predicted branch.
Further, the target depth is calculated from the depth directly predicted in the three-dimensional detection information and from the geometric projection formula, and the final depth is obtained through uncertainty-weighted fusion, as follows:
Direct depth estimation is performed on the final RoI feature:
(z_d, σ_d) = Head(Direct_Depth(RoI))
wherein Head(Direct_Depth(·)) is the prediction branch in the three-dimensional information used for estimating depth and uncertainty; z_d denotes the direct depth estimation result, ε denotes a set parameter, and σ_d denotes the heteroscedastic random uncertainty in modeling the depth estimation;
Bringing the height of the three-dimensional frame, which obeys the Laplace distribution La(μ_H, λ_H), into the geometric projection formula, the depth predicted according to the geometric projection is:
z = f · H_3D / H_2D = f · (μ_H + λ_H · X) / H_2D
wherein f denotes the focal length, H_2D denotes the two-dimensional frame height, X obeys the standard Laplace distribution La(0, 1), H_3D denotes the three-dimensional frame height, λ_H denotes the scale parameter, and μ_H denotes the mean of the three-dimensional frame height;
Meanwhile, a depth deviation obeying the Laplace distribution La(μ_b, σ_b) is also predicted in the three-dimensional detection information, and the depth and uncertainty of the final geometric projection prediction are obtained by utilizing the additivity of the Laplace distribution:
z_p = μ_z + μ_b,  σ_p² = σ_z² + σ_b²
where σ_b denotes the variance of the depth deviation, μ_b denotes the mean of the depth deviation, σ_p is the uncertainty based on the geometric projection, and z_p is the depth based on the geometric projection; μ_z = f · μ_H / H_2D and σ_z = f · λ_H / H_2D;
The direct depth z_d obtained on the RoI feature and the geometric-projection depth z_p are fused using uncertainty guidance; the weights ω_i (i = d, p) are computed as:
ω_i = σ_i² / ∑_j σ_j²  (j = d, p)
where d denotes the direct depth estimation, p denotes the depth estimation based on geometric projection, ∑ σ_j² denotes the sum of squares of the uncertainties of the two estimates, and σ_i is the uncertainty of the direct depth estimation or of the geometric-projection depth estimation;
The final target depth z_c and uncertainty σ_c are calculated as:
z_c = ∑ ω_i · z_i,  σ_c² = ∑ ω_i · σ_i²
Because the target depth also obeys the Laplace distribution, the loss function for the target depth information is the corresponding Laplace negative log-likelihood, where z* denotes the label true value, z_c denotes the target depth, σ_c denotes its uncertainty, z_i denotes the two depth estimates, and σ_i denotes the uncertainty corresponding to each depth estimate.
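As a concrete illustration of this uncertainty-guided fusion, a small PyTorch sketch (the numeric values in the usage comment are illustrative):

    import torch

    def fuse_depth(z_d, sigma_d, z_p, sigma_p):
        # w_i = sigma_i^2 / sum_j sigma_j^2  (the less certain estimate gets the larger weight),
        # z_c = sum_i w_i * z_i,  sigma_c^2 = sum_i w_i * sigma_i^2
        var_d, var_p = sigma_d ** 2, sigma_p ** 2
        total = var_d + var_p
        w_d, w_p = var_d / total, var_p / total
        z_c = w_d * z_d + w_p * z_p
        sigma_c = torch.sqrt(w_d * var_d + w_p * var_p)
        return z_c, sigma_c

    # e.g. direct estimate 17.2 m (sigma 0.8) and projected estimate 18.5 m (sigma 0.4):
    # fuse_depth(torch.tensor(17.2), torch.tensor(0.8),
    #            torch.tensor(18.5), torch.tensor(0.4))   # -> (17.46, ...)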
Further, the predicting three-dimensional detection information according to the final RoI characteristic includes:
The RoI feature is passed through convolution, group normalization, activation, adaptive average pooling and convolution operations, and the predicted three-dimensional detection information is output.
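A short sketch of one such prediction branch, following the stated sequence of convolution, group normalization, activation, adaptive average pooling and convolution; the 256-channel, 32-group setting reuses the group-normalization configuration given in the embodiment below, and the output width is a placeholder:

    from torch import nn

    def roi_branch(in_ch: int = 256, out_ch: int = 2) -> nn.Sequential:
        # convolution -> group normalization -> activation -> adaptive average pooling -> convolution
        return nn.Sequential(
            nn.Conv2d(in_ch, 256, kernel_size=3, padding=1),
            nn.GroupNorm(num_groups=32, num_channels=256, eps=1e-5),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(1),                # collapse each RoI to 1x1
            nn.Conv2d(256, out_ch, kernel_size=1),  # branch output, e.g. (depth, uncertainty)
        )

    # roi_feat of shape (N, 256, 7, 7) -> roi_branch()(roi_feat).flatten(1) of shape (N, 2)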
Further, the feature extraction network comprises a DLA-34 backbone network and a Neck network; the DLA-34 backbone adopts the CenterNet framework, the last 4 of the 6 feature maps output by the DLA-34 backbone are input into the Neck network, and the Neck network outputs one feature map from the 4 input feature maps as the two-dimensional detection result.
Further, the predicted information of the target includes three-dimensional center point coordinates, a size, and a yaw angle.
Further, the RoI features include only object level features and no background noise.
The second aspect of the invention provides a single-camera three-dimensional target detection system for automatic driving, which comprises:
The feature extraction module is used for acquiring a monocular image, inputting the monocular image into the feature extraction network and outputting a two-dimensional detection result;
the feature clipping module is used for clipping RoI features of the two-dimensional detection result by adopting a RoIAlign method;
the normalization module is used for concatenating the normalized coordinate map of the monocular image with each cropped RoI feature map along the channel dimension to form the final RoI features;
the three-dimensional detection module is used for predicting three-dimensional detection information according to the final RoI characteristics;
The algorithm module is used for calculating the target depth by adopting a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
The uncertainty fusion module is used for calculating the target depth from the depth directly obtained from the three-dimensional detection information and the geometric projection formula, and obtaining the final depth through uncertainty weighted fusion;
and the fusion module is used for combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion and outputting the predicted information of the target.
Compared with the prior art, the invention has the beneficial effects that:
1. The method integrates direct depth estimation and geometric projection-based depth estimation through uncertainty guidance, comprehensively utilizes the texture and geometric characteristics of the image, provides more accurate depth estimation, and has better robustness;
2. The invention distributes higher weight values to branches with unstable depth prediction through depth fusion, which is helpful for improving the stability of the whole depth estimation;
3. In order to better assist the three-dimensional detection task, the invention adds two-dimensional detection task branches and applies group normalization within each channel group, which preserves positional information among channels, facilitates the learning of spatial information in three-dimensional target detection, and accelerates the network training process;
4. The invention adopts two-stage detection and performs further prediction on the RoI features; it is faster than most single-stage methods and, while meeting the real-time requirement of single-camera three-dimensional target detection for automatic driving, achieves higher detection accuracy than current methods of each category.
Drawings
FIG. 1 is a three-dimensional spatial information diagram of a detection target of the method of the present invention;
FIG. 2 is a schematic diagram of a network structure of the method of the present invention;
FIG. 3 is a schematic diagram of a network prediction branch of the method of the present invention.
Detailed Description
The invention is further described in connection with the following detailed description, in order to make the technical means, the creation characteristics, the achievement of the purpose and the effect of the invention easy to understand.
Example 1
The invention discloses a single-camera three-dimensional target detection method for automatic driving, wherein three-dimensional space information is shown in fig. 1, and the method comprises the following steps:
inputting the obtained monocular image into a feature extraction network, and outputting a two-dimensional detection result;
cutting out RoI features of the two-dimensional detection result by adopting a RoIAlign method;
Concatenating the normalized coordinate map of the monocular image with each cut-out RoI feature map along the channel dimension to form the final RoI features;
Predicting three-dimensional detection information according to the final RoI characteristics;
Calculating the target depth by adopting a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
Fusing the depth directly predicted in the three-dimensional detection information with the target depth calculated by the geometric projection formula, and obtaining the final depth through uncertainty-weighted fusion;
Combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion, and outputting the prediction information of the target, wherein the prediction information of the target comprises the three-dimensional center point coordinates, the size and the yaw angle;
The feature extraction network comprises a DLA-34 backbone network and a Neck network; the DLA-34 backbone adopts the CenterNet framework, the last 4 of the 6 feature maps output by the DLA-34 backbone are input into the Neck network, and the Neck network outputs one feature map from the 4 input feature maps as the two-dimensional detection result.
Here the task of three-dimensional object detection is decoupled. For a monocular image, the task is to find each object of interest in the picture and estimate its class and three-dimensional box; the main object in the KITTI dataset is the car. The three-dimensional box information is divided into the three-dimensional center point coordinates (x, y, z), the size (h, w, l) and the yaw angle θ of the target, as shown in fig. 1. After the target depth z is found, the projection point (u, v) of the three-dimensional frame center is used to recover x and y with the following formula:
x = (u - c_u) · z / f,  y = (v - c_v) · z / f
wherein (c_u, c_v) is the principal point and f is the focal length, so that the 3D center point can be predicted. The size and yaw angle are output by the other associated prediction branches.
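A small sketch of this back-projection, using the same symbols as above; the numeric values in the usage comment are illustrative only:

    def backproject_center(u: float, v: float, z: float,
                           f: float, c_u: float, c_v: float):
        # Invert the pinhole projection u = f * x / z + c_u, v = f * y / z + c_v
        x = (u - c_u) * z / f
        y = (v - c_v) * z / f
        return x, y, z

    # e.g. a projected center at (700, 200) px with depth 18 m, f = 720 px and
    # principal point (620, 190): backproject_center(700, 200, 18.0, 720.0, 620.0, 190.0)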
The main predictive branches of the present invention are shown in figure 2.
1) The prediction branches are divided into a two-dimensional detection part and a three-dimensional detection part; the three-dimensional detection is performed on the RoI features, and the final three-dimensional frame is formed from the two-dimensional and three-dimensional detection information. The group normalization module is set with num_groups=32, num_channels=256 and the default eps of 1e-5. The two-dimensional center guides the regression of the projected three-dimensional center point, which connects the two-dimensional and three-dimensional tasks so that the learning of the different tasks is mutually promoted. The two-dimensional bounding box width and height prediction branches of the two-dimensional detection module let the model learn features that are helpful for depth estimation, and the two-dimensional detection module is required for the three-dimensional detection task because, by the imaging principle, objects generally appear larger when near and smaller when far in the image. The two-dimensional detection output is improved on the basis of CenterNet: the last 4 feature maps output by the backbone are fed into the Neck, and the final feature map is output as the output of the whole network; it comprises four parts.
Heatmap: predicts the class score of the object and the coarse coordinates of the 2D box center; the coarse coordinates are supervised with the projection of the 3D bounding box center, which helps perceive 3D geometric information and is associated with the task of estimating the 3D object center.
Offset_2D: predicts the offset between the projected 3D bounding box center point and the 2D bounding box center coordinates after downsampling (s = 4).
Size_2D: the height and width of the 2D frame, in pixels.
Residual_2D: the residual of the 2D bounding box center coordinates after downsampling.
2) To better focus on each object, the RoI features are extracted by RoIAlign cropping, and the normalized coordinate map is concatenated with each RoI feature map along the channel dimension to obtain the final RoI features (see the sketch after the following list); some information of the three-dimensional box is then predicted from the extracted final RoI features.
Angle: the angle prediction output adopts a multi-bin strategy and is divided into 24 intervals, where the first 12 are classification outputs and the last 12 are regression outputs.
Direct_Depth: directly predicts the depth information of the target, i.e. the target distance z (depth) in the camera coordinate system, using the backbone neural network model. Two columns of information are output, the first column being the depth value and the second column the uncertainty (in log-variance form).
Offset_3D: the residual of the projected 3D bounding box center point after downsampling.
Size_3D: the size information of the 3D bounding box; what is actually predicted is the deviation of the size, and the predicted deviation is added to the average size of the targets in the dataset to obtain the predicted size.
Depth_bias: the deviation value of the predicted depth, which compensates for the depth prediction error of truncated targets.
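A short sketch of the RoIAlign cropping and coordinate-map concatenation of step 2), using torchvision's roi_align. Here the normalized coordinate channels are attached before cropping so that each RoI keeps its image-position information, an ordering the text leaves open; the box format and tensor sizes are illustrative:

    import torch
    from torchvision.ops import roi_align

    def crop_roi_features(feat, boxes, out_size=7, spatial_scale=0.25):
        # feat: (1, C, H, W) downsampled feature map; boxes: (N, 4) 2D boxes
        # (x1, y1, x2, y2) in input-image coordinates; spatial_scale matches s = 4.
        n, c, h, w = feat.shape
        ys = torch.linspace(0, 1, h, device=feat.device).view(1, 1, h, 1).expand(n, 1, h, w)
        xs = torch.linspace(0, 1, w, device=feat.device).view(1, 1, 1, w).expand(n, 1, h, w)
        feat_coord = torch.cat([feat, xs, ys], dim=1)                    # append normalized (x, y)
        rois = torch.cat([torch.zeros(len(boxes), 1, device=feat.device), boxes], dim=1)
        return roi_align(feat_coord, rois, output_size=out_size,
                         spatial_scale=spatial_scale, aligned=True)      # (N, C + 2, 7, 7)

    # feat = torch.randn(1, 254, 96, 320); boxes = torch.tensor([[40., 60., 200., 180.]])
    # final_roi = crop_roi_features(feat, boxes)   # 254 feature + 2 coordinate channels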
The invention adopts two-stage detection; the two-dimensional detection stage is a front-end task of the 3D detection, and the final depth estimation task depends on both the front-end two-dimensional detection and the three-dimensional detection tasks. The total loss function is:
Loss_all = ∑ w_i · Loss_i
where the weight of the two-dimensional detection portion is initially set to w_i = 1 and that of the three-dimensional detection portion to w_i = 0; Loss_all is expressed as the overall loss and Loss_i as the loss of each predicted branch. A hierarchical task learning strategy observes the learning state of each task and the local trend of the loss of its front-end tasks; if a front-end task tends to converge, the weight of the dependent task is increased. As training progresses, the weight of the 3D detection branches gradually increases from 0 to 1. The loss weight of each term can thus dynamically reflect the learning state of the front-end tasks, making training more stable.
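A small sketch of this hierarchical task learning schedule; the convergence test on the recent 2D-loss trend and the linear ramp are assumptions, since the text does not specify the exact rule:

    def hierarchical_weights(step, total_steps, loss2d_history, window=50, tol=0.01):
        # 2D branches keep weight 1; 3D branches ramp from 0 to 1 once the recent
        # trend of the 2D loss suggests the front-end task has converged (assumed rule).
        w2d = 1.0
        if len(loss2d_history) < window:
            return w2d, 0.0
        recent = loss2d_history[-window:]
        trend = (recent[0] - recent[-1]) / max(abs(recent[0]), 1e-6)
        w3d = min(1.0, step / max(total_steps, 1)) if trend < tol else 0.0
        return w2d, w3d

    def total_loss(losses_2d, losses_3d, w2d, w3d):
        # Loss_all = sum_i w_i * Loss_i over all prediction branches
        return w2d * sum(losses_2d) + w3d * sum(losses_3d)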
In the method of the invention, the main process is solving the depth, and the specific process is shown in fig. 3, and the steps are as follows:
1) Direct depth estimation based on uncertainty theory relies on the appearance of the target and its surrounding pixels, and the RoI features contain only object features without background noise. Direct depth estimation is performed here on the RoI features:
(z_d, σ_d) = Head(Direct_Depth(RoI))
The Direct_Depth branch is used to estimate depth and uncertainty. The depth is estimated from the first channel using an inverse sigmoid transform, which maps the continuous output to a positive range; ε indicates a set parameter, a small number that ensures numerical stability, taken as 1e-6 in this example. σ_d denotes the heteroscedastic uncertainty in the modeled depth estimate.
2) In the geometric projection, it is assumed here that the three-dimensional box height of the target obeys the Laplace distribution La(μ_H, λ_H), whose parameters are predicted end-to-end by the Size_3D branch:
H_3D = μ_H + λ_H · X,  X ~ La(0, 1)
where X obeys the standard Laplace distribution La(0, 1). Thus, the loss function of the 3D height can be expressed as the negative log-likelihood of this distribution:
Loss_H = |μ_H - H*_3D| / λ_H + log λ_H
where H*_3D is the true height. The loss makes the predicted height μ_H as close as possible to the true height H*_3D, which lets the network learn more accurate height predictions, while the regularization term log λ_H facilitates the joint optimization of the height and the uncertainty predictions.
3) Substituting the Laplace-distributed 3D height into the geometric projection formula gives:
z = f · H_3D / H_2D = f · (μ_H + λ_H · X) / H_2D
where f denotes the focal length, H_2D the two-dimensional frame height, X obeys the standard Laplace distribution La(0, 1), H_3D denotes the three-dimensional frame height, λ_H the scale parameter, and μ_H the mean of the three-dimensional frame height.
The projection depth therefore also obeys a Laplace distribution, with mean μ_z = f · μ_H / H_2D and standard deviation σ_z = f · λ_H / H_2D. The network additionally predicts a depth bias to help achieve more accurate depth results.
Likewise, the depth deviation obeys the Laplace distribution La(μ_b, σ_b). Exploiting the additivity of the Laplace distribution, the depth and uncertainty of the final geometric projection prediction are:
z_p = μ_z + μ_b,  σ_p² = σ_z² + σ_b²
where σ_b denotes the variance of the depth deviation, μ_b denotes the mean of the depth deviation, σ_p is the uncertainty based on the geometric projection, and z_p is the depth based on the geometric projection.
4) The direct depth z_d obtained on the RoI features and the depth z_p based on geometric projection are fused together using uncertainty guidance. The weights ω_i (i = d, p) are computed as:
ω_i = σ_i² / ∑_j σ_j²  (j = d, p)
where d denotes the direct depth estimation, p denotes the depth estimation based on geometric projection, and σ_i is the uncertainty of the direct depth estimation or of the geometric-projection depth estimation.
5) The final target depth z_c and uncertainty σ_c are computed as:
z_c = ∑ ω_i · z_i,  σ_c² = ∑ ω_i · σ_i²
Since the fused depth also obeys the Laplace distribution, the depth loss is taken as the corresponding Laplace negative log-likelihood of the fused depth z_c (with uncertainty σ_c) and of the two individual estimates z_i (with uncertainties σ_i) with respect to the real depth value.
The overall loss drives the predicted depth closer to the real depth value, and the uncertainties of the three-dimensional frame height and of the depth deviation are trained during the same optimization process. The depth fusion formula dynamically assigns weights by observing their changes and favors the depth prediction branch whose training is unstable: a depth estimate with higher uncertainty receives a higher weight, so even a less certain estimate still has some impact on the final depth estimate, which helps improve the stability of the overall depth estimation, since the higher-uncertainty estimate has more impact on the final result. For example, when the uncertainty calculated from the height is larger than the uncertainty of the directly estimated depth, the network leans more toward the height-based depth prediction and raises the corresponding weight, so the depth prediction is comprehensively optimized and the fault tolerance is enhanced.
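A small sketch of such a depth loss, assuming the standard Laplace negative log-likelihood form |z - z*| / σ + log σ for the fused depth and for each individual estimate; the exact constants and weighting used by the invention are not reproduced here:

    import torch

    def laplace_nll(z_pred, sigma, z_gt):
        # Laplace negative log-likelihood with location z_pred and scale sigma
        # (constant terms dropped): |z_pred - z*| / sigma + log sigma
        return torch.abs(z_pred - z_gt) / sigma + torch.log(sigma)

    def depth_loss(z_d, sigma_d, z_p, sigma_p, z_c, sigma_c, z_gt):
        # Supervise the fused depth and both individual estimates
        # (equal weighting of the three terms is an assumption).
        return (laplace_nll(z_c, sigma_c, z_gt)
                + laplace_nll(z_d, sigma_d, z_gt)
                + laplace_nll(z_p, sigma_p, z_gt)).mean()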
Depth prediction is important for the subsequent inference process. The depth fusion model represents the uncertainty of the depth well; to obtain the final three-dimensional box confidence, the fused depth uncertainty is further mapped through an exponential function to a value between 0 and 1, which serves as the depth confidence and provides a more accurate confidence for each projected depth.
Let p_3D be the probability that the target is correctly detected (the three-dimensional box confidence), p_2D the classification heatmap score, and p_3D|2D the conditional three-dimensional box confidence. Previous approaches typically use the two-dimensional confidence p_2D as the final score and do not consider features in three-dimensional space, or model p_3D|2D with the three-dimensional box IoU; but since the average three-dimensional box IoU of the model is larger in the training phase than in the validation phase, the latter performs poorly at validation time. Here, the conditional three-dimensional box confidence is expressed as the depth confidence, and the final confidence is obtained by the probability chain rule as:
p_3D = p_2D · p_3D|2D
The final score reflects both the 2D detection confidence and the fused depth confidence, which guides more reliable detection. This calculation introduces the uncertainty of the direct depth estimation and the prior information of the projection model, so that depth errors caused by three-dimensional frame height errors are well reflected in the confidence computation.
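A small sketch of this confidence computation, taking exp(-σ_c) as the depth confidence; the text only states that an exponential function maps the fused uncertainty into a value between 0 and 1, so this particular mapping is an assumption:

    import torch

    def box_confidence(heatmap_score, sigma_c):
        # p_3D = p_2D * p_3D|2D, with the conditional term taken as the
        # depth confidence exp(-sigma_c) in (0, 1]
        return heatmap_score * torch.exp(-sigma_c)

    # e.g. heatmap score 0.9 and fused depth uncertainty 0.35:
    # box_confidence(torch.tensor(0.9), torch.tensor(0.35))   # ~0.63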
In this embodiment, the method and the model are tested on the KITTI dataset and compared with mainstream single-camera three-dimensional target detection methods. The overall performance comparison is shown in Table 1, where our model is MonoCoDe. The best results are shown in bold and the second-best in italics; E denotes direct depth estimation and H denotes depth estimation from height. AP (mean Average Precision) is the most important index for measuring accuracy in target detection algorithms; the experimental evaluation metric is the 40-point interpolated AP on the car class at the moderate difficulty level, with a sample counted as correct when its IoU (the intersection over union of prediction and ground truth) reaches the required threshold.
Table 1: target detection overall performance comparison result
As can be seen from Table 1, the present invention performs better than the other methods on the car class (the data for each method are taken from the corresponding publications), including the methods that use additional information. The car class is the object of most interest in the KITTI three-dimensional object detection benchmark, and the moderate level is the main basis for ranking. Apart from one difficulty level, the method herein exceeds MonoCon (a direct depth estimation method using auxiliary learning, the 2022 monocular 3D object detection SOTA method). The method herein also performs better than the SOTA models of the other categories. For example, at the moderate level of three-dimensional detection, the method herein is 2.67% higher than MonoFlex (a relative improvement of about 20% over that method). In addition, the running speed of the method is 38 fps, which meets the requirement of real-time detection and is much faster than methods that rely on additional information, reflecting the advantage of a single-camera three-dimensional target detection method that does not depend on any auxiliary information.
Example 2
The invention also discloses a single-camera three-dimensional target detection system for automatic driving, which comprises:
The feature extraction module is used for inputting the acquired monocular image into a feature extraction network and outputting a two-dimensional detection result;
the feature clipping module is used for clipping RoI features of the two-dimensional detection result by adopting a RoIAlign method;
the normalization module is used for concatenating the normalized coordinate map of the monocular image with each cropped RoI feature map along the channel dimension to form the final RoI features;
the three-dimensional detection module is used for predicting three-dimensional detection information according to the final RoI characteristics;
The algorithm module is used for calculating the target depth by adopting a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
The uncertainty fusion module is used for calculating the target depth from the depth directly obtained from the three-dimensional detection information and the geometric projection formula, and obtaining the final depth through uncertainty weighted fusion;
and the fusion module is used for combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion and outputting the predicted information of the target.
The foregoing is merely a preferred embodiment of the present invention, and it should be noted that modifications and variations could be made by those skilled in the art without departing from the technical principles of the present invention, and such modifications and variations should also be regarded as being within the scope of the invention.

Claims (8)

1. An automatic driving-oriented single-camera three-dimensional target detection method is characterized by comprising the following steps of:
inputting the obtained monocular image into a feature extraction network, and outputting a two-dimensional detection result;
cutting out RoI features of the two-dimensional detection result by adopting a RoIAlign method;
Connecting the coordinate graph normalized by the monocular image with the map of each cut-out RoI feature in a channel mode to form a final RoI feature;
Predicting three-dimensional detection information according to the final RoI characteristics;
Calculating the target depth by adopting a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
calculating the target depth by directly solving the depth and a geometric projection formula in the three-dimensional detection information, and obtaining the final depth through uncertainty weighted fusion;
Fusing the predicted three-dimensional detection information with the final depth obtained by weighting fusion, and outputting the predicted information of the target;
the loss function of the feature extraction network is:
Loss_all = ∑ w_i · Loss_i
Initially setting a weight w_i = 1 for the two-dimensional detection portion and w_i = 0 for the three-dimensional detection portion; Loss_all being represented as the overall loss; Loss_i represents the loss of each predicted branch;
The method for calculating the target depth by directly solving the depth and the geometric projection formula in the three-dimensional detection information and obtaining the final depth by uncertainty weighted fusion comprises the following steps:
direct depth estimation is performed at the final RoI feature:
(z_d, σ_d) = Head(Direct_Depth(RoI))   (1)
The Head(Direct_Depth(·)) is a prediction branch in the three-dimensional information and is used for estimating depth and uncertainty; z_d represents the direct depth estimation result, ε is a set parameter, and σ_d represents the heteroscedastic uncertainty in modeling the depth estimation;
Bringing the height of the three-dimensional frame, which obeys the Laplace distribution La(μ_H, λ_H), into the geometric projection formula, the depth predicted according to the geometric projection is:
z = f · H_3D / H_2D = f · (μ_H + λ_H · X) / H_2D
wherein f represents the focal length, H_2D represents the two-dimensional frame height, X obeys the standard Laplace distribution La(0, 1), H_3D represents the three-dimensional frame height, λ_H represents the scale parameter, and μ_H represents the mean value of the three-dimensional frame height;
Meanwhile, the depth deviation obeying the Laplace distribution La(μ_b, σ_b) is also predicted in the three-dimensional detection information, and the depth and uncertainty of the final geometric projection prediction are obtained by utilizing the additivity of the Laplace distribution:
z_p = μ_z + μ_b,  σ_p² = σ_z² + σ_b²
wherein σ_b denotes the variance of the depth deviation, μ_b denotes the mean of the depth deviation, σ_p is the uncertainty based on the geometric projection, and z_p is the depth based on the geometric projection; μ_z = f · μ_H / H_2D and σ_z = f · λ_H / H_2D;
Fusing the direct depth z_d obtained on the RoI feature and the depth z_p based on geometric projection by using uncertainty guidance; the weight ω_i (i = d, p) calculation formula is:
ω_i = σ_i² / ∑_j σ_j²  (j = d, p)
where d represents the direct depth estimate, p represents the geometric-projection-based depth estimate, ∑ σ_j² represents the sum of squares of the uncertainties of the direct depth estimate and the geometric-projection-based depth estimate, and σ_i is expressed as the uncertainty of the direct depth estimate or the uncertainty based on the geometric projection depth;
The final target depth z_c and uncertainty σ_c are calculated by the formulas:
z_c = ∑ ω_i · z_i,  σ_c² = ∑ ω_i · σ_i²
because the target depth also obeys the Laplace distribution, the loss function for the target depth information is the corresponding Laplace negative log-likelihood, where z* represents the label true value, z_c represents the target depth, σ_c represents the uncertainty, z_i represents the two depth estimates, and σ_i represents the uncertainty corresponding to each depth estimate.
2. The autopilot-oriented single-camera three-dimensional target detection method of claim 1 wherein the two-dimensional detection result comprises four parts:
Heatmap: predicting class scores of the targets and coarse coordinates of the centers of the 2D frames;
Offset_2d: predicting the offset between the projected 3D bounding box center point and the 2D bounding box center coordinates after downsampling;
Size_2d: height and width of the 2D frame, in pixels;
Residual_2d: the residual of the 2D bounding box center coordinates after downsampling.
3. The autopilot-oriented single-camera three-dimensional object detection method of claim 1 wherein the three-dimensional detection information comprises:
Angle: the angle prediction output is divided into 24 intervals by adopting a multi-bin strategy, wherein the first 12 are used for classification prediction output and the last 12 are used for regression prediction output;
Direct_depth: directly predicting the depth information of the target by using the feature extraction network, and outputting two columns of information, wherein the first column is the depth value and the second column is the uncertainty;
Offset_3d: the residual of the projected 3D bounding box center point after downsampling;
Size_3d: the size of the 3D bounding box; what is actually predicted is the deviation of the size, and the average size of the targets in the dataset is added to the predicted deviation to obtain the predicted size;
Depth_bias: the deviation value of the predicted depth, to make up for the deviation of the truncated target depth prediction.
4. The method for detecting a three-dimensional object by using a single camera for automatic driving according to claim 1, wherein predicting three-dimensional detection information according to a final RoI characteristic comprises:
And carrying out convolution, group normalization, activation, adaptive average pooling and convolution operation on the RoI characteristic, and outputting predicted three-dimensional detection information.
5. The method for detecting the three-dimensional target of the single camera facing the automatic driving according to claim 1, wherein the feature extraction network comprises a DLA-34 backbone network and a Neck network; the DLA-34 backbone adopts the CenterNet framework; the last 4 of the 6 feature maps output by the DLA-34 backbone are input into the Neck network; and the Neck network outputs one feature map from the 4 input feature maps as the two-dimensional detection result.
6. The autopilot-oriented single camera three-dimensional target detection method of claim 1 wherein the predicted information of the target includes three-dimensional center point coordinates, dimensions, and yaw angle.
7. The autopilot-oriented single camera three-dimensional object detection method of claim 1 wherein the RoI features include only object level features.
8. An autopilot-oriented single-camera three-dimensional target detection system, the system comprising:
The feature extraction module is used for acquiring a monocular image, inputting the monocular image into the feature extraction network and outputting a two-dimensional detection result;
the feature clipping module is used for clipping RoI features of the two-dimensional detection result by adopting a RoIAlign method;
the normalization module is used for connecting the coordinate graph normalized by the monocular image with the map of each cut RoI feature in a channel mode to form a final RoI feature;
the three-dimensional detection module is used for predicting three-dimensional detection information according to the final RoI characteristics;
The algorithm module is used for calculating the target depth by adopting a geometric projection formula from the predicted two-dimensional frame height in the two-dimensional detection result and the predicted three-dimensional frame height in the three-dimensional detection information;
The uncertainty fusion module is used for calculating the target depth from the depth directly obtained from the three-dimensional detection information and the geometric projection formula, and obtaining the final depth through uncertainty weighted fusion;
the fusion module is used for combining the predicted three-dimensional detection information with the final depth obtained by the weighted fusion and outputting the predicted information of the target;
the loss function of the feature extraction network is:
Loss_all = ∑ w_i · Loss_i
Initially setting a weight w_i = 1 for the two-dimensional detection portion and w_i = 0 for the three-dimensional detection portion; Loss_all being represented as the overall loss; Loss_i represents the loss of each predicted branch;
Calculating the target depth by directly solving the depth and the geometric projection formula in the three-dimensional detection information, and obtaining the final depth by uncertainty weighted fusion comprises the following steps:
direct depth estimation is performed at the final RoI feature:
(z_d, σ_d) = Head(Direct_Depth(RoI))   (1)
The Head(Direct_Depth(·)) is a prediction branch in the three-dimensional information and is used for estimating depth and uncertainty; z_d represents the direct depth estimation result, ε is a set parameter, and σ_d represents the heteroscedastic uncertainty in modeling the depth estimation;
Bringing the height of the three-dimensional frame, which obeys the Laplace distribution La(μ_H, λ_H), into the geometric projection formula, the depth predicted according to the geometric projection is:
z = f · H_3D / H_2D = f · (μ_H + λ_H · X) / H_2D
wherein f represents the focal length, H_2D represents the two-dimensional frame height, X obeys the standard Laplace distribution La(0, 1), H_3D represents the three-dimensional frame height, λ_H represents the scale parameter, and μ_H represents the mean value of the three-dimensional frame height;
Meanwhile, the depth deviation obeying the Laplace distribution La(μ_b, σ_b) is also predicted in the three-dimensional detection information, and the depth and uncertainty of the final geometric projection prediction are obtained by utilizing the additivity of the Laplace distribution:
z_p = μ_z + μ_b,  σ_p² = σ_z² + σ_b²
wherein σ_b denotes the variance of the depth deviation, μ_b denotes the mean of the depth deviation, σ_p is the uncertainty based on the geometric projection, and z_p is the depth based on the geometric projection; μ_z = f · μ_H / H_2D and σ_z = f · λ_H / H_2D;
Fusing the direct depth z_d obtained on the RoI feature and the depth z_p based on geometric projection by using uncertainty guidance; the weight ω_i (i = d, p) calculation formula is:
ω_i = σ_i² / ∑_j σ_j²  (j = d, p)
where d represents the direct depth estimate, p represents the geometric-projection-based depth estimate, ∑ σ_j² represents the sum of squares of the uncertainties of the direct depth estimate and the geometric-projection-based depth estimate, and σ_i is expressed as the uncertainty of the direct depth estimate or the uncertainty based on the geometric projection depth;
The final target depth z_c and uncertainty σ_c are calculated by the formulas:
z_c = ∑ ω_i · z_i,  σ_c² = ∑ ω_i · σ_i²
because the target depth also obeys the Laplace distribution, the loss function for the target depth information is the corresponding Laplace negative log-likelihood, where z* represents the label true value, z_c represents the target depth, σ_c represents the uncertainty, z_i represents the two depth estimates, and σ_i represents the uncertainty corresponding to each depth estimate.
CN202410077692.3A 2024-01-19 2024-01-19 Automatic driving-oriented single-camera three-dimensional target detection method and system Active CN117576665B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410077692.3A CN117576665B (en) 2024-01-19 2024-01-19 Automatic driving-oriented single-camera three-dimensional target detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410077692.3A CN117576665B (en) 2024-01-19 2024-01-19 Automatic driving-oriented single-camera three-dimensional target detection method and system

Publications (2)

Publication Number Publication Date
CN117576665A CN117576665A (en) 2024-02-20
CN117576665B true CN117576665B (en) 2024-04-16

Family

ID=89890470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410077692.3A Active CN117576665B (en) 2024-01-19 2024-01-19 Automatic driving-oriented single-camera three-dimensional target detection method and system

Country Status (1)

Country Link
CN (1) CN117576665B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118447468B (en) * 2024-07-08 2024-09-20 山西省财政税务专科学校 Monocular three-dimensional detection method and device based on spatial relationship between adjacent targets

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325794A (en) * 2020-02-23 2020-06-23 哈尔滨工业大学 Visual simultaneous localization and map construction method based on depth convolution self-encoder
US11004233B1 (en) * 2020-05-01 2021-05-11 Ynjiun Paul Wang Intelligent vision-based detection and ranging system and method
CN113159151A (en) * 2021-04-12 2021-07-23 中国科学技术大学 Multi-sensor depth fusion 3D target detection method for automatic driving
CN115222789A (en) * 2022-07-15 2022-10-21 杭州飞步科技有限公司 Training method, device and equipment for instance depth estimation model
CN116580085A (en) * 2023-03-13 2023-08-11 联通(上海)产业互联网有限公司 Deep learning algorithm for 6D pose estimation based on attention mechanism

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111325794A (en) * 2020-02-23 2020-06-23 哈尔滨工业大学 Visual simultaneous localization and map construction method based on depth convolution self-encoder
US11004233B1 (en) * 2020-05-01 2021-05-11 Ynjiun Paul Wang Intelligent vision-based detection and ranging system and method
CN113159151A (en) * 2021-04-12 2021-07-23 中国科学技术大学 Multi-sensor depth fusion 3D target detection method for automatic driving
CN115222789A (en) * 2022-07-15 2022-10-21 杭州飞步科技有限公司 Training method, device and equipment for instance depth estimation model
CN116580085A (en) * 2023-03-13 2023-08-11 联通(上海)产业互联网有限公司 Deep learning algorithm for 6D pose estimation based on attention mechanism

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Three-dimensional Object Detection Algorithms for Autonomous Driving Based on Monocular Images; Qiao Dewen; China Master's Theses Full-text Database, Engineering Science and Technology II; 2024-01-15; C035-406 *

Also Published As

Publication number Publication date
CN117576665A (en) 2024-02-20

Similar Documents

Publication Publication Date Title
US9990736B2 (en) Robust anytime tracking combining 3D shape, color, and motion with annealed dynamic histograms
CN114565900A (en) Target detection method based on improved YOLOv5 and binocular stereo vision
CN111201451A (en) Method and device for detecting object in scene based on laser data and radar data of scene
CN117576665B (en) Automatic driving-oriented single-camera three-dimensional target detection method and system
CN110197106A (en) Object designation system and method
KR20210090384A (en) Method and Apparatus for Detecting 3D Object Using Camera and Lidar Sensor
US20220129685A1 (en) System and Method for Determining Object Characteristics in Real-time
CN110992424B (en) Positioning method and system based on binocular vision
CN113092807B (en) Urban overhead road vehicle speed measuring method based on multi-target tracking algorithm
CN113281718B (en) 3D multi-target tracking system and method based on laser radar scene flow estimation
CN114495064A (en) Monocular depth estimation-based vehicle surrounding obstacle early warning method
CN114372523A (en) Binocular matching uncertainty estimation method based on evidence deep learning
CN111862147B (en) Tracking method for multiple vehicles and multiple lines of human targets in video
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN114608522B (en) Obstacle recognition and distance measurement method based on vision
CN113112547A (en) Robot, repositioning method thereof, positioning device and storage medium
CN115937520A (en) Point cloud moving target segmentation method based on semantic information guidance
CN117523514A (en) Cross-attention-based radar vision fusion data target detection method and system
CN115909268A (en) Dynamic obstacle detection method and device
CN112699748B (en) Human-vehicle distance estimation method based on YOLO and RGB image
CN114220138A (en) Face alignment method, training method, device and storage medium
CN111709269B (en) Human hand segmentation method and device based on two-dimensional joint information in depth image
CN112712062A (en) Monocular three-dimensional object detection method and device based on decoupling truncated object
CN116740519A (en) Three-dimensional target detection method, system and storage medium for close-range and long-range multi-dimensional fusion
CN114140497A (en) Target vehicle 3D real-time tracking method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Common terms for all licensing records below: Application publication date 20240220; Granted publication date 20240416; Assignor: Nanjing University of Posts and Telecommunications; Denomination of invention: A single camera 3D object detection method and system for autonomous driving; License type: Common License.

Assignee | Contract record no. | Record date
Nanjing Benli Information Technology Co.,Ltd. | X2024980016890 | 20240927
Nanjing Eryuefei Network Technology Co.,Ltd. | X2024980016831 | 20240927
Nanjing Zhujin Intelligent Technology Co.,Ltd. | X2024980017765 | 20241010
Nanjing Shangyao Electronic Technology Co.,Ltd. | X2024980017764 | 20241010
Nanjing Qida Network Technology Co.,Ltd. | X2024980017763 | 20241010
Nanjing Donglai Information Technology Co.,Ltd. | X2024980017666 | 20241009
Nanjing Zijin Information Technology Co.,Ltd. | X2024980017766 | 20241010
Nanjing Yuanshen Intelligent Technology R&D Co.,Ltd. | X2024980018301 | 20241012
Nanjing Yuze Robot Technology Co.,Ltd. | X2024980018300 | 20241012
Nanjing Zhongyang Information Technology Co.,Ltd. | X2024980018299 | 20241012
Nanjing Fangtai Intelligent Technology Co.,Ltd. | X2024980018298 | 20241012
Nanjing Gaoxi Information Technology Co.,Ltd. | X2024980018297 | 20241012
Nanjing Fuliang Network Technology Co.,Ltd. | X2024980018296 | 20241012
Nanjing Yixun Intelligent Equipment Co.,Ltd. | X2024980018292 | 20241012
Nanjing Yihe Information Technology Co.,Ltd. | X2024980018291 | 20241012
Nanjing Xingzhuo Intelligent Equipment Co.,Ltd. | X2024980018289 | 20241012
Nanjing Tichi Information Technology Co.,Ltd. | X2024980018288 | 20241012
Nanjing Jindong Technology Co.,Ltd. | X2024980018286 | 20241012
Nanjing Jinsheng Artificial Intelligence Technology Co.,Ltd. | X2024980018283 | 20241012
Nanjing Jingda Environmental Protection Technology Co.,Ltd. | X2024980018281 | 20241012
Nanjing Hancong Robot Technology Co.,Ltd. | X2024980018278 | 20241012
Jiangsu Huida Information Technology Industry Development Research Institute Co.,Ltd. | X2024980018270 | 20241012
Nanjing Extreme New Materials Research Co.,Ltd. | X2024980018268 | 20241012
Nanjing Youqi Intelligent Technology Co.,Ltd. | X2024980018261 | 20241012
Nanjing Haohang Intelligent Technology Co.,Ltd. | X2024980018249 | 20241012
Nanjing Pengjia Robot Technology Co.,Ltd. | X2024980018246 | 20241012
Nanjing Nuoyan Intelligent Technology Co.,Ltd. | X2024980018241 | 20241012
Nanjing Junshang Network Technology Co.,Ltd. | X2024980018234 | 20241012