WO2021203807A1 - A three-dimensional object detection framework based on multi-source data knowledge transfer - Google Patents

A three-dimensional object detection framework based on multi-source data knowledge transfer Download PDF

Info

Publication number
WO2021203807A1
WO2021203807A1 · PCT/CN2021/074212 · CN2021074212W
Authority
WO
WIPO (PCT)
Prior art keywords
dimensional
point cloud
bounding box
target
image feature
Prior art date
Application number
PCT/CN2021/074212
Other languages
English (en)
French (fr)
Inventor
谭晓军
冯大鹏
梁小丹
王焕宇
杨陈如诗
杨梦雨
Original Assignee
中山大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中山大学
Priority to US17/917,268 (published as US20230260255A1)
Publication of WO2021203807A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/7715Feature extraction, e.g. by transforming the feature space, e.g. multi-dimensional scaling [MDS]; Mappings, e.g. subspace methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/09Supervised learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/761Proximity, similarity or dissimilarity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • G06V20/647Three-dimensional objects by matching two-dimensional images to three-dimensional objects

Definitions

  • the invention relates to the field of machine learning and computer vision, in particular to a three-dimensional object detection framework based on knowledge transfer of multi-source data.
  • Three-dimensional object detection is an important research field of computer vision, with a wide range of application scenarios in industrial production and daily life, such as driverless cars and intelligent robots. Compared with two-dimensional object detection, the three-dimensional object detection task is more challenging and of greater practical application value. Three-dimensional object detection mainly completes the tasks of object recognition and localization, obtaining the three-dimensional information of the object, including the center point coordinates of the object C_x, C_y, C_z, the size of the object, namely its length l, width w and height h, and the orientation angles α, β, γ.
  • a three-dimensional object detection framework based on knowledge transfer from multi-source data includes the following steps:
  • the image feature extraction unit extracts the first image feature from the image, and outputs the first image feature to the target-of-interest selection unit, the knowledge transfer unit, and the three-dimensional target parameter prediction unit;
  • the target-of-interest selection unit generates a series of two-dimensional bounding boxes of the target of interest according to the first image feature, so as to extract the point cloud data of the corresponding region from the point cloud space and output it to the point cloud feature extraction unit;
  • the point cloud feature extraction unit extracts point cloud features from the point cloud data, and outputs the point cloud features to the knowledge transfer unit and the three-dimensional target parameter prediction unit;
  • the knowledge transfer unit calculates the cosine similarity between the image feature and the point cloud feature, and processes the cosine similarity to update the parameters of the image feature extraction unit;
  • the three-dimensional target parameter prediction unit generates a three-dimensional bounding box according to the image feature and the point cloud feature, outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box, and then updates the parameters of the image feature extraction unit and the point cloud feature extraction unit through backpropagation;
  • the two-dimensional detector extracts the candidate bounding box of the target from the image, and sends the candidate bounding box to the image feature extraction unit;
  • the image feature extraction unit extracts a second image feature from the candidate bounding box, and outputs the second image feature to the target-of-interest selection unit and the three-dimensional target parameter prediction unit;
  • the target-of-interest selection unit generates a corresponding two-dimensional bounding box according to the second image feature, and outputs the center coordinates of the corresponding two-dimensional bounding box to the three-dimensional target parameter prediction unit.
  • the three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature, and, based on the corresponding three-dimensional bounding box and the center point coordinates of the corresponding two-dimensional bounding box, calculates and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box.
  • the present invention provides a three-dimensional object detection framework based on knowledge transfer of multi-source data.
  • when the technical solution provided by the present invention is implemented, the image feature extraction unit outputs the extracted image features to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit; the target-of-interest selection unit outputs the two-dimensional bounding box of the target of interest according to the image features and, according to the two-dimensional bounding box, the corresponding point cloud data is extracted from the point cloud space and output to the point cloud feature extraction unit, which then extracts the corresponding point cloud features and sends them to the knowledge transfer unit; next, the knowledge transfer unit calculates the cosine similarity between the image features and the point cloud features and updates the parameters of the image feature extraction unit according to the cosine similarity, so that the image features gradually become similar to the point cloud features, that is, the image features learn the point cloud features; the three-dimensional target parameter prediction unit then outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box based on the image features and the point cloud features, and the parameters of the image feature extraction unit and the point cloud feature extraction unit are updated simultaneously during backpropagation; finally, the two-dimensional detector detects and extracts the candidate bounding box of the target from the image, the updated image feature extraction unit extracts image features from the candidate bounding box, and the target-of-interest selection unit and the three-dimensional target parameter prediction unit process these image features to predict the three-dimensional parameters of the target; compared with the prior art, since the image features that have learned the point cloud features already have the ability to express three-dimensional spatial information, the accuracy of image-based three-dimensional object parameter prediction can be effectively improved.
  • FIG. 1 is a schematic diagram of the overall flow of a three-dimensional object detection framework based on multi-source data knowledge transfer in an embodiment of the present invention
  • Figure 2 is a schematic diagram of the training process in the neural network training stage in an embodiment of the present invention; in the figure, proposal feature denotes the image detector, conv2D and fc constitute the image feature extraction unit, shared MLP, global context, conv 1X1 and max pool constitute the point cloud feature extraction unit, mimic is the knowledge transfer unit, 2D Bounding Box is the target-of-interest selection unit, and 6DoF Module is the three-dimensional target parameter prediction unit;
  • FIG. 3 is a schematic diagram of a process of acquiring point cloud data of a corresponding area in a point cloud space in an embodiment of the present invention
  • FIG. 4 is a schematic flowchart of a point cloud feature extraction unit enhancing the ability of a one-dimensional convolutional neural network model to model global information in a point cloud space in an embodiment of the present invention
  • Figure 5 is a schematic diagram of the generation process of the three-dimensional bounding box in the neural network training stage in an embodiment of the present invention.
  • 1 is the image feature
  • 2 is the point cloud feature
  • 3 is the first feature vector
  • 4 is the second feature vector.
  • 5 is the three-dimensional bounding box.
  • the present invention is further configured to: before the step S1, the method further includes using a two-dimensional detector to extract the candidate bounding box of the target from the image, so as to obtain the point cloud data of the corresponding region in the point cloud space according to the candidate bounding box of the target.
  • the present invention is further configured to: before the step S1, it further includes the computer system receiving an annotation tag input by the tester for the image.
  • the present invention is further configured as: the step S2 includes: S2-1, the target-of-interest selection unit detects the target of interest from the first image feature, and uses the RPN network to output a series of two-dimensional bounding boxes corresponding to the target of interest; S2-2, the IoU values of the two-dimensional bounding boxes corresponding to the target of interest and the annotation labels on the two-dimensional image are calculated, the annotation label with the largest IoU value is selected as the ground-truth label of the target of interest, and, in the point cloud space, the point cloud data of the region corresponding to the ground-truth label is extracted and output to the point cloud feature extraction unit, while the center point coordinates of the two-dimensional bounding box are also output to the three-dimensional target parameter prediction unit.
  • step S3 specifically includes:
  • step S4 includes:
  • S4-1 Calculate the cosine similarity between the received image feature and the point cloud feature
  • step S5 includes:
  • the linear layer of the three-dimensional target parameter prediction unit maps the received image feature and the point cloud feature to generate a three-dimensional bounding box
  • the present invention provides a three-dimensional object detection framework based on knowledge transfer of multi-source data.
  • the three-dimensional object detection framework is applied in a computer system provided with an image feature extraction unit, a target-of-interest selection unit, a point cloud feature extraction unit, a knowledge transfer unit, and a three-dimensional target parameter prediction unit; to realize the three-dimensional prediction of the object in the image, the three-dimensional object detection framework needs to go through the neural network training stage and the neural network inference stage; the specific operation flow is shown in Figure 1, and the operation flow of the neural network training stage is shown in Figure 2.
  • the implementation of the three-dimensional object detection framework to predict the three-dimensional parameters of the object in the image specifically includes the following steps: S1, the image feature extraction unit extracts the first image feature from the image, and outputs the first image feature to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit; specifically, the computer system inputs the images collected by each camera to the image feature extraction unit, the image feature extraction unit uses the two-dimensional convolutional neural network model in the unit to perform image feature extraction on the acquired image and takes the extracted image feature as the first image feature, and then sends the first image feature to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit.
  • before performing the step S1, the computer system also performs the following operations: after receiving the images collected by the cameras, the three-dimensional point cloud processing unit in the system is used to process the images to obtain the instance point cloud data corresponding to the images, and the instance point cloud data is stored in the point cloud space; then, as shown in FIG. 3, the two-dimensional detector in the computer system detects the images so as to detect the target, and generates a candidate bounding box and a plurality of two-dimensional annotation boxes for the target; the candidate bounding box is matched with the corresponding two-dimensional annotation boxes to calculate the IoU value between the candidate bounding box and each of its corresponding two-dimensional annotation boxes, and the two-dimensional annotation box with the largest IoU value is selected as the true value of the candidate bounding box; next, the two-dimensional annotation box with the largest IoU value is mapped into the point cloud space where the instance point cloud data is stored, so as to obtain a three-dimensional annotation box; the point cloud data enclosed by the three-dimensional annotation box in the point cloud space is the point cloud data corresponding to the candidate bounding box, and the enclosed point cloud data is then extracted and output to the point cloud feature extraction unit; afterwards, the image feature extraction unit and the point cloud feature extraction unit respectively input the extracted image features and point cloud features to the three-dimensional parameter prediction unit, which predicts the three-dimensional parameters of the target object from these features.
  • before sending the collected image to the image feature extraction unit, the computer system also receives at least one annotation label manually input by the tester for the image; after that, the computer system inputs the image and the annotation label corresponding to the image to the image feature extraction unit.
  • the target-of-interest selection unit generates a series of two-dimensional bounding boxes of the target of interest according to the first image feature; specifically, the process by which the target-of-interest selection unit extracts and outputs the point cloud data is as follows:
  • the target of interest selection unit detects a target of interest from the first image feature, and uses the RPN network to output a series of two-dimensional bounding boxes corresponding to the target of interest;
  • the point cloud feature extraction unit extracts point cloud features from the point cloud data, and outputs the point cloud features to the knowledge transfer unit and the three-dimensional target parameter prediction unit; specifically, the The point cloud feature extraction unit implements the extraction of point cloud features through the following process:
  • in a network with many layers, the training difficulty increases as the network depth increases, and problems such as network degradation, gradient vanishing and gradient explosion easily occur, which lead to large errors between the output and the samples and reduce the accuracy of the model; in the one-dimensional convolutional neural network model, a shortcut connection, namely a residual block, is therefore added between every two convolutional layers, and the residual module feeds the residual term into the next convolutional layer so that its output becomes F(x) = x + f(x); this prevents the gradient from approaching 0, so as to ensure that gradient vanishing or gradient explosion does not occur when the network updates its parameters.
  • the convolution operation is performed on each input vector to reduce its dimensionality, the weight corresponding to each input vector is then obtained through softmax, the input vectors are weighted and summed to obtain the corresponding global feature, and finally the global feature is added to each input vector to increase the global response of the input vector, thereby enhancing the one-dimensional convolutional neural network model's ability to model global information in the point cloud space.
  • the knowledge transfer unit calculates the cosine similarity between the image feature and the point cloud feature, and after minimizing the cosine similarity, calculates and backpropagates the gradient of the image feature to update the parameters of the image feature extraction unit, where the parameters of the image feature extraction unit are updated so that the image features extracted by the updated image feature extraction unit have the ability to express three-dimensional space.
  • after receiving the image feature and the point cloud feature, the knowledge transfer unit uses a feature encoding method to encode the image feature and the point cloud feature into a first feature vector and a second feature vector, respectively, so as to map the first feature vector and the second feature vector into a high-dimensional space and obtain the corresponding arrays in that space, and then calculates the cosine similarity between the image feature and the point cloud feature according to the cosine similarity calculation method; by calculating the cosine similarity between the image feature and the point cloud feature, it can be judged whether the two vectors are similar: the closer the result is to 1, the more similar the two features are, that is, the closer the image feature is to the point cloud feature; since the point cloud feature has the ability to express three-dimensional space, if the image feature is very similar to the corresponding point cloud feature, it means that the image feature can express more three-dimensional spatial information.
  • by repeating the above steps, the value of the loss function of the two-dimensional convolutional neural network gradually approaches or even reaches the minimum value, so that the image features extracted by the image feature extraction unit become more and more similar to the point cloud features, whereby the image features learn the point cloud features; since the point cloud features have the ability to express three-dimensional spatial information, the closer the image features are to the point cloud features, the more three-dimensional spatial information the image features express, which helps to improve the accuracy of three-dimensional object parameter prediction.
  • the three-dimensional target parameter prediction unit generates a three-dimensional bounding box according to the image feature and the point cloud feature, outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box, and then updates the parameters of the image feature extraction unit and the point cloud feature extraction unit through backpropagation, wherein the nine-degree-of-freedom parameters of the three-dimensional bounding box refer to the center point coordinates (x, y, z) of the three-dimensional bounding box, the Euler angle parameters (α, β, γ), and the length, width and height parameters of the three-dimensional bounding box.
  • the working process of the three-dimensional target parameter prediction unit is as follows:
  • the linear layer of the three-dimensional target parameter prediction unit maps the image feature and the point cloud feature to output a corresponding three-dimensional bounding box; after the image feature has learned the point cloud feature, as shown in FIG. 5, the knowledge transfer unit inputs the first feature vector corresponding to the image feature and the second feature vector corresponding to the point cloud feature into the fully connected layer, that is, the linear layer, of the three-dimensional target parameter prediction unit, and the linear layer maps the first feature vector and the second feature vector to output a three-dimensional bounding box corresponding to the detection target.
  • [f_u, 0, c_u; 0, f_v, c_v; 0, 0, 1] are the intrinsic parameters of the camera and (u, v) are the pixel coordinates of the center point of the two-dimensional bounding box obtained by the target-of-interest selection unit, from which the center point coordinates (x, y, z) of the three-dimensional bounding box are obtained;
  • the knowledge transfer unit and the three-dimensional target parameter prediction unit can simultaneously update the relevant parameters of the image feature extraction unit.
  • the above steps S1-S5 are all performed in the neural network training phase; the purpose of performing the above steps is to make the image features learn the expression of the point cloud features, so that the image features have the ability to express three-dimensional space; when the neural network training phase ends, the framework enters the neural network inference stage and performs the following steps to output the object parameters of the three-dimensional target predicted from the two-dimensional image:
  • the two-dimensional detector extracts the candidate bounding box of the target from the image, and sends the candidate bounding box to the image feature extraction unit;
  • the image feature extraction unit extracts a second image feature from the candidate bounding box, and outputs the second image feature to the target-of-interest selection unit and the three-dimensional target parameter prediction unit;
  • the target-of-interest selection unit generates a corresponding two-dimensional bounding box according to the second image feature, and outputs the center coordinates of the corresponding two-dimensional bounding box to the three-dimensional target parameter prediction unit.
  • the three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature, and, based on the corresponding three-dimensional bounding box and the center point coordinates of the corresponding two-dimensional bounding box, calculates and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box; the linear layer of the three-dimensional target parameter prediction unit maps the second image feature to generate the corresponding three-dimensional bounding box, and then, by performing the operations described in steps S5-2 to S5-4 above, calculates the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box, namely the center point coordinates (x, y, z), the Euler angle parameters (α, β, γ), and the length, width and height of the corresponding three-dimensional bounding box, and outputs the obtained nine-degree-of-freedom parameters as the final result of detecting the three-dimensional parameters of the target object.
  • the present invention provides a three-dimensional object detection framework based on knowledge transfer of multi-source data.
  • by outputting the image features extracted by the image feature extraction unit, the target-of-interest selection unit outputs the point cloud data of the target of interest according to the image features to the point cloud feature extraction unit, and the point cloud features are extracted from the point cloud data by the point cloud feature extraction unit; then, in the knowledge transfer unit, the image features learn the point cloud features and the parameters of the image feature extraction unit are updated, while the three-dimensional target parameter prediction unit updates the parameters of the image feature extraction unit and the point cloud feature extraction unit according to the image features and the point cloud features; finally, the updated image feature extraction unit re-extracts image features for the three-dimensional target parameter prediction unit, which calculates and outputs the three-dimensional parameters according to the image features, thereby improving the detection accuracy of three-dimensional object detection based on the two-dimensional image.
  • This application has industrial applicability.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Image Analysis (AREA)

Abstract

A three-dimensional object detection framework based on multi-source data knowledge transfer. By outputting the image features extracted by the image feature extraction unit, the target-of-interest selection unit outputs, according to the image features, the point cloud data of the target of interest to the point cloud feature extraction unit, and the point cloud feature extraction unit extracts point cloud features from the point cloud data; then, in the knowledge transfer unit, the image features learn the point cloud features and the parameters of the image feature extraction unit are updated, while the three-dimensional target parameter prediction unit updates the parameters of the image feature extraction unit and the point cloud feature extraction unit according to the image features and the point cloud features; finally, the updated image feature extraction unit re-extracts image features for the three-dimensional target parameter prediction unit, and the three-dimensional target parameter prediction unit calculates and outputs the three-dimensional parameters according to the image features, thereby improving the detection accuracy of three-dimensional object detection based on two-dimensional images.

Description

A three-dimensional object detection framework based on multi-source data knowledge transfer
Technical Field
The present invention relates to the fields of machine learning and computer vision, and in particular to a three-dimensional object detection framework based on multi-source data knowledge transfer.
Background Art
Three-dimensional object detection is an important research field of computer vision, with a wide range of application scenarios in industrial production and daily life, such as driverless cars and intelligent robots. Compared with two-dimensional object detection, the three-dimensional object detection task is more challenging and of greater practical application value. Three-dimensional object detection mainly completes the tasks of object recognition and localization, obtaining the three-dimensional information of the object, including the center point coordinates of the object C_x, C_y, C_z, the size of the object, namely its length l, width w and height h, and the orientation angles α, β, γ.
Technical Problem
In recent years, the development of deep learning has brought considerable improvements in the speed and accuracy of two-dimensional object detection. However, since two-dimensional RGB images lack depth information, the detection accuracy of existing image-based three-dimensional object detection methods lags far behind that of methods based on point cloud data. How to use point cloud data to improve the accuracy of image-based three-dimensional object detection has therefore become a research direction in the field.
Technical Solution
A three-dimensional object detection framework based on multi-source data knowledge transfer includes the following steps:
S1. An image feature extraction unit extracts a first image feature from an image, and outputs the first image feature to a target-of-interest selection unit, a knowledge transfer unit and a three-dimensional target parameter prediction unit;
S2. The target-of-interest selection unit generates a series of two-dimensional bounding boxes of the target of interest according to the first image feature, so as to extract the point cloud data of the corresponding region from the point cloud space and output it to the point cloud feature extraction unit;
S3. The point cloud feature extraction unit extracts point cloud features from the point cloud data, and outputs the point cloud features to the knowledge transfer unit and the three-dimensional target parameter prediction unit;
S4. The knowledge transfer unit calculates the cosine similarity between the image feature and the point cloud feature, and processes the cosine similarity to update the parameters of the image feature extraction unit;
S5. The three-dimensional target parameter prediction unit generates a three-dimensional bounding box according to the image feature and the point cloud feature, outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box, and then updates the parameters of the image feature extraction unit and the point cloud feature extraction unit through backpropagation;
S6. A two-dimensional detector extracts a candidate bounding box of the target from the image, and sends the candidate bounding box to the image feature extraction unit;
S7. The image feature extraction unit extracts a second image feature from the candidate bounding box, and outputs the second image feature to the target-of-interest selection unit and the three-dimensional target parameter prediction unit;
S8. The target-of-interest selection unit generates a corresponding two-dimensional bounding box according to the second image feature, and outputs the center coordinates of the corresponding two-dimensional bounding box to the three-dimensional target parameter prediction unit;
S9. The three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature, and, based on the corresponding three-dimensional bounding box and the center point coordinates of the corresponding two-dimensional bounding box, calculates and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box.
Beneficial Effects
The present invention provides a three-dimensional object detection framework based on multi-source data knowledge transfer. When the technical solution provided by the present invention is implemented, the image feature extraction unit outputs the extracted image features to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit; the target-of-interest selection unit outputs the two-dimensional bounding box of the target of interest according to the image features and, according to the two-dimensional bounding box, extracts the corresponding point cloud data from the point cloud space and outputs it to the point cloud feature extraction unit, which then extracts the corresponding point cloud features and sends them to the knowledge transfer unit; next, the knowledge transfer unit calculates the cosine similarity between the image features and the point cloud features and updates the parameters of the image feature extraction unit according to the cosine similarity, so that the image features gradually become similar to the point cloud features, that is, the image features learn the point cloud features; the three-dimensional target parameter prediction unit then outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box according to the image features and the point cloud features, and the parameters of the image feature extraction unit and the point cloud feature extraction unit are updated simultaneously during backpropagation; finally, the two-dimensional detector detects and extracts the candidate bounding box of the target from the image, the updated image feature extraction unit extracts image features from the candidate bounding box, and the target-of-interest selection unit and the three-dimensional target parameter prediction unit process these image features to predict the three-dimensional parameters of the target. Compared with the prior art, since image features that have learned the point cloud features already have the ability to express three-dimensional spatial information, the accuracy of image-based three-dimensional object parameter prediction can be effectively improved.
Description of Drawings
Figure 1 is a schematic diagram of the overall flow of a three-dimensional object detection framework based on multi-source data knowledge transfer in an embodiment of the present invention;
Figure 2 is a schematic diagram of the training flow in the neural network training stage in an embodiment of the present invention; in the figure, proposal feature denotes the image detector, conv2D and fc constitute the image feature extraction unit, shared MLP, global context, conv 1X1 and max pool constitute the point cloud feature extraction unit, mimic is the knowledge transfer unit, 2D Bounding Box is the target-of-interest selection unit, and 6DoF Module is the three-dimensional target parameter prediction unit;
Figure 3 is a schematic flow diagram of acquiring the point cloud data of the corresponding region in the point cloud space in an embodiment of the present invention;
Figure 4 is a schematic flow diagram of the point cloud feature extraction unit enhancing the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space in an embodiment of the present invention;
Figure 5 is a schematic diagram of the generation flow of the three-dimensional bounding box in the neural network training stage in an embodiment of the present invention; in the figure, 1 is the image feature, 2 is the point cloud feature, 3 is the first feature vector, 4 is the second feature vector, and 5 is the three-dimensional bounding box.
Best Mode for Carrying Out the Invention
The present invention is further configured as follows: before step S1, the method further includes using a two-dimensional detector to extract a candidate bounding box of the target from the image, so as to obtain the point cloud data of the corresponding region in the point cloud space according to the candidate bounding box of the target.
The present invention is further configured as follows: before step S1, the method further includes the computer system receiving annotation labels input by a tester for the image.
The present invention is further configured as follows: step S2 includes: S2-1, the target-of-interest selection unit detects the target of interest from the first image feature, and uses an RPN network to output a series of two-dimensional bounding boxes corresponding to the target of interest; S2-2, the IoU values of the two-dimensional bounding boxes corresponding to the target of interest and the annotation labels on the two-dimensional image are calculated, the annotation label with the largest IoU value is selected as the ground-truth label of the target of interest, and, in the point cloud space, the point cloud data of the region corresponding to the ground-truth label is extracted and output to the point cloud feature extraction unit, while the center point coordinates of the two-dimensional bounding box are also output to the three-dimensional target parameter prediction unit.
The present invention is further configured as follows: step S3 specifically includes:
S3-1. Inputting the point cloud data into a one-dimensional convolutional neural network model;
S3-2. Improving the training performance of the one-dimensional convolutional neural network model through residual connections, and enhancing the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space through an attention mechanism;
S3-3. Performing a max pooling operation to obtain the point cloud features corresponding to the target of interest.
The present invention is further configured as follows: step S4 includes:
S4-1. Calculating the cosine similarity between the received image feature and point cloud feature;
S4-2. Minimizing the cosine similarity;
S4-3. Calculating and backpropagating the gradient of the image feature, so as to update the parameters of the two-dimensional convolutional neural network model of the image feature extraction unit.
The present invention is further configured as follows: step S5 includes:
S5-1. The linear layer of the three-dimensional target parameter prediction unit maps the received image feature and point cloud feature to generate a three-dimensional bounding box;
S5-2. The depth coordinate z is predicted, and the coordinates x and y of the three-dimensional bounding box are then predicted through the following equation (1):
(1)  x = (u − c_u) · z / f_u,  y = (v − c_v) · z / f_v
where [f_u, 0, c_u; 0, f_v, c_v; 0, 0, 1] are the intrinsic parameters of the camera, and (u, v) are the pixel coordinates of the center point of the two-dimensional bounding box obtained by the target-of-interest selection unit;
S5-3. According to the center point coordinates of the three-dimensional bounding box, a quaternion is predicted through the following equation (2), and the quaternion is then converted into Euler angles through the following equation (3), so as to eliminate the gimbal lock problem of Euler angles; equations (2) and (3) are as follows:
(2)  [equation rendered as an image in the original publication]
(3)  [equation rendered as an image in the original publication]
where the quaternion is expressed as shown in the original figure, and the Euler angles are expressed as
roll, pitch, yaw = (α, β, γ);
S5-4. According to the center point coordinates of the three-dimensional bounding box, the Euler angles, and the mapping of the image feature and the point cloud feature on the linear layer, the length, width and height parameters of the three-dimensional bounding box are calculated, and the center point coordinates (x, y, z) of the three-dimensional bounding box, the Euler angle parameters (α, β, γ), and the length, width and height parameters of the three-dimensional bounding box are output;
S5-5. The gradients of the image feature and the point cloud feature are calculated and backpropagated, so as to update the corresponding parameters of the image feature extraction unit and the point cloud feature extraction unit.
Embodiments of the Invention
The present invention provides a three-dimensional object detection framework based on multi-source data knowledge transfer. The three-dimensional object detection framework is applied in a computer system provided with an image feature extraction unit, a target-of-interest selection unit, a point cloud feature extraction unit, a knowledge transfer unit, and a three-dimensional target parameter prediction unit. To realize the three-dimensional prediction of the object in an image, the three-dimensional object detection framework needs to go through a neural network training stage and a neural network inference stage; the specific operation flow is shown in Figure 1, and the operation flow of the neural network training stage is shown in Figure 2.
The three-dimensional object detection framework predicts the three-dimensional parameters of the object in the image specifically through the following steps: S1. The image feature extraction unit extracts a first image feature from the image, and outputs the first image feature to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit. Specifically, the computer system inputs the images collected by each camera to the image feature extraction unit, and the image feature extraction unit uses the two-dimensional convolutional neural network model in the unit to perform image feature extraction on the acquired image, takes the extracted image feature as the first image feature, and then sends the first image feature to the target-of-interest selection unit, the knowledge transfer unit and the three-dimensional target parameter prediction unit.
Further, before performing step S1, the computer system also performs the following operations: after receiving the images collected by the cameras, the three-dimensional point cloud processing unit in the system is used to process the images to obtain the instance point cloud data corresponding to the images, and the instance point cloud data is stored in the point cloud space; then, as shown in Figure 3, the two-dimensional detector in the computer system detects the images so as to detect the target, and generates a candidate bounding box and a plurality of two-dimensional annotation boxes for the target; the candidate bounding box is matched with the corresponding two-dimensional annotation boxes to calculate the IoU value between the candidate bounding box and each of its corresponding two-dimensional annotation boxes, and the two-dimensional annotation box with the largest IoU value is selected as the true value of the candidate bounding box; next, the two-dimensional annotation box with the largest IoU value is mapped into the point cloud space where the instance point cloud data is stored, so as to obtain a three-dimensional annotation box; the point cloud data enclosed by the three-dimensional annotation box in the point cloud space is the point cloud data corresponding to the candidate bounding box, and the enclosed point cloud data is then extracted and output to the point cloud feature extraction unit; afterwards, the image feature extraction unit and the point cloud feature extraction unit respectively input the extracted image features and point cloud features to the three-dimensional parameter prediction unit, and the three-dimensional parameter prediction unit predicts the three-dimensional parameters of the target object according to these features.
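The cropping of the instance point cloud enclosed by the mapped three-dimensional annotation box can be pictured with a short sketch. This is only an illustrative reading of the step above, not the patented implementation: the box parameterization (center, length/width/height sizes, yaw about the vertical axis) and the file name in the usage comment are assumptions.

    import numpy as np

    def points_in_3d_box(points, center, size, yaw):
        """Return the points that fall inside a 3D box (center, l/w/h size, yaw about z).

        points: (N, 3) array in the same coordinate frame as the box.
        """
        # Translate into the box frame, then undo the yaw rotation.
        shifted = points - np.asarray(center)
        c, s = np.cos(-yaw), np.sin(-yaw)
        rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
        local = shifted @ rot.T
        half = np.asarray(size) / 2.0          # (l/2, w/2, h/2)
        mask = np.all(np.abs(local) <= half, axis=1)
        return points[mask]

    # Example: crop the instance points for one annotated target.
    # cloud = np.load("frame_000001_points.npy")   # hypothetical file name
    # instance_points = points_in_3d_box(cloud, center=(5.2, 1.1, 0.9),
    #                                    size=(4.1, 1.8, 1.6), yaw=0.3)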
Further, before sending the collected image to the image feature extraction unit, the computer system also receives at least one annotation label manually input by the tester for the image; after that, the computer system inputs the image and the annotation label corresponding to the image to the image feature extraction unit.
S2. The target-of-interest selection unit generates a series of two-dimensional bounding boxes of the target of interest according to the first image feature. Specifically, the flow by which the target-of-interest selection unit extracts and outputs the point cloud data is as follows:
S2-1. The target-of-interest selection unit detects the target of interest from the first image feature, and uses an RPN network to output a series of two-dimensional bounding boxes corresponding to the target of interest;
S2-2. The IoU values of the two-dimensional bounding boxes corresponding to the target of interest and the annotation labels on the two-dimensional image are calculated, so as to select the annotation label with the largest IoU value as the ground-truth label of the target of interest; then, in the point cloud space, the point cloud data of the region corresponding to the ground-truth label is extracted and output to the point cloud feature extraction unit; the center point coordinates of the two-dimensional bounding box are also calculated and output to the three-dimensional target parameter prediction unit, wherein the annotation labels refer to the labels manually annotated on the image by the tester before step S1 is performed.
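As a concrete reading of step S2-2, the IoU-based matching between a two-dimensional bounding box and the annotation labels might look like the following sketch; the corner format (x1, y1, x2, y2) of the boxes is an assumption, and the helper names are hypothetical rather than the exact routines used in the patent.

    import numpy as np

    def iou_2d(box_a, box_b):
        """IoU of two axis-aligned 2D boxes given as (x1, y1, x2, y2)."""
        x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
        x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
        inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
        area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
        area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
        return inter / (area_a + area_b - inter + 1e-9)

    def match_ground_truth(proposal, annotation_boxes):
        """Pick the annotation box with the largest IoU as the ground truth of the proposal."""
        ious = np.array([iou_2d(proposal, gt) for gt in annotation_boxes])
        best = int(np.argmax(ious))
        return best, ious[best]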
S3. The point cloud feature extraction unit extracts point cloud features from the point cloud data, and outputs the point cloud features to the knowledge transfer unit and the three-dimensional target parameter prediction unit. Specifically, the point cloud feature extraction unit extracts the point cloud features through the following flow:
S3-1. The point cloud data is input into a one-dimensional convolutional neural network model;
S3-2. The training performance of the one-dimensional convolutional neural network model is improved through residual connections, and the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space is enhanced through an attention mechanism;
S3-3. A max pooling operation is performed to obtain the point cloud features corresponding to the target of interest: the max pooling operation is applied to the output of the convolutional layers of the neural network trained with the residual connections and the attention mechanism, so as to obtain the point with the largest value in each local receptive field, and that point is extracted as the point cloud feature.
It can be understood that, in a network with many layers, the training difficulty increases as the network depth increases, and problems such as network degradation, gradient vanishing and gradient explosion easily occur, which lead to large errors between the output and the samples and reduce the accuracy of the model. In the one-dimensional convolutional neural network model, a shortcut connection, namely a residual block, is therefore added between every two convolutional layers: the residual module inputs the residual term f(x) to the next convolutional layer to activate it, and the output x of the previous convolutional layer is passed directly to the output of the next convolutional layer as its initial output, so that the output of the next convolutional layer becomes F(x) = x + f(x); when the gradient is subsequently obtained by applying the chain rule directly to F(x), the gradient is prevented from approaching 0, which ensures that gradient vanishing or gradient explosion does not occur when the network updates its parameters. Moreover, when f(x) = 0, F(x) = x realizes an identity mapping; since network degradation is caused by redundant layers learning parameters that do not realize an identity mapping, making the redundant layers learn f(x) = 0 allows them to realize an identity mapping, which speeds up network convergence and avoids network degradation.
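The residual connection described above can be written as a small PyTorch module; the block below is a minimal sketch of the general idea, with the channel count, kernel size and activation chosen by assumption rather than taken from the patent.

    import torch
    import torch.nn as nn

    class ResidualConv1d(nn.Module):
        """Two 1D convolutions with an identity shortcut, so the block outputs F(x) = x + f(x)."""

        def __init__(self, channels):
            super().__init__()
            self.body = nn.Sequential(
                nn.Conv1d(channels, channels, kernel_size=1),
                nn.ReLU(inplace=True),
                nn.Conv1d(channels, channels, kernel_size=1),
            )
            self.act = nn.ReLU(inplace=True)

        def forward(self, x):                    # x: (batch, channels, num_points)
            # The identity shortcut keeps the gradient from collapsing towards 0.
            return self.act(x + self.body(x))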
As for enhancing the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space through the attention mechanism, the principle is shown in Figure 4: a convolution operation is first performed on each input vector to reduce its dimensionality; the weight corresponding to each input vector is then obtained through softmax, and the input vectors are weighted and summed to obtain the corresponding global feature; finally, the global feature is added to each input vector to increase the global response of the input vector, thereby enhancing the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space.
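Reading the global-context step literally (convolution for dimensionality reduction, softmax weights, weighted sum, and adding the global feature back to every input vector), a minimal sketch could look as follows; the exact layer shapes are assumptions.

    import torch
    import torch.nn as nn

    class GlobalContext(nn.Module):
        """Add a softmax-weighted global feature back onto every per-point vector."""

        def __init__(self, channels):
            super().__init__()
            # 1x1 convolution reduces each point vector to a single attention score.
            self.score = nn.Conv1d(channels, 1, kernel_size=1)

        def forward(self, x):                                      # x: (batch, channels, num_points)
            weights = torch.softmax(self.score(x), dim=-1)         # (batch, 1, num_points)
            global_feat = (x * weights).sum(dim=-1, keepdim=True)  # weighted sum over all points
            return x + global_feat                                 # broadcast the global feature to every point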
S4. The knowledge transfer unit calculates the cosine similarity between the image feature and the point cloud feature, and after minimizing the cosine similarity, calculates and backpropagates the gradient of the image feature to update the parameters of the image feature extraction unit; the parameters of the image feature extraction unit are updated so that the image features extracted by the updated image feature extraction unit have the ability to express three-dimensional space.
The specific process by which the knowledge transfer unit achieves the above purpose is as follows:
S4-1. The cosine similarity between the received image feature and point cloud feature is calculated. After receiving the image feature and the point cloud feature, the knowledge transfer unit uses a feature encoding method to encode the image feature and the point cloud feature into a first feature vector and a second feature vector, respectively, so as to map the first feature vector and the second feature vector into a high-dimensional space and obtain the corresponding arrays in that space; the cosine similarity between the image feature and the point cloud feature is then calculated according to the cosine similarity calculation method. By calculating the cosine similarity between the image feature and the point cloud feature, it can be judged whether the two vectors are similar: the closer the result is to 1, the more similar the two features are, that is, the closer the image feature is to the point cloud feature; since the point cloud feature has the ability to express three-dimensional space, if the image feature is very similar to the corresponding point cloud feature, it means that the image feature can express more three-dimensional spatial information.
S4-2. The cosine similarity is minimized. The cosine similarity is minimized in order to overcome the defect that the cosine similarity is insensitive to the arrays of the feature vectors, so that the differences between individuals, that is, between features, become evident.
S4-3. The gradient of the image feature is calculated and backpropagated to update the parameters of the two-dimensional convolutional neural network model of the image feature extraction unit. Specifically, the error between the image feature and the point cloud feature can be obtained from the minimized cosine similarity; the gradient of the image feature is calculated from this error and backpropagated, and the parameters of the two-dimensional convolutional neural network of the image feature extraction unit, such as the weights and bias values, are updated and adjusted, with the aim of finding the minimum of the loss function of the two-dimensional convolutional neural network.
By repeating the above steps S4-1 to S4-3, the value of the loss function of the two-dimensional convolutional neural network gradually approaches or even reaches the minimum value, so that the image features extracted by the image feature extraction unit become more and more similar to the point cloud features, whereby the image features learn the point cloud features. Since the point cloud features have the ability to express three-dimensional spatial information, the closer the image features are to the point cloud features, the more three-dimensional spatial information the image features express, which is more conducive to improving the accuracy of three-dimensional object parameter prediction.
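One common formulation that is consistent with steps S4-1 to S4-3 is to minimize 1 − cos(image feature, point cloud feature), so that the loss decreases as the two features become more similar. The sketch below illustrates that reading; whether the point cloud branch is detached, and the exact loss used in the patent, are assumptions.

    import torch
    import torch.nn.functional as F

    def mimic_loss(image_feat, point_feat):
        """Knowledge-transfer ("mimic") loss: push the image feature towards the point cloud feature.

        Both inputs are (batch, dim) feature vectors. The point cloud feature is detached so that
        only the image branch is pulled towards it.
        """
        cos = F.cosine_similarity(image_feat, point_feat.detach(), dim=1)
        return (1.0 - cos).mean()      # 0 when the two features point in the same direction

    # Typical use in a training step (sketch):
    # loss = mimic_loss(first_image_feature, point_cloud_feature)
    # loss.backward()                  # gradients flow back into the image feature extraction unit
    # optimizer.step()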
S5. The three-dimensional target parameter prediction unit generates a three-dimensional bounding box according to the image feature and the point cloud feature, outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box, and then updates the parameters of the image feature extraction unit and the point cloud feature extraction unit through backpropagation, wherein the nine-degree-of-freedom parameters of the three-dimensional bounding box refer to the center point coordinates (x, y, z) of the three-dimensional bounding box, the Euler angle parameters (α, β, γ), and the length, width and height parameters of the three-dimensional bounding box.
Specifically, the workflow of the three-dimensional target parameter prediction unit is as follows:
S5-1. The linear layer of the three-dimensional target parameter prediction unit maps the image feature and the point cloud feature to output a corresponding three-dimensional bounding box. After the image feature has learned the point cloud feature, as shown in Figure 5, the knowledge transfer unit inputs the first feature vector corresponding to the image feature and the second feature vector corresponding to the point cloud feature into the fully connected layer, that is, the linear layer, of the three-dimensional target parameter prediction unit, and the linear layer maps the first feature vector and the second feature vector to output a three-dimensional bounding box corresponding to the detection target.
S5-2. The center point coordinates of the three-dimensional bounding box are calculated and predicted. Specifically, the depth coordinate z is first predicted, and the coordinates x and y are then predicted through the following equation (1). The depth coordinate z is predicted as follows: according to the definition of the training dataset pre-established in the framework, a shortest prediction distance and a longest prediction distance are set, the difference between the shortest prediction distance and the longest prediction distance is taken as the depth distance, the depth distance is divided into N equal distance intervals, the probability that the target object appears in each distance interval is predicted, and finally the expectation z is solved from these probabilities; equation (1) is:
(1)  x = (u − c_u) · z / f_u,  y = (v − c_v) · z / f_v
where [f_u, 0, c_u; 0, f_v, c_v; 0, 0, 1] are the intrinsic parameters of the camera, and (u, v) are the pixel coordinates of the center point of the two-dimensional bounding box obtained by the target-of-interest selection unit, whereby the center point coordinates (x, y, z) of the three-dimensional bounding box are obtained;
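A sketch of the depth expectation over N equal distance intervals and of the back-projection of equation (1) is given below; the use of interval centers for the expectation and the names of the helper functions are assumptions introduced for illustration.

    import torch

    def expected_depth(bin_logits, z_min, z_max):
        """Turn per-interval logits into an expected depth z over N equal-width depth intervals."""
        probs = torch.softmax(bin_logits, dim=-1)                  # (batch, N)
        n_bins = bin_logits.shape[-1]
        edges = torch.linspace(z_min, z_max, n_bins + 1, device=bin_logits.device)
        centers = 0.5 * (edges[:-1] + edges[1:])                   # interval centers, (N,)
        return (probs * centers).sum(dim=-1)                       # expectation of z

    def back_project(u, v, z, fu, fv, cu, cv):
        """Recover x, y from the 2D box center (u, v), the predicted depth z and the intrinsics."""
        x = (u - cu) * z / fu
        y = (v - cv) * z / fv
        return x, y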
S5-3. According to the center point coordinates of the three-dimensional bounding box, a quaternion is predicted through the following equation (2), and the quaternion is then converted into Euler angles through equation (3), so as to eliminate the gimbal lock problem of Euler angles; equations (2) and (3) are as follows:
(2)  [equation rendered as an image in the original publication]
(3)  [equation rendered as an image in the original publication]
where the quaternion is expressed as shown in the original figure, and the Euler angles are expressed as
roll, pitch, yaw = (α, β, γ).
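The conversion referred to by equation (3) is, in common practice, the standard quaternion-to-Euler mapping; since the original equations are rendered as images and are not reproduced here, the sketch below shows that standard conversion under the assumed component ordering (q_w, q_x, q_y, q_z).

    import math

    def quaternion_to_euler(qw, qx, qy, qz):
        """Convert a unit quaternion to (roll, pitch, yaw) = (alpha, beta, gamma)."""
        roll = math.atan2(2.0 * (qw * qx + qy * qz), 1.0 - 2.0 * (qx * qx + qy * qy))
        # Clamp to avoid NaNs from tiny numerical overshoots outside [-1, 1].
        sin_pitch = max(-1.0, min(1.0, 2.0 * (qw * qy - qz * qx)))
        pitch = math.asin(sin_pitch)
        yaw = math.atan2(2.0 * (qw * qz + qx * qy), 1.0 - 2.0 * (qy * qy + qz * qz))
        return roll, pitch, yaw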
S5-4. According to the center point coordinates of the three-dimensional bounding box, the Euler angles, and the mapping of the image feature and the point cloud feature on the linear layer, the length, width and height parameters of the three-dimensional bounding box are calculated, and the center point coordinates (x, y, z) of the three-dimensional bounding box, the Euler angle parameters (α, β, γ), and the length, width and height of the three-dimensional bounding box are output.
S5-5. The gradients of the image feature and the point cloud feature are calculated and backpropagated, so as to update the corresponding parameters of the two-dimensional convolutional neural network of the image feature extraction unit and of the one-dimensional convolutional neural network of the point cloud feature extraction unit; at the same time, the corresponding parameters of the linear layer, that is, the fully connected layer, of the three-dimensional target parameter prediction unit are also updated.
Further, the knowledge transfer unit and the three-dimensional target parameter prediction unit can update the relevant parameters of the image feature extraction unit simultaneously.
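Since the knowledge transfer unit and the three-dimensional target parameter prediction unit can update the image feature extraction unit simultaneously, one training step can be sketched as a single backward pass over the sum of the two losses. The module names (image_encoder, point_encoder, predictor, mimic_loss, box_loss) and the weighting factor are assumptions introduced for illustration only.

    import torch

    def train_step(image_crop, instance_points, target_box, modules, optimizer, w_mimic=1.0):
        """One sketched training step combining the mimic loss and the 9-DoF regression loss."""
        image_encoder, point_encoder, predictor, mimic_loss, box_loss = modules
        img_feat = image_encoder(image_crop)           # first image feature
        pc_feat = point_encoder(instance_points)       # point cloud feature
        pred_box = predictor(img_feat, pc_feat)        # nine-degree-of-freedom parameters

        loss = box_loss(pred_box, target_box) + w_mimic * mimic_loss(img_feat, pc_feat)
        optimizer.zero_grad()
        loss.backward()                                # both units' gradients reach the image encoder
        optimizer.step()
        return loss.item()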
The above steps S1 to S5 are all performed in the neural network training stage; the purpose of performing these steps is to make the image features learn the expression of the point cloud features, so that the image features have the ability to express three-dimensional space. When the neural network training stage ends, the framework enters the neural network inference stage, and performs the following steps to output the object parameters of the three-dimensional target predicted from the two-dimensional image:
S6. The two-dimensional detector extracts a candidate bounding box of the target from the image, and sends the candidate bounding box to the image feature extraction unit;
S7. The image feature extraction unit extracts a second image feature from the candidate bounding box, and outputs the second image feature to the target-of-interest selection unit and the three-dimensional target parameter prediction unit;
S8. The target-of-interest selection unit generates a corresponding two-dimensional bounding box according to the second image feature, and outputs the center coordinates of the corresponding two-dimensional bounding box to the three-dimensional target parameter prediction unit;
S9. The three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature, and, based on the corresponding three-dimensional bounding box and the center point coordinates of the corresponding two-dimensional bounding box, calculates and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box. The linear layer of the three-dimensional target parameter prediction unit maps the second image feature to generate the corresponding three-dimensional bounding box, and then, by performing the operations described in steps S5-2 to S5-4 above, calculates the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box, namely the center point coordinates (x, y, z), the Euler angle parameters (α, β, γ), and the length, width and height of the corresponding three-dimensional bounding box, and outputs the obtained nine-degree-of-freedom parameters as the final result of detecting the three-dimensional parameters of the target object.
The present invention provides a three-dimensional object detection framework based on multi-source data knowledge transfer. By outputting the image features extracted by the image feature extraction unit, the target-of-interest selection unit outputs, according to the image features, the point cloud data of the target of interest to the point cloud feature extraction unit, and the point cloud feature extraction unit extracts point cloud features from the point cloud data; then, in the knowledge transfer unit, the image features learn the point cloud features and the parameters of the image feature extraction unit are updated, while the three-dimensional target parameter prediction unit updates the parameters of the image feature extraction unit and the point cloud feature extraction unit according to the image features and the point cloud features; finally, the updated image feature extraction unit re-extracts image features for the three-dimensional target parameter prediction unit, and the three-dimensional target parameter prediction unit calculates and outputs the three-dimensional parameters according to the image features, thereby improving the detection accuracy of three-dimensional object detection based on two-dimensional images.
Industrial Applicability
This application has industrial applicability.

Claims (7)

  1. A three-dimensional object detection framework based on multi-source data knowledge transfer, characterized by including the following steps:
    S1. An image feature extraction unit extracts a first image feature from an image, and outputs the first image feature to a target-of-interest selection unit, a knowledge transfer unit and a three-dimensional target parameter prediction unit;
    S2. The target-of-interest selection unit generates a series of two-dimensional bounding boxes of the target of interest according to the first image feature, so as to extract the point cloud data of the corresponding region from the point cloud space and output it to the point cloud feature extraction unit;
    S3. The point cloud feature extraction unit extracts point cloud features from the point cloud data, and outputs the point cloud features to the knowledge transfer unit and the three-dimensional target parameter prediction unit;
    S4. The knowledge transfer unit calculates the cosine similarity between the image feature and the point cloud feature, and processes the cosine similarity to update the parameters of the image feature extraction unit;
    S5. The three-dimensional target parameter prediction unit generates a three-dimensional bounding box according to the image feature and the point cloud feature, outputs the nine-degree-of-freedom parameters of the three-dimensional bounding box, and then updates the parameters of the image feature extraction unit and the point cloud feature extraction unit through backpropagation;
    S6. A two-dimensional detector extracts a candidate bounding box of the target from the image, and sends the candidate bounding box to the image feature extraction unit;
    S7. The image feature extraction unit extracts a second image feature from the candidate bounding box, and outputs the second image feature to the target-of-interest selection unit and the three-dimensional target parameter prediction unit;
    S8. The target-of-interest selection unit generates a corresponding two-dimensional bounding box according to the second image feature, and outputs the center coordinates of the corresponding two-dimensional bounding box to the three-dimensional target parameter prediction unit;
    S9. The three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature and the center point coordinates of the corresponding two-dimensional bounding box, and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box.
  2. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 1, characterized in that: before step S1, the framework further includes using a two-dimensional detector to extract a candidate bounding box of the target from the image, so as to obtain the point cloud data of the corresponding region in the point cloud space according to the candidate bounding box of the target.
  3. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 1, characterized in that: before step S1, the framework further includes the computer system receiving annotation labels input by a tester for the image.
  4. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 3, characterized in that step S2 includes:
    S2-1. The target-of-interest selection unit detects the target of interest from the first image feature, and uses an RPN network to output a series of two-dimensional bounding boxes corresponding to the target of interest;
    S2-2. The IoU values of the two-dimensional bounding boxes corresponding to the target of interest and the annotation labels on the two-dimensional image are calculated, the annotation label with the largest IoU value is selected as the ground-truth label of the target of interest, and, in the point cloud space, the point cloud data of the region corresponding to the ground-truth label is extracted and output to the point cloud feature extraction unit, while the center point coordinates of the two-dimensional bounding box are also output to the three-dimensional target parameter prediction unit.
  5. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 4, characterized in that step S3 specifically includes:
    S3-1. Inputting the point cloud data into a one-dimensional convolutional neural network model;
    S3-2. Improving the training performance of the one-dimensional convolutional neural network model through residual connections, and enhancing the ability of the one-dimensional convolutional neural network model to model global information of the point cloud space through an attention mechanism;
    S3-3. Performing a max pooling operation to obtain the point cloud features corresponding to the target of interest.
  6. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 1, characterized in that step S4 includes:
    S4-1. Calculating the cosine similarity between the received image feature and point cloud feature;
    S4-2. Minimizing the cosine similarity;
    S4-3. Calculating and backpropagating the gradient of the image feature, so as to update the parameters of the two-dimensional convolutional neural network model of the image feature extraction unit.
  7. The three-dimensional object detection framework based on multi-source data knowledge transfer according to claim 1, characterized in that step S5 includes:
    S5-1. The linear layer of the three-dimensional target parameter prediction unit maps the received image feature and point cloud feature to generate a three-dimensional bounding box;
    S5-2. The depth coordinate z is predicted, and the coordinates x and y of the three-dimensional bounding box are then predicted through the following equation (1):
    (1)  x = (u − c_u) · z / f_u,  y = (v − c_v) · z / f_v
    where [f_u, 0, c_u; 0, f_v, c_v; 0, 0, 1] are the intrinsic parameters of the camera, and (u, v) are the pixel coordinates of the center point of the two-dimensional bounding box obtained by the target-of-interest selection unit;
    S5-3. According to the center point coordinates of the three-dimensional bounding box, a quaternion is predicted through the following equation (2), and the quaternion is then converted into Euler angles through the following equation (3), so as to eliminate the gimbal lock problem of Euler angles; equations (2) and (3) are as follows:
    (2)  [equation rendered as an image in the original publication]
    (3)  [equation rendered as an image in the original publication]
    where the quaternion is expressed as shown in the original figure, and the Euler angles are expressed as roll, pitch, yaw = (α, β, γ);
    S5-4. According to the center point coordinates of the three-dimensional bounding box, the Euler angles, and the mapping of the image feature and the point cloud feature on the linear layer, the length, width and height parameters of the three-dimensional bounding box are calculated, and the center point coordinates (x, y, z) of the three-dimensional bounding box, the Euler angle parameters (α, β, γ), and the length, width and height parameters of the three-dimensional bounding box are output.
    S5-5. The three-dimensional target parameter prediction unit generates a corresponding three-dimensional bounding box according to the second image feature, and, based on the corresponding three-dimensional bounding box and the center point coordinates of the corresponding two-dimensional bounding box, calculates and outputs the nine-degree-of-freedom parameters of the corresponding three-dimensional bounding box.
PCT/CN2021/074212 2020-04-09 2021-01-28 A three-dimensional object detection framework based on multi-source data knowledge transfer WO2021203807A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/917,268 US20230260255A1 (en) 2020-04-09 2021-01-28 Three-dimensional object detection framework based on multi-source data knowledge transfer

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010272335.4 2020-04-09
CN202010272335.4A CN111507222B (zh) 2020-04-09 2020-04-09 一种基于多源数据知识迁移的三维物体检测框架

Publications (1)

Publication Number Publication Date
WO2021203807A1 true WO2021203807A1 (zh) 2021-10-14

Family

ID=71864729

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074212 WO2021203807A1 (zh) 2020-04-09 2021-01-28 一种基于多源数据知识迁移的三维物体检测框架

Country Status (3)

Country Link
US (1) US20230260255A1 (zh)
CN (1) CN111507222B (zh)
WO (1) WO2021203807A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113978457A (zh) * 2021-12-24 2022-01-28 深圳佑驾创新科技有限公司 一种碰撞风险预测方法及装置
CN114882285A (zh) * 2022-05-23 2022-08-09 北方民族大学 一种基于信息增强的细粒度三维点云分类方法
CN115496910A (zh) * 2022-11-07 2022-12-20 中国测绘科学研究院 基于全连接图编码及双重扩张残差的点云语义分割方法
CN115984827A (zh) * 2023-03-06 2023-04-18 安徽蔚来智驾科技有限公司 点云感知方法、计算机设备及计算机可读存储介质

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111507222B (zh) * 2020-04-09 2023-07-07 中山大学 一种基于多源数据知识迁移的三维物体检测框架
CN112034488B (zh) * 2020-08-28 2023-05-02 京东科技信息技术有限公司 目标物体自动标注方法与装置
CN112650220B (zh) * 2020-12-04 2022-03-25 东风汽车集团有限公司 一种车辆自动驾驶方法、车载控制器及系统

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109523552A (zh) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 基于视锥点云的三维物体检测方法
CN110045729A (zh) * 2019-03-12 2019-07-23 广州小马智行科技有限公司 一种车辆自动驾驶方法及装置
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN111507222A (zh) * 2020-04-09 2020-08-07 中山大学 一种基于多源数据知识迁移的三维物体检测框架

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110689008A (zh) * 2019-09-17 2020-01-14 大连理工大学 一种面向单目图像的基于三维重建的三维物体检测方法

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3525131A1 (en) * 2018-02-09 2019-08-14 Bayerische Motoren Werke Aktiengesellschaft Methods and apparatuses for object detection in a scene represented by depth data of a range detection sensor and image data of a camera
CN109523552A (zh) * 2018-10-24 2019-03-26 青岛智能产业技术研究院 基于视锥点云的三维物体检测方法
CN110045729A (zh) * 2019-03-12 2019-07-23 广州小马智行科技有限公司 一种车辆自动驾驶方法及装置
CN111507222A (zh) * 2020-04-09 2020-08-07 中山大学 一种基于多源数据知识迁移的三维物体检测框架

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113978457A (zh) * 2021-12-24 2022-01-28 深圳佑驾创新科技有限公司 一种碰撞风险预测方法及装置
CN114882285A (zh) * 2022-05-23 2022-08-09 北方民族大学 一种基于信息增强的细粒度三维点云分类方法
CN114882285B (zh) * 2022-05-23 2024-03-29 北方民族大学 一种基于信息增强的细粒度三维点云分类方法
CN115496910A (zh) * 2022-11-07 2022-12-20 中国测绘科学研究院 基于全连接图编码及双重扩张残差的点云语义分割方法
CN115984827A (zh) * 2023-03-06 2023-04-18 安徽蔚来智驾科技有限公司 点云感知方法、计算机设备及计算机可读存储介质
CN115984827B (zh) * 2023-03-06 2024-02-02 安徽蔚来智驾科技有限公司 点云感知方法、计算机设备及计算机可读存储介质

Also Published As

Publication number Publication date
US20230260255A1 (en) 2023-08-17
CN111507222B (zh) 2023-07-07
CN111507222A (zh) 2020-08-07

Similar Documents

Publication Publication Date Title
WO2021203807A1 (zh) 一种基于多源数据知识迁移的三维物体检测框架
Zhou et al. To learn or not to learn: Visual localization from essential matrices
CN110232350B (zh) 一种基于在线学习的实时水面多运动目标检测跟踪方法
CN106780631B (zh) 一种基于深度学习的机器人闭环检测方法
CN113065546B (zh) 一种基于注意力机制和霍夫投票的目标位姿估计方法及系统
CN113205466A (zh) 一种基于隐空间拓扑结构约束的残缺点云补全方法
Vaquero et al. Dual-branch CNNs for vehicle detection and tracking on LiDAR data
CN111998862B (zh) 一种基于bnn的稠密双目slam方法
JP2019008571A (ja) 物体認識装置、物体認識方法、プログラム、及び学習済みモデル
EP4057226A1 (en) Method and apparatus for estimating pose of device
US11948368B2 (en) Real-time target detection and 3d localization method based on single frame image
CN117252904B (zh) 基于长程空间感知与通道增强的目标跟踪方法与系统
CN114170410A (zh) 基于PointNet的图卷积与KNN搜索的点云零件级分割方法
CN117213470B (zh) 一种多机碎片地图聚合更新方法及系统
CN114596335A (zh) 一种无人艇目标检测追踪方法及系统
CN112069997B (zh) 一种基于DenseHR-Net的无人机自主着陆目标提取方法及装置
CN115049833A (zh) 一种基于局部特征增强和相似性度量的点云部件分割方法
Luo et al. Dual-stream VO: Visual Odometry Based on LSTM Dual-Stream Convolutional Neural Network.
CN115147720A (zh) 基于坐标注意力和长短距上下文的sar舰船检测方法
Zhang et al. Depth Monocular Estimation with Attention-based Encoder-Decoder Network from Single Image
CN113673540A (zh) 一种基于定位信息引导的目标检测方法
Chen et al. Underwater target detection and embedded deployment based on lightweight YOLO_GN
Zhang et al. Low-cost Mars terrain classification system based on coarse-grained annotation
CN117036408B (zh) 一种动态环境下联合多目标跟踪的物体slam方法
CN117058556B (zh) 基于自监督蒸馏的边缘引导sar图像舰船检测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21784070

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21784070

Country of ref document: EP

Kind code of ref document: A1