WO2023035822A1 - Target detection method and apparatus, and device and storage medium - Google Patents

Target detection method and apparatus, and device and storage medium

Info

Publication number
WO2023035822A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
feature
target detection
target
candidate
Prior art date
Application number
PCT/CN2022/110147
Other languages
French (fr)
Chinese (zh)
Inventor
徐辉
叶汇贤
Original Assignee
上海芯物科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海芯物科技有限公司
Publication of WO2023035822A1


Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed in the present application are a target detection method and apparatus, a device, and a storage medium. The method comprises: inputting point cloud data corresponding to a target training sample into an initial target detection model, where the initial target detection model comprises a feature extraction network and a feature fusion network; performing feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes; inputting the candidate detection boxes into the feature fusion network to obtain a predicted detection box, which the feature fusion network produces by performing feature fusion on the candidate detection boxes based on their distance features; and adjusting the parameters of the initial target detection model according to a loss function value determined from the predicted detection box and the labeled detection box corresponding to the target training sample. With this technical solution, feature fusion can be performed on the candidate detection boxes based on the inter-box correlation reflected by their distance features, improving the prediction accuracy of the target detection model.

Description

Target detection method, apparatus, device, and storage medium
This application claims priority to Chinese patent application No. 202111066892.1, filed with the China Patent Office on September 13, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
Embodiments of the present application relate to the technical field of computer vision, and in particular to a target detection method, apparatus, device, and storage medium.
Background
With the development of machine learning, point-cloud-based three-dimensional (3D) target detection is widely used in autonomous driving systems, object recognition, and 3D reconstruction.
In 3D target detection, determining the trajectory of the detection box is a very important step. The candidate detection boxes output by the commonly used 3DSSD detection model are independent of one another and lack global information, so the prediction accuracy needs to be improved.
Summary
Embodiments of the present application provide a target detection method, apparatus, device, and storage medium, so that feature fusion can be performed on candidate detection boxes based on the inter-box correlation reflected by the distance features of the candidate detection boxes, thereby improving the prediction accuracy of the target detection model.
In a first aspect, an embodiment of the present application provides a method for training a target detection model, including:
inputting point cloud data corresponding to a target training sample into an initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network;
performing feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes;
inputting each candidate detection box into the feature fusion network to obtain a predicted detection box, which the feature fusion network produces by performing feature fusion on the candidate detection boxes based on their distance features; and
adjusting the parameters of the initial target detection model according to a loss function value determined from the predicted detection box and a labeled detection box corresponding to the target training sample.
In a second aspect, an embodiment of the present application provides a target detection method, including:
acquiring point cloud data to be detected;
inputting the point cloud data to be detected into a target detection model trained by the above method for training a target detection model; and
acquiring a target detection box output by the target detection model, and determining a target detection result for the point cloud data to be detected based on the target detection box.
In a third aspect, an embodiment of the present application further provides an apparatus for training a target detection model, including:
an input module, configured to input point cloud data corresponding to a target training sample into an initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network;
a feature extraction module, configured to perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes;
a feature fusion module, configured to input each candidate detection box into the feature fusion network and obtain a predicted detection box produced by the feature fusion network performing feature fusion on the candidate detection boxes based on their distance features; and
a parameter adjustment module, configured to adjust the parameters of the initial target detection model according to a loss function value determined from the predicted detection box and a labeled detection box corresponding to the target training sample.
In a fourth aspect, an embodiment of the present application further provides a target detection apparatus, including:
an acquisition module, configured to acquire point cloud data to be detected;
an input module, configured to input the point cloud data to be detected into a target detection model trained by the above method for training a target detection model; and
a determination module, configured to acquire a target detection box output by the target detection model and determine a target detection result for the point cloud data to be detected based on the target detection box.
In a fifth aspect, an embodiment of the present application further provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the program, implements the method for training a target detection model or the target detection method described in any embodiment of the present application.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the method for training a target detection model or the target detection method described in any embodiment of the present application.
In the embodiments of the present application, point cloud data corresponding to a target training sample is input into an initial target detection model that includes a feature extraction network and a feature fusion network; feature extraction is performed on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes; each candidate detection box is input into the feature fusion network to obtain a predicted detection box produced by fusing the candidate detection boxes based on their distance features; and the parameters of the initial target detection model are adjusted according to a loss function value determined from the predicted detection box and the labeled detection box corresponding to the target training sample. This solves the problem that the candidate detection boxes output by the 3DSSD detection model are mutually independent and lack global information: feature fusion is performed on the candidate detection boxes based on the inter-box correlation reflected by their distance features, improving the prediction accuracy of the target detection model.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for training a target detection model in Embodiment 1 of the present application;
FIG. 2A is a flowchart of a method for training a target detection model in Embodiment 2 of the present application;
FIG. 2B is a schematic structural diagram of a feature fusion network in Embodiment 2 of the present application;
FIG. 2C is a schematic diagram of the weights between the center point and adjacent points in a voxel in the related art;
FIG. 2D is a schematic diagram of the weights between the center point and adjacent points in a voxel in Embodiment 2 of the present application;
FIG. 3 is a flowchart of a target detection method in Embodiment 3 of the present application;
FIG. 4 is a schematic structural diagram of an apparatus for training a target detection model in Embodiment 4 of the present application;
FIG. 5 is a schematic structural diagram of a target detection apparatus in Embodiment 5 of the present application;
FIG. 6 is a schematic structural diagram of a computer device in Embodiment 6 of the present application.
Detailed Description
The present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here merely explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the present application rather than all structures.
It should be noted that similar reference numerals and letters denote similar items in the following figures; therefore, once an item is defined in one figure, it need not be further defined or explained in subsequent figures. In the description of the present application, the terms "first", "second", and the like are used only to distinguish descriptions and should not be understood as indicating or implying relative importance.
Embodiment 1
FIG. 1 is a flowchart of a method for training a target detection model provided in Embodiment 1 of the present application. This embodiment is applicable to training a 3D target detection model. The method may be performed by the apparatus for training a target detection model described in the embodiments of the present application, and the apparatus may be implemented in software and/or hardware. As shown in FIG. 1, the method includes the following steps:
S110: Input point cloud data corresponding to a target training sample into an initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network.
Here, a target training sample is a sample in the training set used to train the initial target detection model. The point cloud data corresponding to the target training sample may be acquired by a lidar; point cloud data is a set of vectors in a three-dimensional coordinate system and may include information such as geometric position, color, or reflection intensity. A target training sample may be labeled with a detection box, which frames the position of the target object and may be used to further identify the type of the target object.
The initial target detection model is an untrained or incompletely trained target detection model used to determine the detection box of a target object and detect its type. It may include a feature extraction network, which extracts features from the point cloud data of the target training sample and determines candidate detection boxes, and a feature fusion network, which fuses the features of the candidate detection boxes to obtain a predicted detection box.
S120: Perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes.
A candidate detection box is a detection box determined by the feature extraction network. There may be one or more candidate detection boxes; their number is determined by the number of target objects in the target detection sample, with one detection box per target object, and the boxes are mutually independent.
Specifically, performing feature extraction on the point cloud data through the feature extraction network may include down-sampling the point cloud data, grouping it, and aggregating the sampled points within each group.
For example, the point cloud data may be sampled at a preset sampling interval or by the farthest point sampling (FPS) method. Grouping the sampled point cloud may involve defining a unit cuboid, called a voxel; the space is partitioned by voxels, and the point cloud data falling inside one voxel forms one group. Within each group, the points may be aggregated by a multi-layer perceptron to obtain the feature vector of the center point, which is then passed through a regressor to obtain a candidate detection box.
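The farthest point sampling mentioned above can be sketched in plain Python. This is a minimal illustration rather than the implementation in the application; the point format (x, y, z tuples) and the choice of the first point as the seed are assumptions.

```python
import math

def farthest_point_sampling(points, num_samples):
    """Greedy FPS: repeatedly pick the point farthest from the chosen set.

    points: list of (x, y, z) tuples; num_samples: how many points to keep.
    Returns the indices of the sampled points.
    """
    if not points or num_samples <= 0:
        return []
    chosen = [0]  # seed with an arbitrary point (here: the first one)
    # distance from every point to the nearest already-chosen point
    dists = [math.dist(p, points[0]) for p in points]
    while len(chosen) < min(num_samples, len(points)):
        idx = max(range(len(points)), key=lambda i: dists[i])
        chosen.append(idx)
        for i, p in enumerate(points):
            dists[i] = min(dists[i], math.dist(p, points[idx]))
    return chosen
```

Compared with fixed-interval sampling, FPS keeps the retained points spread evenly over the scene, which is why it is the sampling method named above.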
In one specific example, the point cloud data within each voxel may be aggregated as follows: every point in the voxel is input into a multi-layer perceptron, the per-point features are aggregated by pooling to determine the feature vector of the center point, and the feature vector of the center point is passed through a regressor to obtain a candidate detection box.
In another specific example, the point cloud data within each voxel may be aggregated as follows: the center point of the points in the voxel is computed; the offset of each point from that center is computed; and the center point and the offsets are input into a multi-layer perceptron to obtain the center-point feature, which can then be regressed to obtain a candidate detection box.
S130: Input each candidate detection box into the feature fusion network, and obtain a predicted detection box produced by the feature fusion network performing feature fusion on the candidate detection boxes based on their distance features.
Specifically, the candidate detection boxes corresponding to the voxels are input into the feature fusion network, the distance features between the candidate detection boxes are determined, and feature fusion is performed on the candidate detection boxes according to those distance features to obtain the predicted detection box.
For example, the distance features between candidate detection boxes may be determined by computing the Euclidean distance between every two distinct candidate detection boxes to obtain a distance matrix. The distance matrix can capture the global characteristics of the target detection sample.
S140: Adjust the parameters of the initial target detection model according to a loss function value determined from the predicted detection box and the labeled detection box corresponding to the target training sample.
The loss function measures the difference between the detection box obtained by the initial target detection model performing target detection on the point cloud data of the target training sample and the labeled detection box of that sample.
Specifically, the loss function value is computed from the predicted detection box and the labeled detection box corresponding to the target training sample, and the parameters of the initial target detection model are adjusted based on that value. The parameters of the initial target detection model may be untrained initial parameters or pre-trained parameters.
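The application does not fix a particular loss form. As an illustration only, a smooth-L1 regression loss between predicted and labeled box parameters, a common choice for box regression and an assumption here, could be computed as:

```python
def smooth_l1(pred_box, gt_box, beta=1.0):
    """Smooth-L1 loss between a predicted box and a labeled box.

    Boxes are flat parameter lists (e.g. cx, cy, cz, l, w, h, yaw); this
    parameterization, and the loss itself, are illustrative assumptions.
    """
    total = 0.0
    for p, g in zip(pred_box, gt_box):
        d = abs(p - g)
        # quadratic near zero, linear for large errors (robust to outliers)
        total += 0.5 * d * d / beta if d < beta else d - 0.5 * beta
    return total / len(pred_box)
```

The value returned by such a loss would then drive the parameter update, e.g. by gradient descent in the training framework used.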
In the technical solution of this embodiment, point cloud data corresponding to a target training sample is input into an initial target detection model that includes a feature extraction network and a feature fusion network; feature extraction is performed on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes; each candidate detection box is input into the feature fusion network to obtain a predicted detection box produced by fusing the candidate detection boxes based on their distance features; and the parameters of the initial target detection model are adjusted according to a loss function value determined from the predicted detection box and the labeled detection box corresponding to the target training sample. Feature fusion can thus be performed on the candidate detection boxes based on the inter-box correlation reflected by their distance features, improving the prediction accuracy of the target detection model.
Embodiment 2
FIG. 2A is a flowchart of a method for training a target detection model in Embodiment 2 of the present application. This embodiment builds on the above embodiment and further refines the feature fusion network.
As shown in FIG. 2A, the method of this embodiment includes the following steps:
S210: Input point cloud data corresponding to a target training sample into an initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network, and the feature fusion network includes a channel fusion layer, a distance determination layer, and a feature association layer.
Specifically, as shown in FIG. 2B, the channel fusion layer fuses the candidate detection boxes output by the feature extraction network along the channel direction; the distance determination layer determines the distance matrix between the candidate detection boxes; and the feature association layer associates the distance matrix with the fused candidate detection boxes to obtain the predicted detection box.
S220: Perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection boxes.
Optionally, step S220 includes steps S221, S222, and S223.
Step S221: Sample the point cloud data at a preset sampling interval to obtain a sampled point cloud.
For example, the point cloud data may be down-sampled at a preset interval; sampling one point every N points reduces the amount of point cloud data by a factor of N.
Step S222: Partition the space corresponding to the sampled point cloud to obtain a voxel sampled point cloud for each voxel after partitioning.
A voxel, short for volume element, is the smallest unit into which digital data divides three-dimensional space.
For example, a voxel may be a cuboid with user-defined length, width, and height L0, W0, and H0. These cuboids fill the space corresponding to the sampled point cloud; partitioning that space by cuboids yields multiple voxels, and the sampled points falling inside each voxel form its voxel sampled point cloud.
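The voxel partition described above can be sketched as a grid-hashing step; the concrete voxel edge lengths and the dictionary-based grouping are illustrative assumptions.

```python
import math

def voxelize(points, voxel_size=(0.2, 0.2, 0.4)):
    """Group sampled points by the voxel (grid cell) they fall into.

    points: list of (x, y, z) tuples; voxel_size: (L0, W0, H0) edge
    lengths of the unit cuboid (the concrete sizes are assumptions).
    Returns a dict mapping an integer voxel index -> points in that voxel.
    """
    groups = {}
    for p in points:
        # integer grid coordinates of the cuboid containing this point
        key = tuple(math.floor(c / s) for c, s in zip(p, voxel_size))
        groups.setdefault(key, []).append(p)
    return groups
```

Each value of the returned dictionary is one voxel sampled point cloud, ready for the per-voxel aggregation in step S223.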
Step S223: For each voxel, perform feature aggregation on the corresponding voxel sampled point cloud based on a multi-layer perceptron to obtain a candidate detection box corresponding to that voxel sampled point cloud.
Optionally, step S223 specifically includes:
for the voxel sampled point cloud corresponding to each voxel, determining the center point of the voxel sampled point cloud; determining the offset of each point in the voxel sampled point cloud from the center point; inputting the voxel sampled point cloud and the offsets into the multi-layer perceptron to determine the center-point feature of each voxel sampled point cloud; and regressing the center-point feature to obtain a candidate detection box.
Specifically, for the voxel sampled point cloud in each voxel, the center point is determined; the center point may be the centroid of the voxel sampled point cloud. The offset of each point in the voxel sampled point cloud from the center point is determined, the voxel sampled point cloud and the offsets are input into the multi-layer perceptron to determine the center-point feature of each voxel sampled point cloud, and the center-point feature is regressed to obtain a candidate detection box.
The benefit of this is as follows. In the related art, every point in a voxel is input into the same multi-layer perceptron; as shown in FIG. 2C, for five points X1, X2, X3, Xn, and Xc in a voxel, every adjacent point has the same weight of influence, namely 1, on the center point Xc. As shown in FIG. 2D, in this embodiment of the present application the voxel sampled point cloud and the offsets are input into the multi-layer perceptron, so that the weight of an adjacent point on the center point depends on its offset: the weight between Xc and X1 is X1 - Xc, between Xc and X2 is X2 - Xc, between Xc and X3 is X3 - Xc, and between Xc and Xn is Xn - Xc.
For example, inputting the voxel sampled point cloud and the offsets into the multi-layer perceptron yields the center-point feature of each voxel sampled point cloud as:
Xm = max(ReLU(MLP([xj - xi; xi])))
where Xm is the center-point feature of the voxel sampled point cloud, ReLU is the activation function, MLP is the multi-layer perceptron, xi is the i-th point in the voxel sampled point cloud, and xj is the j-th point.
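A minimal sketch of the aggregation formula above. The MLP is reduced to a single linear layer (a deliberate simplification of the multi-layer perceptron), and the center point is taken as the centroid as described earlier; both choices are assumptions for illustration.

```python
def aggregate_voxel(points, weights, bias):
    """Center-point feature Xm = max_j ReLU(MLP([xj - xc ; xc])).

    points: list of (x, y, z) in one voxel.
    weights: out_dim x 6 list of lists; bias: out_dim list. Together they
    stand in for the MLP as a single linear layer (an assumption).
    """
    n = len(points)
    center = tuple(sum(c) / n for c in zip(*points))  # centroid as center
    feats = []
    for p in points:
        offset = [a - b for a, b in zip(p, center)]
        inp = offset + list(center)  # concatenate [xj - xc ; xc]
        out = [max(0.0, sum(w * v for w, v in zip(row, inp)) + b)
               for row, b in zip(weights, bias)]  # ReLU(linear(inp))
        feats.append(out)
    # element-wise max pooling over the points in the voxel
    return [max(f[k] for f in feats) for k in range(len(bias))]
```

Because the input concatenates the offset with the center, each point's contribution to the pooled feature varies with its distance from the center, which is exactly the weighting behavior contrasted with FIG. 2C above.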
S230: Input each candidate detection box into the channel fusion layer, and obtain a fused detection box produced by the channel fusion layer fusing the candidate detection boxes along the channel direction.
Optionally, step S230 includes steps S231 and S232.
S231: Fuse the features of the candidate detection boxes along the channel direction through a first convolutional layer to obtain fused features.
S232: Perform a convolution operation on the fused features through a second convolutional layer to obtain a fused detection box.
For example, if a candidate detection box has num_points pixel points and channel count C0, its dimensions are (num_points, C0), where the C0 channels are the coordinates of the pixel points. The channel fusion layer comprises a first convolutional layer and a second convolutional layer: the first convolutional layer has a 1×C0 kernel, and the second convolutional layer has a 1×C1 kernel, where C1, the number of channels of the second convolutional layer, determines the channel count of the fused detection box.
The features of the candidate detection boxes are fused along the channel direction by the first convolutional layer to obtain fused features of dimensions (num_points, C0); the second convolutional layer then performs a convolution on the fused features, fusing the candidate detection boxes into the fused detection box.
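One plausible reading of the two-stage channel fusion above can be sketched as follows. Treating the 1×C0 kernel as a per-point weighted sum over the C0 channels, and the 1×C1 kernel as an expansion into C1 output channels, is an assumption about the operators, not the application's exact layers.

```python
def channel_fuse(box_feats, kernel1, kernel2):
    """Two-stage channel fusion over per-point features.

    box_feats: num_points x C0 matrix (list of lists);
    kernel1: length-C0 weights (stands in for the 1 x C0 conv);
    kernel2: length-C1 weights (stands in for the 1 x C1 conv).
    All shapes are illustrative assumptions.
    """
    fused = []
    for feat in box_feats:
        # first conv: weighted sum across the C0 channels of one point
        s = sum(w * v for w, v in zip(kernel1, feat))
        # second conv: map the fused scalar into C1 output channels
        fused.append([w * s for w in kernel2])
    return fused  # shape: num_points x C1
```

The output shape (num_points, C1) matches the fused detection box dimensions stated above.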
S240,将各候选检测框输入距离确定层,获得各候选检测框之间的距离特征。S240. Input each candidate detection frame into the distance determination layer, and obtain distance features between each candidate detection frame.
具体的,将各候选检测框输入距离确定层,确定各候选检测框之间的欧氏距离,基于各候选检测框之间的欧氏距离确定距离矩阵;基于所述距离矩阵,对各候选检测框在距离方向上的特征进行卷积操作,得到距离特征。Specifically, each candidate detection frame is input into the distance determination layer, the Euclidean distance between the candidate detection frames is determined, and a distance matrix is determined based on these Euclidean distances; based on the distance matrix, a convolution operation is performed on the features of each candidate detection frame in the distance direction to obtain the distance feature.
示例性的,计算两两候选检测框之间的欧式距离,若一个候选检测框的维度为(num_points,C0),则距离矩阵中元素的维度为(C0,num_points,num_points);采用二维卷积核对距离矩阵进行卷积得到维度为(num_points,C1)的距离特征。Exemplarily, the Euclidean distance between each pair of candidate detection frames is calculated; if the dimension of a candidate detection frame is (num_points, C0), the dimension of the elements in the distance matrix is (C0, num_points, num_points); a two-dimensional convolution kernel is used to convolve the distance matrix to obtain a distance feature of dimension (num_points, C1).
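The distance branch can be sketched as below. Two assumptions are flagged: per-channel pairwise distances produce the (C0, num_points, num_points) tensor mentioned in the example, and the two-dimensional convolution is approximated by pooling one point axis and then mixing channels with a random stand-in matrix.

```python
import numpy as np

def distance_feature(box_feats, C1=8, rng=None):
    """box_feats: (num_points, C0). Returns a (num_points, C1) distance feature."""
    if rng is None:
        rng = np.random.default_rng(0)
    P, C0 = box_feats.shape
    # Per-channel pairwise distances: shape (C0, P, P), matching the example dimensions.
    diff = box_feats.T[:, :, None] - box_feats.T[:, None, :]
    dist = np.abs(diff)
    # Stand-in for the 2-D convolution: pool one point axis, then mix channels to C1.
    pooled = dist.mean(axis=2).T             # (P, C0)
    w = rng.standard_normal((C0, C1))
    return pooled @ w                         # (P, C1)
```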
S250,将所述融合检测框和所述距离特征输入所述特征关联层,获得特征关联层基于所述距离特征对所述融合检测框进行特征关联后得到的预测检测框。S250. Input the fused detection frame and the distance feature into the feature association layer, and obtain a predicted detection frame obtained after the feature association layer performs feature association on the fused detection frame based on the distance feature.
具体的,将距离特征和融合检测框进行卷积操作,实现基于距离特征对融合检测框进行特征关联,使融合检测框能够综合候选检测框的局部特征以及各候选检测框之间的距离所体现的全局特征。Specifically, a convolution operation is performed on the distance feature and the fused detection frame to associate features with the fused detection frame based on the distance feature, so that the fused detection frame can integrate the local features of the candidate detection frames with the global features reflected by the distances between the candidate detection frames.
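The feature association step can be sketched as a simple joint mixing of the fused features and the distance features. The concatenate-then-mix form below is an assumption; the application only specifies that the two are combined by a convolution operation.

```python
import numpy as np

def associate_features(fused, dist_feat, rng=None):
    """fused, dist_feat: both (P, C1). Returns associated features of shape (P, C1)."""
    if rng is None:
        rng = np.random.default_rng(0)
    P, C1 = fused.shape
    joint = np.concatenate([fused, dist_feat], axis=1)  # local + distance-based global cues
    w = rng.standard_normal((2 * C1, C1))               # stand-in 1x1 conv weights
    return np.maximum(joint @ w, 0.0)                   # ReLU-activated output, (P, C1)
```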
S260,根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数。S260. Adjust the parameters of the initial target detection model according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample.
本实施例的技术方案,通过将目标训练样本对应的点云数据输入初始目标检测模型;其中,所述初始目标检测模型包括:特征提取网络和特征融合网络;通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框;将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框;根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数,能够基于各候选检测框的距离特征所反映的检测框之间的关联特性,对候选检测框进行特征融合,提高目标检测模型的预测精度。In the technical solution of this embodiment, the point cloud data corresponding to the target training sample is input into the initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network; feature extraction is performed on the point cloud data through the feature extraction network to obtain a plurality of candidate detection frames; each candidate detection frame is input into the feature fusion network to obtain the predicted detection frame produced after the feature fusion network performs feature fusion on the candidate detection frames based on their distance features; and the parameters of the initial target detection model are adjusted according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample. In this way, feature fusion can be performed on the candidate detection frames based on the correlation between detection frames reflected by their distance features, improving the prediction accuracy of the target detection model.
实施例三Embodiment three
图3为本申请实施例三提供的一种目标检测方法的流程图,本实施例可适用于对待检测点云数据进行目标检测的情况,该方法可以由本申请实施例中的目标检测装置来执行,该装置可采用软件和/或硬件的方式实现,如图3所示,该方法具体包括如下步骤:Fig. 3 is a flowchart of a target detection method provided in Embodiment 3 of the present application. This embodiment is applicable to performing target detection on point cloud data to be detected. The method can be executed by the target detection device in the embodiments of the present application, and the device can be implemented in software and/or hardware. As shown in Fig. 3, the method specifically includes the following steps:
S310,获取待检测点云数据。S310, acquiring point cloud data to be detected.
具体的,所述待检测点云数据可以是通过三维激光扫描仪获取的包含待检测物体的点云数据。Specifically, the point cloud data to be detected may be point cloud data including the object to be detected obtained by a three-dimensional laser scanner.
S320,将所述待检测点云数据输入本申请实施例一或实施例二所述的目标检测模型的训练方法训练得到的目标检测模型。S320. Input the point cloud data to be detected into the target detection model obtained by training with the target detection model training method described in Embodiment 1 or Embodiment 2 of the present application.
其中,目标检测模型是采用本申请实施例一或实施例二所述的目标检测模型的训练方法训练完备的模型。Here, the target detection model is a fully trained model obtained by using the target detection model training method described in Embodiment 1 or Embodiment 2 of the present application.
具体的,将所述待检测点云数据输入目标检测模型,得到待检测物体对应的候选检测框。所述目标检测模型包括:特征提取网络和特征融合网络;所述特征提取网络用于提取目标训练样本对应的点云数据的特征,确定候选检测框;特征融合网络用于对候选检测框的特征进行融合得到预测检测框。Specifically, the point cloud data to be detected is input into the target detection model to obtain candidate detection frames corresponding to the object to be detected. The target detection model includes a feature extraction network and a feature fusion network; the feature extraction network is used to extract features of the point cloud data corresponding to the target training sample and determine candidate detection frames, and the feature fusion network is used to fuse the features of the candidate detection frames to obtain the predicted detection frame.
S330,获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果。S330. Acquire a target detection frame output by the target detection model, and determine a target detection result of the point cloud data to be detected based on the target detection frame.
其中,目标检测框用于表示待检测物体的位置,对目标检测框中的目标物体进行类别识别确定目标检测结果。对目标检测框中的目标物体进行类别识别的方式本申请实施例不作限制,例如可以采用语义分类模型识别目标检测框中的目标物体的类型。Wherein, the target detection frame is used to indicate the position of the object to be detected, and the category recognition is performed on the target object in the target detection frame to determine the target detection result. The method of classifying the target object in the target detection frame is not limited in this embodiment of the present application. For example, a semantic classification model may be used to identify the type of the target object in the target detection frame.
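The inference flow of S310-S330 can be summarized in a few lines. `detect`, `model`, and `classifier` are illustrative stand-ins introduced here for the sketch, not APIs defined by this application.

```python
def detect(point_cloud, model, classifier):
    """Run the trained detection model, then classify each output detection frame."""
    boxes = model(point_cloud)          # target detection frames (object positions)
    results = []
    for box in boxes:
        label = classifier(box)         # e.g. a semantic classification model
        results.append((box, label))    # position + category = detection result
    return results
```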
本实施例的技术方案,通过获取待检测点云数据;将所述待检测点云数据输入本申请实施例一或实施例二所述的目标检测模型的训练方法训练得到的目标检测模型;获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果,能够基于各候选检测框的距离特征所反映的检测框之间的关联特性,对候选检测框进行特征融合,提高目标检测模型的预测精度。In the technical solution of this embodiment, the point cloud data to be detected is acquired; the point cloud data to be detected is input into the target detection model trained with the training method described in Embodiment 1 or Embodiment 2 of the present application; and the target detection frame output by the target detection model is acquired, and the target detection result of the point cloud data to be detected is determined based on the target detection frame. In this way, feature fusion can be performed on the candidate detection frames based on the correlation between detection frames reflected by their distance features, improving the prediction accuracy of the target detection model.
实施例四Embodiment four
图4为本申请实施例四提供的一种目标检测模型的训练装置的结构示意图。本实施例可适用于训练三维目标检测模型的情况,该装置可采用软件和/或硬件的方式实现,该装置可集成在任何提供目标检测模型的训练功能的设备中,如图4所示,所述目标检测模型的训练装置具体包括:输入模块410、特征提取模块420、特征融合模块430和参数调整模块440。FIG. 4 is a schematic structural diagram of a training device for a target detection model provided in Embodiment 4 of the present application. This embodiment can be applied to the situation of training a three-dimensional target detection model, and the device can be implemented in the form of software and/or hardware, and the device can be integrated in any device that provides the training function of the target detection model, as shown in Figure 4, The training device of the target detection model specifically includes: an input module 410 , a feature extraction module 420 , a feature fusion module 430 and a parameter adjustment module 440 .
其中,输入模块410,用于将目标训练样本对应的点云数据输入初始目标检测模型;其中,所述初始目标检测模型包括:特征提取网络和特征融合网络;特征提取模块420,用于通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框;特征融合模块430,用于将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框;参数调整模块440,用于根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数。Among them, the input module 410 is used to input the point cloud data corresponding to the target training sample into the initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network; the feature extraction module 420 is used to perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection frames; the feature fusion module 430 is used to input each candidate detection frame into the feature fusion network to obtain the predicted detection frame produced after the feature fusion network performs feature fusion on the candidate detection frames based on their distance features; and the parameter adjustment module 440 is used to adjust the parameters of the initial target detection model according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample.
可选的,所述特征提取模块420,包括:Optionally, the feature extraction module 420 includes:
采样单元,用于基于预设采样间隔对所述点云数据进行采样得到采样点云;划分单元,用于对所述采样点云对应的空间进行划分得到划分后每个体素对应的体素采样点云;聚合单元,用于对于各所述体素,基于多层感知机对所对应的体素采样点云进行特征聚合,确定各所述体素采样点云对应的候选检测框。A sampling unit, configured to sample the point cloud data based on a preset sampling interval to obtain a sampled point cloud; a division unit, configured to divide the space corresponding to the sampled point cloud to obtain the voxel sampling point cloud corresponding to each voxel after division; and an aggregation unit, configured to, for each voxel, perform feature aggregation on the corresponding voxel sampling point cloud based on a multi-layer perceptron, and determine the candidate detection frame corresponding to each voxel sampling point cloud.
可选的,所述聚合单元,具体用于:Optionally, the polymerization unit is specifically used for:
针对每个体素所对应的体素采样点云,确定所述体素采样点云的中心点;确定所述体素采样点云中每个点与所述中心点的偏移量;将所述体素采样点云和所述偏移量输入所述多层感知机,确定所述体素采样点云对应的中心点特征,将所述中心点特征进行回归得到候选检测框。For the voxel sampling point cloud corresponding to each voxel, determine the center point of the voxel sampling point cloud; determine the offset between each point in the voxel sampling point cloud and the center point; input the voxel sampling point cloud and the offsets into the multi-layer perceptron, determine the center point feature corresponding to the voxel sampling point cloud, and regress the center point feature to obtain a candidate detection frame.
可选的,所述特征融合模块430包括:Optionally, the feature fusion module 430 includes:
融合单元,用于将各所述候选检测框输入通道融合层,获得所述通道融合层基于通道方向对所述候选检测框进行融合后得到的融合检测框;确定单元,用于将各所述候选检测框输入距离确定层,获得各所述候选检测框之间的距离特征;关联单元,用于将所述融合检测框和所述距离特征输入所述特征关联层,获得特征关联层基于所述距离特征对所述融合检测框进行特征关联后得到的预测检测框。A fusion unit, configured to input each candidate detection frame into the channel fusion layer and obtain the fused detection frame obtained after the channel fusion layer fuses the candidate detection frames based on the channel direction; a determination unit, configured to input each candidate detection frame into the distance determination layer and obtain the distance features between the candidate detection frames; and an association unit, configured to input the fused detection frame and the distance features into the feature association layer and obtain the predicted detection frame obtained after the feature association layer performs feature association on the fused detection frame based on the distance features.
可选的,所述融合单元,具体用于:Optionally, the fusion unit is specifically used for:
通过第一卷积层对各所述候选检测框在通道方向上的特征进行融合后得到融合后的特征;通过第二卷积层对所述融合后的特征进行卷积操作得到融合检测框。The features of each candidate detection frame in the channel direction are fused through the first convolutional layer to obtain fused features; a convolution operation is performed on the fused features through the second convolutional layer to obtain a fused detection frame.
上述产品可执行本申请任意实施例所提供的目标检测模型的训练方法,具备执行方法相应的功能模块和效果。The above-mentioned products can execute the training method of the target detection model provided by any embodiment of the present application, and have the corresponding functional modules and effects of the execution method.
实施例五Embodiment five
图5为本申请实施例五提供的一种目标检测装置的结构示意图。本实施例可适用于对待检测点云数据进行目标检测的情况,该装置可采用软件和/或硬件的方式实现,该装置可集成在任何提供目标检测功能的设备中,如图5所示,所述目标检测装置具体包括:获取模块510、输入模块520和确定模块530。Fig. 5 is a schematic structural diagram of a target detection device provided in Embodiment 5 of the present application. This embodiment is applicable to performing target detection on point cloud data to be detected. The device can be implemented in software and/or hardware, and can be integrated in any device that provides a target detection function. As shown in Fig. 5, the target detection device specifically includes: an acquisition module 510, an input module 520, and a determination module 530.
获取模块510,用于获取待检测点云数据;输入模块520,用于将所述待检测点云数据输入采用实施例一或实施例二所述的目标检测模型的训练方法训练得到的目标检测模型;确定模块530,用于获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果。The acquisition module 510 is used to acquire the point cloud data to be detected; the input module 520 is used to input the point cloud data to be detected into the target detection model obtained by training with the target detection model training method described in Embodiment 1 or Embodiment 2; and the determination module 530 is used to acquire the target detection frame output by the target detection model and determine the target detection result of the point cloud data to be detected based on the target detection frame.
上述产品可执行本申请任意实施例所提供的目标检测方法,具备执行方法相应的功能模块和效果。The above-mentioned products can execute the target detection method provided by any embodiment of the present application, and have corresponding functional modules and effects for executing the method.
实施例六Embodiment six
图6为本申请实施例六提供的一种计算机设备的结构框图,如图6所示,该计算机设备包括处理器610、存储器620、输入装置630和输出装置640;计算机设备中处理器610的数量可以是一个或多个,图6中以一个处理器610为例;计算机设备中的处理器610、存储器620、输入装置630和输出装置640可以通过总线或其他方式连接,图6中以通过总线连接为例。Fig. 6 is a structural block diagram of a computer device provided in Embodiment 6 of the present application. As shown in Fig. 6, the computer device includes a processor 610, a memory 620, an input device 630, and an output device 640; the number of processors 610 in the computer device may be one or more, and one processor 610 is taken as an example in Fig. 6; the processor 610, memory 620, input device 630, and output device 640 in the computer device may be connected by a bus or in other ways, and connection by a bus is taken as an example in Fig. 6.
存储器620作为一种计算机可读存储介质,可用于存储软件程序、计算机可执行程序以及模块,如本申请实施例中的目标检测模型的训练方法对应的程序指令/模块(例如,目标检测模型的训练装置中的输入模块410、特征提取模块420、特征融合模块430和参数调整模块440),或者如本申请实施例中的目标检测方法对应的程序指令/模块(例如目标检测装置中的获取模块510、输入模块520和确定模块530)。处理器610通过运行存储在存储器620中的软件程序、指令以及模块,从而执行计算机设备的各种功能应用以及数据处理,即实现上述的目标检测模型的训练方法或者目标检测方法。The memory 620, as a computer-readable storage medium, can be used to store software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the target detection model training method in the embodiments of the present application (for example, the input module 410, feature extraction module 420, feature fusion module 430, and parameter adjustment module 440 in the training device of the target detection model), or the program instructions/modules corresponding to the target detection method in the embodiments of the present application (for example, the acquisition module 510, input module 520, and determination module 530 in the target detection device). The processor 610 executes various functional applications and data processing of the computer device by running the software programs, instructions, and modules stored in the memory 620, that is, implements the above-mentioned target detection model training method or target detection method.
存储器620可主要包括存储程序区和存储数据区,其中,存储程序区可存储操作系统、至少一个功能所需的应用程序;存储数据区可存储根据终端的使用所创建的数据等。此外,存储器620可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件、闪存器件、或其他非易失性固态存储器件。在一些实例中,存储器620可进一步包括相对于处理器610远程设置的存储器,这些远程存储器可以通过网络连接至计算机设备。上述网络的实例包括但不限于互联网、企业内部网、局域网、移动通信网及其组合。The memory 620 may mainly include a program storage area and a data storage area, wherein the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the terminal, and the like. In addition, the memory 620 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage devices. In some examples, the memory 620 may further include memory located remotely from the processor 610, and these remote memories may be connected to the computer device through a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
输入装置630可用于接收输入的数字或字符信息,以及产生与计算机设备的用户设置以及功能控制有关的键信号输入。输出装置640可包括显示屏等显示设备。The input device 630 can be used to receive input numbers or character information, and generate key signal input related to user settings and function control of the computer device. The output device 640 may include a display device such as a display screen.
实施例七Embodiment seven
本申请实施例七提供了一种计算机可读存储介质,其上存储有计算机程序,该程序被处理器执行时实现如本申请所有申请实施例提供的目标检测模型的训练方法:将目标训练样本对应的点云数据输入初始目标检测模型;其中,所述初始目标检测模型包括:特征提取网络和特征融合网络;通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框;将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框;根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数。Embodiment 7 of the present application provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, the target detection model training method provided by all the embodiments of the present application is implemented: input the point cloud data corresponding to the target training sample into the initial target detection model, where the initial target detection model includes a feature extraction network and a feature fusion network; perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection frames; input each candidate detection frame into the feature fusion network to obtain the predicted detection frame produced after the feature fusion network performs feature fusion on the candidate detection frames based on their distance features; and adjust the parameters of the initial target detection model according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample.
或者,实现如本申请所有申请实施例提供的目标检测方法:获取待检测点云数据;将所述待检测点云数据输入采用本申请实施例一或实施例二所述的目标检测模型的训练方法训练得到的目标检测模型;获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果。Alternatively, the target detection method provided by all the embodiments of the present application is implemented: acquire the point cloud data to be detected; input the point cloud data to be detected into the target detection model obtained by training with the target detection model training method described in Embodiment 1 or Embodiment 2 of the present application; acquire the target detection frame output by the target detection model, and determine the target detection result of the point cloud data to be detected based on the target detection frame.
可以采用一个或多个计算机可读的介质的任意组合。计算机可读介质可以是计算机可读信号介质或者计算机可读存储介质。计算机可读存储介质例如可以是但不限于电、磁、光、电磁、红外线、或半导体的系统、装置或器件,或者任意以上的组合。计算机可读存储介质的更具体的例子(非穷举的列表)包括:具有一个或多个导线的电连接、便携式计算机磁盘、硬盘、随机存取存储器(RAM)、只读存储器(ROM)、可擦式可编程只读存储器(EPROM或闪存)、光纤、便携式紧凑磁盘只读存储器(CD-ROM)、光存储器件、磁存储器件、或者上述的任意合适的组合。在本文件中,计算机可读存储介质可以是任何包含或存储程序的有形介质,该程序可以被指令执行系统、装置或者器件使用或者与其结合使用。Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, device, or device, or any combination thereof. More specific examples (non-exhaustive list) of computer readable storage media include: electrical connections with one or more leads, portable computer disks, hard disks, random access memory (RAM), read only memory (ROM), Erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), optical storage device, magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
计算机可读的信号介质可以包括在基带中或者作为载波一部分传播的数据信号,其中承载了计算机可读的程序代码。这种传播的数据信号可以采用多种形式,包括但不限于电磁信号、光信号或上述的任意合适的组合。计算机可读的信号介质还可以是计算机可读存储介质以外的任何计算机可读介质,该计算机可读介质可以发送、传播或者传输用于由指令执行系统、装置或者器件使用或者与其结合使用的程序。A computer readable signal medium may include a data signal carrying computer readable program code in baseband or as part of a carrier wave. Such propagated data signals may take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium other than a computer readable storage medium, which can send, propagate, or transmit a program for use by or in conjunction with an instruction execution system, apparatus, or device.
计算机可读介质上包含的程序代码可以用任何适当的介质传输,包括但不限于无线、电线、光缆、RF等等,或者上述的任意合适的组合。Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
可以以一种或多种程序设计语言或其组合来编写用于执行本申请操作的计算机程序代码,所述程序设计语言包括面向对象的程序设计语言,诸如Java、Smalltalk、C++,还包括常规的过程式程序设计语言,诸如“C”语言或类似的程序设计语言。程序代码可以完全地在用户计算机上执行、部分地在用户计算机上执行、作为一个独立的软件包执行、部分在用户计算机上部分在远程计算机上执行、或者完全在远程计算机或服务器上执行。在涉及远程计算机的情形中,远程计算机可以通过任意种类的网络包括局域网(LAN)或广域网(WAN)连接到用户计算机,或者,可以连接到外部计算机(例如利用因特网服务提供商来通过因特网连接)。Computer program code for carrying out the operations of the present application may be written in one or more programming languages or combinations thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computer (e.g. via the Internet using an Internet service provider).

Claims (10)

  1. 一种目标检测模型的训练方法,包括:A training method for a target detection model, comprising:
    将目标训练样本对应的点云数据输入初始目标检测模型;其中,所述初始目标检测模型包括:特征提取网络和特征融合网络;Input the point cloud data corresponding to the target training sample into the initial target detection model; wherein, the initial target detection model includes: a feature extraction network and a feature fusion network;
    通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框;performing feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection frames;
    将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框;Input each of the candidate detection frames into the feature fusion network, and obtain a predicted detection frame obtained by performing feature fusion on each of the candidate detection frames by the feature fusion network based on the distance features of each of the candidate detection frames;
    根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数。Adjusting the parameters of the initial target detection model according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample.
  2. 根据权利要求1所述的方法,其中,所述通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框,包括:The method according to claim 1, wherein said feature extraction is performed on said point cloud data through said feature extraction network to obtain a plurality of candidate detection frames, comprising:
    基于预设采样间隔对所述点云数据进行采样得到采样点云;Sampling the point cloud data based on a preset sampling interval to obtain a sampled point cloud;
    对所述采样点云对应的空间进行划分得到划分后每个体素对应的体素采样点云;Dividing the space corresponding to the sampling point cloud to obtain a voxel sampling point cloud corresponding to each voxel after division;
    对于各所述体素,基于多层感知机对所对应的体素采样点云进行特征聚合,确定各所述体素采样点云对应的候选检测框。For each of the voxels, the feature aggregation of the corresponding voxel sampling point cloud is performed based on the multi-layer perceptron, and the candidate detection frame corresponding to each of the voxel sampling point clouds is determined.
  3. 根据权利要求2所述的方法,其中,对于各所述体素,基于多层感知机对所对应的体素采样点云进行特征聚合,确定各所述体素采样点云对应的候选检测框,包括:The method according to claim 2, wherein, for each of the voxels, performing feature aggregation on the corresponding voxel sampling point cloud based on a multi-layer perceptron and determining the candidate detection frame corresponding to each voxel sampling point cloud comprises:
    针对每个体素所对应的体素采样点云,确定所述体素采样点云的中心点;For the voxel sampling point cloud corresponding to each voxel, determine the center point of the voxel sampling point cloud;
    确定所述体素采样点云中每个点与所述中心点的偏移量;Determine the offset between each point in the voxel sampling point cloud and the center point;
    将所述体素采样点云和所述偏移量输入所述多层感知机,确定所述体素采样点云对应的中心点特征,将所述中心点特征进行回归得到候选检测框。Inputting the voxel sampling point cloud and the offset into the multi-layer perceptron, determining the center point feature corresponding to the voxel sampling point cloud, and regressing the center point feature to obtain a candidate detection frame.
  4. 根据权利要求1所述的方法,其中,特征融合网络包括:通道融合层、距离确定层和特征关联层,相应的,将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框,包括:The method according to claim 1, wherein the feature fusion network comprises: a channel fusion layer, a distance determination layer, and a feature association layer; correspondingly, inputting each candidate detection frame into the feature fusion network and obtaining the predicted detection frame obtained after the feature fusion network performs feature fusion on each candidate detection frame based on the distance features of each candidate detection frame comprises:
    将各所述候选检测框输入通道融合层,获得所述通道融合层基于通道方向对所述候选检测框进行融合后得到的融合检测框;Input each of the candidate detection frames into the channel fusion layer, and obtain the fusion detection frame obtained after the channel fusion layer fuses the candidate detection frames based on the channel direction;
    将各所述候选检测框输入距离确定层,获得各所述候选检测框之间的距离特征;Inputting each of the candidate detection frames into the distance determination layer to obtain the distance features between the candidate detection frames;
    将所述融合检测框和所述距离特征输入所述特征关联层,获得特征关联层基于所述距离特征对所述融合检测框进行特征关联后得到的预测检测框。Inputting the fused detection frame and the distance feature into the feature association layer, and obtaining a predicted detection frame obtained by performing feature association on the fused detection frame by the feature association layer based on the distance feature.
  5. 根据权利要求4所述的方法,其中,将各所述候选检测框输入通道融合层,获得所述通道融合层基于通道方向对所述候选检测框进行融合后得到的融合检测框,包括:The method according to claim 4, wherein each of the candidate detection frames is input into a channel fusion layer to obtain a fused detection frame obtained after the channel fusion layer fuses the candidate detection frames based on the channel direction, including:
    通过第一卷积层对各所述候选检测框在通道方向上的特征进行融合后得到融合后的特征;Fusing the features of each candidate detection frame in the channel direction through a first convolutional layer to obtain fused features;
    通过第二卷积层对所述融合后的特征进行卷积操作得到融合检测框。Performing a convolution operation on the fused features through a second convolutional layer to obtain a fused detection frame.
  6. 一种目标检测方法,包括:A target detection method, comprising:
    获取待检测点云数据;Obtain point cloud data to be detected;
    将所述待检测点云数据输入采用权利要求1-5任一所述的目标检测模型的训练方法训练得到的目标检测模型;The point cloud data to be detected is input into the target detection model obtained by the training method of the target detection model described in any one of claims 1-5;
    获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果。Obtain the target detection frame output by the target detection model, and determine the target detection result of the point cloud data to be detected based on the target detection frame.
  7. 一种目标检测模型的训练装置,包括:A training device for a target detection model, comprising:
    输入模块,用于将目标训练样本对应的点云数据输入初始目标检测模型;其中,所述初始目标检测模型包括:特征提取网络和特征融合网络;The input module is used to input the point cloud data corresponding to the target training sample into the initial target detection model; wherein, the initial target detection model includes: a feature extraction network and a feature fusion network;
    特征提取模块,用于通过所述特征提取网络对所述点云数据进行特征提取,得到多个候选检测框;A feature extraction module, configured to perform feature extraction on the point cloud data through the feature extraction network to obtain a plurality of candidate detection frames;
    特征融合模块,用于将各所述候选检测框输入所述特征融合网络,获得所述特征融合网络基于各所述候选检测框的距离特征对各所述候选检测框进行特征融合后得到的预测检测框;A feature fusion module, configured to input each candidate detection frame into the feature fusion network and obtain the predicted detection frame obtained after the feature fusion network performs feature fusion on each candidate detection frame based on the distance features of each candidate detection frame;
    参数调整模块,用于根据所述预测检测框和所述目标训练样本对应的标记检测框确定的损失函数值调整所述初始目标检测模型的参数。A parameter adjustment module, configured to adjust the parameters of the initial target detection model according to the loss function value determined by the predicted detection frame and the marked detection frame corresponding to the target training sample.
  8. 一种目标检测装置,包括:A target detection device, comprising:
    获取模块,用于获取待检测点云数据;An acquisition module, configured to acquire point cloud data to be detected;
    输入模块,用于将所述待检测点云数据输入采用权利要求1-5任一所述的目标检测模型的训练方法训练得到的目标检测模型;The input module is used to input the point cloud data to be detected into the target detection model obtained by training the target detection model training method according to any one of claims 1-5;
    确定模块,用于获取所述目标检测模型输出的目标检测框,基于所述目标检测框确定待检测点云数据的目标检测结果。A determination module, configured to acquire the target detection frame output by the target detection model, and determine the target detection result of the point cloud data to be detected based on the target detection frame.
  9. 一种计算机设备,包括存储器、处理器及存储在存储器上并可在处理器上运行的计算机程序,其中,所述处理器执行所述程序时实现如权利要求1-5中任一所述的目标检测模型的训练方法,或者实现如权利要求6所述的目标检测方法。A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein, when the processor executes the program, the target detection model training method according to any one of claims 1-5 or the target detection method according to claim 6 is implemented.
  10. A computer-readable storage medium storing a computer program, wherein, when the program is executed by a processor, the target detection model training method according to any one of claims 1-5, or the target detection method according to claim 6, is implemented.
PCT/CN2022/110147 2021-09-13 2022-08-04 Target detection method and apparatus, and device and storage medium WO2023035822A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111066892.1 2021-09-13
CN202111066892.1A CN113807350A (en) 2021-09-13 2021-09-13 Target detection method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
WO2023035822A1 true WO2023035822A1 (en) 2023-03-16

Family

ID=78895165

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/110147 WO2023035822A1 (en) 2021-09-13 2022-08-04 Target detection method and apparatus, and device and storage medium

Country Status (2)

Country Link
CN (1) CN113807350A (en)
WO (1) WO2023035822A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113807350A (en) * 2021-09-13 2021-12-17 上海芯物科技有限公司 Target detection method, device, equipment and storage medium
CN114005110B (en) * 2021-12-30 2022-05-17 智道网联科技(北京)有限公司 3D detection model training method and device, and 3D detection method and device
CN114565916A (en) * 2022-02-07 2022-05-31 苏州浪潮智能科技有限公司 Target detection model training method, target detection method and electronic equipment
CN115457496B (en) * 2022-09-09 2023-12-08 北京百度网讯科技有限公司 Automatic driving retaining wall detection method and device and vehicle

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111199206A (en) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 Three-dimensional target detection method and device, computer equipment and storage medium
CN111583337A (en) * 2020-04-25 2020-08-25 华南理工大学 Omnibearing obstacle detection method based on multi-sensor fusion
WO2020199834A1 (en) * 2019-04-03 2020-10-08 腾讯科技(深圳)有限公司 Object detection method and apparatus, and network device and storage medium
CN113807350A (en) * 2021-09-13 2021-12-17 上海芯物科技有限公司 Target detection method, device, equipment and storage medium

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116256720A (en) * 2023-05-09 2023-06-13 武汉大学 Underground target detection method and device based on three-dimensional ground penetrating radar and electronic equipment
CN116256720B (en) * 2023-05-09 2023-10-13 武汉大学 Underground target detection method and device based on three-dimensional ground penetrating radar and electronic equipment
CN116721399A (en) * 2023-07-26 2023-09-08 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116721399B (en) * 2023-07-26 2023-11-14 之江实验室 Point cloud target detection method and device for quantitative perception training
CN116882031A (en) * 2023-09-01 2023-10-13 临沂大学 Building model construction method and system based on point cloud
CN116882031B (en) * 2023-09-01 2023-11-17 临沂大学 Building model construction method and system based on point cloud
CN117173692A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173692B (en) * 2023-11-02 2024-02-02 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Also Published As

Publication number Publication date
CN113807350A (en) 2021-12-17

Similar Documents

Publication Publication Date Title
WO2023035822A1 (en) Target detection method and apparatus, and device and storage medium
WO2022083402A1 (en) Obstacle detection method and apparatus, computer device, and storage medium
US11042762B2 (en) Sensor calibration method and device, computer device, medium, and vehicle
US10885352B2 (en) Method, apparatus, and device for determining lane line on road
CN107576960B (en) Target detection method and system for visual radar space-time information fusion
EP3506161A1 (en) Method and apparatus for recovering point cloud data
WO2019056845A1 (en) Road map generating method and apparatus, electronic device, and computer storage medium
CN109858552B (en) Target detection method and device for fine-grained classification
US20220156483A1 (en) Efficient three-dimensional object detection from point clouds
US20210012089A1 (en) Object detection in point clouds
WO2022040562A1 (en) Object-centric three-dimensional auto labeling of point cloud data
WO2022206414A1 (en) Three-dimensional target detection method and apparatus
EP3703008A1 (en) Object detection and 3d box fitting
CN114445310B (en) 3D target detection method and device, electronic equipment and medium
CN115797736B (en) Training method, device, equipment and medium for target detection model and target detection method, device, equipment and medium
US20230213646A1 (en) Machine learning based object detection using radar information
CN114782785A (en) Multi-sensor information fusion method and device
CN115147333A (en) Target detection method and device
CN113052039A (en) Method, system and server for detecting pedestrian density of traffic network
CN110909656B (en) Pedestrian detection method and system integrating radar and camera
WO2021218346A1 (en) Clustering method and device
CN116170779A (en) Collaborative awareness data transmission method, device and system
CN115880659A (en) 3D target detection method and device for road side system and electronic equipment
CN116030330A (en) Target detection method and device
US11804042B1 (en) Prelabeling of bounding boxes in video frames