WO2023185069A1 - Object detection method and device, computer-readable storage medium, and unmanned vehicle - Google Patents

Object detection method and device, computer-readable storage medium, and unmanned vehicle

Info

Publication number
WO2023185069A1
WO2023185069A1 (PCT/CN2022/136555)
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
image
fused
detected
detection
Prior art date
Application number
PCT/CN2022/136555
Other languages
English (en)
French (fr)
Inventor
王丹
刘浩
Original Assignee
北京京东乾石科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京京东乾石科技有限公司
Publication of WO2023185069A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Definitions

  • the present disclosure relates to the field of artificial intelligence, especially to the field of autonomous driving, and in particular to object detection methods and devices, computer-readable storage media and unmanned vehicles.
  • Two-dimensional target detection has become a research hotspot, and various new methods continue to emerge.
  • ordinary two-dimensional target detection cannot provide all the information needed to perceive the environment.
  • Two-dimensional target detection can only provide the position and corresponding category of the target object in the two-dimensional picture.
  • a self-driving vehicle must detect and identify obstacles that may hinder driving. It needs information such as the length, width, and height of the target object in order to make reasonable avoidance actions based on different obstacle types and states. Therefore, 3D target detection plays a crucial role in path planning and control.
  • the imaging results of monocular cameras, binocular cameras, lidar and other equipment are mainly used to detect three-dimensional objects in the environment.
  • an object detection method including:
  • the semantic label corresponding to each point in the fused point cloud is determined
  • the three-dimensional detection model is used to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  • generating a virtual point cloud based on candidate detection frames includes:
  • a virtual point cloud is generated.
  • each point in the virtual point cloud corresponds to a grid in the grid of candidate detection frames, and the density of the virtual point cloud is greater than the density of the original point cloud.
  • determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system includes:
  • the semantic label corresponding to each point in the fused point cloud is determined.
  • the fusion of the original point cloud and the virtual point cloud to obtain the fused point cloud includes:
  • the original point cloud and the virtual point cloud are superimposed to obtain a fused point cloud.
  • the three-dimensional detection model is used to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud, including:
  • the three-dimensional detection model is used to generate a three-dimensional detection frame of the object to be detected based on the fusion information of point cloud and image.
  • splicing the coordinates of each point in the fused point cloud with the semantic label of the point to obtain the fusion information of the point cloud and the image includes: concatenating the coordinates of each point in the fused point cloud and the semantic label of that point into an array as the fusion information of the point cloud and the image.
  • the three-dimensional detection model includes a first feature extraction network and a detection network.
  • the first detection network is used to generate a three-dimensional detection frame of the object to be detected based on the characteristics of the fusion information of the point cloud and the image.
  • the semantic labels of the pixels in the image are the categories of each pixel generated by semantic segmentation of the image.
  • the original point cloud of the object to be detected is obtained by scanning the object to be detected using a lidar, and the image of the object to be detected is obtained by photographing the object to be detected using a camera.
  • generating a candidate detection frame of the object to be detected based on the original point cloud includes:
  • the second detection network is used to generate candidate detection frames of the object to be detected based on the characteristics of the original point cloud.
  • an object detection device including:
  • the acquisition module is configured to obtain the original point cloud of the object to be detected, the image of the object to be detected, and the semantic labels of the pixels in the image;
  • the candidate detection frame generation module is configured to generate a candidate detection frame of the object to be detected based on the original point cloud;
  • the virtual point cloud generation module is configured to generate a virtual point cloud based on the candidate detection frame
  • the point cloud fusion module is configured to fuse the original point cloud and the virtual point cloud to obtain a fused point cloud
  • a determination module configured to determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system
  • the three-dimensional detection frame generation module is configured to use the three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  • an object detection device including:
  • a memory; and a processor coupled to the memory, the processor configured to execute the object detection method according to any embodiment of the present disclosure based on instructions stored in the memory.
  • a computer-readable storage medium on which computer program instructions are stored.
  • the instructions are executed by a processor, the object detection method as described in any embodiment of the present disclosure is implemented.
  • an unmanned vehicle configured with the object detection device as described in any embodiment of the present disclosure.
  • Figure 1 shows a flow chart of an object detection method according to some embodiments of the present disclosure
  • FIG. 2A and 2B illustrate a schematic diagram of a virtual point cloud generation method according to some embodiments of the present disclosure
  • Figure 3 shows a schematic diagram of determining semantic labels for points in a fused point cloud according to some embodiments of the present disclosure
  • Figure 4 shows a flow chart for generating a three-dimensional detection frame according to other embodiments of the present disclosure
  • Figure 5 shows a block diagram of an object detection device according to some embodiments of the present disclosure
  • Figure 6 shows a block diagram of an object detection device according to other embodiments of the present disclosure.
  • Figure 7 illustrates a block diagram of a computer system for implementing some embodiments of the present disclosure.
  • any specific values are to be construed as illustrative only and not as limiting. Accordingly, other examples of the exemplary embodiments may have different values.
  • the first algorithm is to first fuse the original 3D point cloud data and the original 2D image data together, so that the fused data contains both RGB information and 3D information, and then use a detector to detect the fused data and output the detection results.
  • This algorithm needs to use two models to extract features from data from different sensors and then perform fusion detection, which increases the complexity of the algorithm.
  • due to the sparsity of radar data and the density of image data, the fusion results of the two are unsatisfactory, making them difficult to use for effective feature learning and reducing the accuracy of target detection.
  • the present disclosure proposes an object detection method and device, a computer-readable storage medium and an unmanned vehicle.
  • the present disclosure can solve the alignment problem of the three-dimensional point cloud and the two-dimensional image by projecting the point cloud into the image coordinate system and determining the corresponding relationship between the points in the point cloud and the semantic label of each point in the two-dimensional image coordinate system.
  • the present disclosure uses a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  • only one three-dimensional detection model is used, which reduces the complexity of the model and facilitates deployment to the vehicle.
  • both the depth information of the 3D point cloud and the semantic label information of the 2D image are retained, providing more information for the 3D detection model and improving the accuracy of 3D target detection.
  • Figure 1 shows a flowchart of an object detection method according to some embodiments of the present disclosure.
  • the following object detection method is performed by an object detection device.
  • the object detection method includes steps S1 to S6.
  • step S1 the original point cloud of the object to be detected, the image of the object to be detected, and the semantic labels of the pixels in the image are obtained.
  • the original point cloud and image can be obtained by scanning the same object with different sensors.
  • Sensors can be lidar, monocular cameras, binocular cameras, etc.
  • the original point cloud of the object to be detected is obtained by scanning the object to be detected using a lidar, and the image of the object to be detected is obtained by photographing the object to be detected using a camera.
  • the semantic labels of pixels in the image are the categories of each pixel generated by semantic segmentation of the image.
  • a two-dimensional semantic segmentation model is used to perform semantic segmentation on a two-dimensional image at the pixel level.
  • the input of the model is the color information (RGB) of the three channels of red, green, and blue of the image
  • the output is the semantic category of each pixel.
  • pixels belonging to the same category can be classified into one category, and the label of the category to which each pixel belongs can be obtained.
  • pixels belonging to people are divided into one category, and pixels belonging to cars are divided into another category.
  • Semantic labels can be, for example, "obstacle", "non-obstacle bicycle", "pedestrian", and "background".
  • step S2 a candidate detection frame of the object to be detected is generated based on the original point cloud.
  • generating a candidate detection frame of the object to be detected based on the original point cloud includes: using a second feature extraction network to extract features of the original point cloud; using a second detection network to generate candidate detection frames of the object to be detected based on the features of the original point cloud.
  • PointPillar (point pillar)
  • VoxelNet (voxel network)
  • a three-dimensional target detection method based on BEV can be used to project the lidar point cloud on the X-Y coordinate plane and obtain the BEV feature map after discretization.
  • BEV images present point clouds in the form of images while retaining the spatial relationship of obstacles in the three-dimensional world. Then, based on this feature map, the detector is used to generate candidate detection frames.
  • NMS (Non-Maximum Suppression)
  • This disclosure uses point cloud data to extract the foreground points of possible candidate frames.
  • with this method, after the information of the candidate frames is subsequently fused with the image data, it is then used for three-dimensional object detection, which can reduce detection errors caused by inaccurate calibration parameters between different sensors.
  • step S3 a virtual point cloud is generated based on the candidate detection frames.
  • generating a virtual point cloud based on the candidate detection frame includes: generating a grid of candidate detection frames; generating a virtual point cloud based on the grid of candidate detection frames.
  • a virtual point cloud can be obtained by rasterizing the candidate detection frames.
  • each point in the virtual point cloud corresponds to a grid in the grid of candidate detection frames, and the density of the virtual point cloud is greater than the density of the original point cloud.
  • FIGS. 2A and 2B illustrate a schematic diagram of a virtual point cloud generation method according to some embodiments of the present disclosure.
  • each candidate detection frame is rasterized, as shown in Figure 2A.
  • the candidate detection frames are divided into grids of equal size, and then the coordinates of each grid are used as a virtual point.
  • the obtained virtual points are shown as circles in Figure 2B.
  • the density of the virtual point cloud can be made greater than the density of the original point cloud by adjusting the size of the grid.
  • since the density of the virtual point cloud is greater than the density of the original point cloud, the point cloud becomes dense, which solves the problem of the excessive density gap between the image and the point cloud and improves the accuracy of three-dimensional target detection.
  • step S4 the original point cloud and the virtual point cloud are fused to obtain a fused point cloud.
  • fusing the original point cloud and the virtual point cloud to obtain the fused point cloud includes: superimposing the original point cloud and the virtual point cloud to obtain the fused point cloud.
  • the virtual point cloud includes information about candidate detection frames, which is equivalent to providing supervision information in the subsequent three-dimensional detection process and improving the accuracy of three-dimensional target detection of objects.
  • step S5 by projecting the fused point cloud into the image coordinate system, the semantic label corresponding to each point in the fused point cloud is determined.
  • determining the semantic label corresponding to each point in the fused point cloud includes: projecting the fused point cloud into the image coordinate system to determine the correspondence between each point in the fused point cloud and each pixel in the image; and determining the semantic label corresponding to each point in the fused point cloud according to that correspondence and the semantic label of each pixel in the image.
  • FIG. 3 illustrates a schematic diagram of determining semantic labels for points in a fused point cloud according to some embodiments of the present disclosure.
  • P_x, P_y, and P_z respectively represent the coordinates of a point in the fused point cloud
  • K represents the intrinsic parameter matrix of the camera
  • R and T respectively represent the rotation transformation matrix and the translation transformation matrix from the point cloud coordinate system to the image coordinate system, determined based on the extrinsic calibration parameters of the camera and the point cloud
  • z_c represents the image depth information
  • u and v represent the horizontal and vertical coordinates of the pixel in the image coordinate system obtained by projecting the points of the fused point cloud into the image coordinate system.
  • the correspondence between the three-dimensional fused point cloud and the two-dimensional image is obtained.
  • the semantic labels of the pixels in the image have already been obtained in the previous image semantic segmentation step, so the correspondence between the points in the fused point cloud and the semantic labels C of the pixels in the image can be deduced, giving each point in the fused point cloud a corresponding semantic label.
  • This disclosure uses calibration equations to project the fused point cloud onto the image.
  • the semantic label of each point of the fused point cloud is obtained, thereby aligning the points in the three-dimensional point cloud with the semantic segmentation results of the two-dimensional image and avoiding the data alignment problem caused by directly fusing the original point cloud data with the original RGB information of the two-dimensional image.
  • the present invention projects the point cloud into the image coordinate system, avoiding projection errors caused by the lack of image depth information.
  • the purpose of projecting the point cloud onto the image in this disclosure is to obtain the corresponding relationship between the point cloud and the semantic label.
  • What is subsequently sent to the three-dimensional detection model is the semantic label and the point cloud coordinates in three-dimensional space, rather than the projected coordinates of the point cloud in the image coordinate system. Therefore, the final data used for three-dimensional detection includes the depth information of the point cloud, which improves the accuracy of detection.
  • step S6 the three-dimensional detection model is used to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  • using a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud includes: combining the coordinates of each point in the fused point cloud with The semantic labels of the points are spliced to obtain the fusion information of the point cloud and the image; the three-dimensional detection model is used to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image.
  • splicing the coordinates of each point in the fused point cloud with the semantic label of the point to obtain the fusion information of the point cloud and the image includes: fusing the coordinates of each point in the point cloud with the semantic label of the point The labels are concatenated into an array as fusion information of point cloud and image.
  • the fusion information can be obtained by splicing (cat) the coordinates of the points in the fused point cloud in the spatial coordinate system with the semantic labels obtained in the semantic segmentation of the image.
  • the splicing here refers to concatenating along an existing dimension of the data; the total dimension of the data remains unchanged after the operation. For example, the coordinates of each point in the fused point cloud and the semantic label of the point are spliced according to the following formula: P_i = (P_x, P_y, P_z, C)
  • P_i is the fusion information of each point
  • (P_x, P_y, P_z) is the coordinate of the point in the spatial coordinate system
  • C is the semantic label of the point.
  • the three-dimensional detection model includes a first feature extraction network and a detection network.
  • the three-dimensional detection model is used to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image, including: using the first feature extraction network to extract features of the fusion information of the point cloud and the image; using the first detection network to generate the three-dimensional detection frame of the object to be detected based on the features of the fusion information of the point cloud and the image.
  • the 3D detection model includes a backbone network for feature extraction and a detection network for generating detection frames.
  • using common point cloud representation methods such as PointPillar or VoxelNet, the fusion information of the point cloud and the image is processed into a data structure that the network can learn from; a feature extraction network (backbone network) is then used to learn the processed fusion information of the two sensors' point cloud and image, and finally a detection head for the detection task is attached to generate a three-dimensional detection frame and obtain the final 3D detection result.
  • Figure 4 shows a flowchart of generating a three-dimensional detection frame according to other embodiments of the present disclosure.
  • the steps to generate a three-dimensional detection frame are as follows: first perform a voxelization operation on the original point cloud, send the result of the voxelization operation to the three-dimensional backbone network to extract features, and use the detection head to generate proposals (three-dimensional detection frames) based on the extracted features; use the non-maximum suppression method to remove redundant three-dimensional detection frames to obtain candidate detection frames; perform a rasterization operation on the candidate detection frames to obtain a virtual point cloud, and fuse the original point cloud and the virtual point cloud to obtain the fused point cloud; perform semantic segmentation on the two-dimensional image to obtain the semantic labels; determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system; splice the coordinates of each point in the fused point cloud with the semantic label of the point to obtain the fusion information of the point cloud and the image; using common point cloud representation methods, process the fusion information of the point cloud and the image into a data structure that the network can learn from; use a feature extraction network (backbone network) to learn the processed fusion information; and finally attach a detection head to generate the three-dimensional detection frame and obtain the final 3D detection result.
  • This disclosure combines the semantic information (semantic tags) of the fused point cloud and the image, performs basic encoding operations, and enters the 3D target detector to obtain the final detection result.
  • in the final detection stage, only one 3D detection model is used to obtain the 3D detection results; using only one model can reduce the complexity of the model and facilitates the deployment of the model on the vehicle side.
  • the present disclosure retains both the depth information of the three-dimensional point cloud and the semantic label information of the two-dimensional image in the final data used for object detection, providing more information for the three-dimensional detection model and improving the accuracy of object detection.
  • Figure 5 shows a block diagram of an object detection device according to some embodiments of the present disclosure.
  • the object detection device 5 includes an acquisition module 51 , a candidate detection frame generation module 52 , a virtual point cloud generation module 53 , a point cloud fusion module 54 , a determination module 55 and a three-dimensional detection frame generation module 56 .
  • the acquisition module 51 is configured to acquire the original point cloud of the object to be detected, the image of the object to be detected, and the semantic labels of the pixels in the image, for example, performing step S1 as shown in Figure 1 .
  • the candidate detection frame generation module 52 is configured to generate a candidate detection frame of the object to be detected based on the original point cloud, for example, performing step S2 as shown in Figure 1.
  • the virtual point cloud generation module 53 is configured to generate a virtual point cloud according to the candidate detection frame, for example, performing step S3 as shown in FIG. 1 .
  • the point cloud fusion module 54 is configured to fuse the original point cloud and the virtual point cloud to obtain a fused point cloud, for example, performing step S4 as shown in FIG. 1 .
  • the determination module 55 is configured to determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system, for example, performing step S5 as shown in Figure 1 .
  • the three-dimensional detection frame generation module 56 is configured to use the three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud, for example, performing step S6 as shown in Figure 1.
  • according to the object detection device of the present disclosure, by projecting the point cloud into the image coordinate system, the correspondence between the points in the point cloud and the semantic label of each point in the two-dimensional image coordinate system is determined, which can solve the problem of aligning the three-dimensional point cloud with the two-dimensional image.
  • the object detection device of the present disclosure only utilizes a three-dimensional detection model, which reduces the complexity of the model and facilitates deployment to the vehicle.
  • in the final data used for object detection, both the depth information of the three-dimensional point cloud and the semantic label information of the two-dimensional image are retained, which provides more information for the three-dimensional detection model and can improve the accuracy of object detection.
  • FIG. 6 shows a block diagram of an object detection device according to other embodiments of the present disclosure.
  • the object detection device 6 includes a memory 61; and a processor 62 coupled to the memory 61.
  • the memory 61 is used to store instructions for executing corresponding embodiments of the object detection method.
  • the processor 62 is configured to execute the object detection method in any embodiment of the present disclosure based on instructions stored in the memory 61 .
  • the embodiment of the present disclosure provides an unmanned vehicle equipped with an object detection device 5 or an object detection device 6 .
  • the accuracy of object detection can be improved, thereby enabling the unmanned vehicle to avoid obstacles based on the detected objects, thereby improving the safety of the unmanned vehicle driving.
  • Figure 7 illustrates a block diagram of a computer system for implementing some embodiments of the present disclosure.
  • Computer system 70 may be embodied in the form of a general purpose computing device.
  • Computer system 70 includes memory 710, processor 720, and bus 700 connecting various system components.
  • Memory 710 may include, for example, system memory, non-volatile storage media, or the like.
  • System memory stores, for example, operating systems, applications, boot loaders, and other programs.
  • System memory may include volatile storage media such as random access memory (RAM) and/or cache memory.
  • the non-volatile storage medium stores, for example, instructions for executing corresponding embodiments of at least one of the object detection methods.
  • Non-volatile storage media includes but is not limited to disk storage, optical storage, flash memory, etc.
  • Processor 720 may be implemented as a general purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, discrete hardware components such as discrete gates or transistors.
  • each module such as the judgment module and the determination module, can be implemented by instructions executing corresponding steps in a central processing unit (CPU) running memory, or by dedicated circuits executing corresponding steps.
  • Bus 700 may use any of a variety of bus structures.
  • bus structures include, but are not limited to, Industry Standard Architecture (ISA) bus, Micro Channel Architecture (MCA) bus, and Peripheral Component Interconnect (PCI) bus.
  • the computer system 70 may also include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, 750, the memory 710 and the processor 720 may be connected through a bus 700.
  • the input and output interface 730 can provide a connection interface for input and output devices such as a monitor, mouse, and keyboard.
  • Network interface 740 provides a connection interface for various networked devices.
  • the storage interface 750 provides a connection interface for external storage devices such as floppy disks, USB disks, and SD cards.
  • These computer readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable device to produce a machine, such that execution of the instructions by the processor produces a device that implements the functions specified in one or more blocks of the flowcharts and/or block diagrams.
  • Computer-readable program instructions may also be stored in a computer-readable memory; these instructions cause the computer to operate in a specific manner, producing an article of manufacture that includes instructions implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams.
  • the disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects.
  • the accuracy of object detection is improved.

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The present disclosure relates to an object detection method and device, a computer-readable storage medium, and an unmanned vehicle, in the fields of artificial intelligence and intelligent driving. The object detection method includes: obtaining an original point cloud of an object to be detected, an image of the object to be detected, and semantic labels of pixels in the image; generating candidate detection frames of the object to be detected based on the original point cloud; generating a virtual point cloud based on the candidate detection frames; fusing the original point cloud and the virtual point cloud to obtain a fused point cloud; determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system; and using a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud. According to the present disclosure, the accuracy of object detection is improved.

Description

Object detection method and device, computer-readable storage medium, and unmanned vehicle
CROSS-REFERENCE TO RELATED APPLICATIONS
This application is based on and claims priority to Chinese application No. 202210339110.5, filed on April 1, 2022, the disclosure of which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
The present disclosure relates to the field of artificial intelligence, in particular to the field of autonomous driving, and specifically to an object detection method and device, a computer-readable storage medium, and an unmanned vehicle.
BACKGROUND
With the introduction of convolutional neural networks into the field of object detection, two-dimensional object detection has become a research hotspot and new methods continue to emerge. However, in application scenarios such as autonomous driving, robotics, and augmented reality, ordinary two-dimensional object detection cannot provide all the information needed to perceive the environment; it only provides the position of the target object in a two-dimensional picture and the corresponding category.
In the real three-dimensional world, however, objects have three-dimensional shapes. In an autonomous driving scenario, for example, an autonomous vehicle must detect and identify obstacles that may impede driving, and needs information such as the length, width, and height of the target object in order to take reasonable avoidance actions according to different obstacle types and states. Three-dimensional object detection therefore plays a crucial role in path planning and control.
At present, three-dimensional objects in the environment are mainly detected using the imaging results of devices such as monocular cameras, binocular cameras, and lidar.
SUMMARY
According to a first aspect of the present disclosure, an object detection method is provided, including:
obtaining an original point cloud of an object to be detected, an image of the object to be detected, and semantic labels of pixels in the image;
generating candidate detection frames of the object to be detected based on the original point cloud;
generating a virtual point cloud based on the candidate detection frames;
fusing the original point cloud and the virtual point cloud to obtain a fused point cloud;
determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system;
using a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
In some embodiments, generating the virtual point cloud based on the candidate detection frames includes:
generating a grid of the candidate detection frames;
generating the virtual point cloud based on the grid of the candidate detection frames.
In some embodiments, each point in the virtual point cloud corresponds to one cell in the grid of the candidate detection frames, and the density of the virtual point cloud is greater than the density of the original point cloud.
In some embodiments, determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system includes:
determining the correspondence between each point in the fused point cloud and each pixel in the image by projecting the fused point cloud into the image coordinate system;
determining the semantic label corresponding to each point in the fused point cloud according to the correspondence between each point in the fused point cloud and each pixel in the image, and the semantic label of each pixel in the image.
In some embodiments, fusing the original point cloud and the virtual point cloud to obtain the fused point cloud includes:
superimposing the original point cloud and the virtual point cloud to obtain the fused point cloud.
In some embodiments, using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud includes:
splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain fusion information of the point cloud and the image;
using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image.
In some embodiments, splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain the fusion information of the point cloud and the image includes: concatenating the coordinates of each point in the fused point cloud and the semantic label of that point into an array as the fusion information of the point cloud and the image.
In some embodiments, the three-dimensional detection model includes a first feature extraction network and a detection network, and using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image includes:
using the first feature extraction network to extract features of the fusion information of the point cloud and the image;
using the first detection network to generate the three-dimensional detection frame of the object to be detected based on the features of the fusion information of the point cloud and the image.
In some embodiments, the semantic label of a pixel in the image is the category of each pixel generated by performing semantic segmentation on the image.
In some embodiments, the original point cloud of the object to be detected is obtained by scanning the object to be detected with a lidar, and the image of the object to be detected is obtained by photographing the object to be detected with a camera.
In some embodiments, generating the candidate detection frames of the object to be detected based on the original point cloud includes:
using a second feature extraction network to extract features of the original point cloud;
using a second detection network to generate the candidate detection frames of the object to be detected based on the features of the original point cloud.
According to a second aspect of the present disclosure, an object detection device is provided, including:
an acquisition module configured to obtain an original point cloud of an object to be detected, an image of the object to be detected, and semantic labels of pixels in the image;
a candidate detection frame generation module configured to generate candidate detection frames of the object to be detected based on the original point cloud;
a virtual point cloud generation module configured to generate a virtual point cloud based on the candidate detection frames;
a point cloud fusion module configured to fuse the original point cloud and the virtual point cloud to obtain a fused point cloud;
a determination module configured to determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system;
a three-dimensional detection frame generation module configured to use a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
According to a third aspect of the present disclosure, an object detection device is provided, including:
a memory; and
a processor coupled to the memory, the processor being configured to execute the object detection method according to any embodiment of the present disclosure based on instructions stored in the memory.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is provided, on which computer program instructions are stored, the instructions, when executed by a processor, implementing the object detection method according to any embodiment of the present disclosure.
According to a fifth aspect of the present disclosure, an unmanned vehicle is provided, configured with the object detection device according to any embodiment of the present disclosure.
BRIEF DESCRIPTION OF THE DRAWINGS
The accompanying drawings, which form a part of the specification, describe embodiments of the present disclosure and, together with the specification, serve to explain the principles of the present disclosure.
The present disclosure can be understood more clearly from the following detailed description with reference to the accompanying drawings, in which:
Figure 1 shows a flowchart of an object detection method according to some embodiments of the present disclosure;
Figures 2A and 2B show schematic diagrams of a virtual point cloud generation method according to some embodiments of the present disclosure;
Figure 3 shows a schematic diagram of determining semantic labels of points in a fused point cloud according to some embodiments of the present disclosure;
Figure 4 shows a flowchart of generating a three-dimensional detection frame according to other embodiments of the present disclosure;
Figure 5 shows a block diagram of an object detection device according to some embodiments of the present disclosure;
Figure 6 shows a block diagram of an object detection device according to other embodiments of the present disclosure;
Figure 7 shows a block diagram of a computer system for implementing some embodiments of the present disclosure.
DETAILED DESCRIPTION
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that, unless otherwise specified, the relative arrangement of components and steps, the numerical expressions, and the numerical values set forth in these embodiments do not limit the scope of the present disclosure.
At the same time, it should be understood that, for ease of description, the dimensions of the various parts shown in the drawings are not drawn to actual scale.
The following description of at least one exemplary embodiment is merely illustrative and is in no way intended to limit the present disclosure or its application or use.
Techniques, methods, and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate, such techniques, methods, and devices should be regarded as part of the specification.
In all examples shown and discussed herein, any specific value should be interpreted as merely exemplary and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
It should be noted that similar reference numerals and letters denote similar items in the following drawings; once an item is defined in one drawing, it therefore does not need to be discussed further in subsequent drawings.
In the related art, there are mainly two classes of three-dimensional object detection algorithms. The first class first fuses the raw 3D point cloud data with the raw 2D image data, so that the fused data contains both RGB information and three-dimensional information, and then uses a detector to detect the fused data and output the detection results. This class of algorithms has several problems. First, because the point cloud and the image are imaged from different viewpoints, the point cloud data and the image data are difficult to align when they are fused. Second, this class of algorithms needs two models to extract features from the data of the different sensors before performing fused detection, which increases the complexity of the algorithm. Finally, because radar data is sparse and image data is dense, the fusion result of the two is unsatisfactory and difficult to use for effective feature learning, which reduces the accuracy of object detection.
To solve the above problems, the present disclosure proposes an object detection method and device, a computer-readable storage medium, and an unmanned vehicle. By projecting the point cloud into the image coordinate system and determining the correspondence between points in the point cloud and the semantic label of each point in the two-dimensional image coordinate system, the present disclosure can solve the problem of aligning the three-dimensional point cloud with the two-dimensional image.
In addition, the present disclosure uses a three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud. On the one hand, only one three-dimensional detection model is used, which reduces the complexity of the model and facilitates deployment on the vehicle. On the other hand, the data finally used for object detection retains both the depth information of the three-dimensional point cloud and the semantic label information of the two-dimensional image, providing more information for the three-dimensional detection model and improving the accuracy of three-dimensional object detection.
Figure 1 shows a flowchart of an object detection method according to some embodiments of the present disclosure. In some embodiments, the following object detection method is performed by an object detection device.
As shown in Figure 1, the object detection method includes steps S1 to S6. In step S1, an original point cloud of the object to be detected, an image of the object to be detected, and semantic labels of pixels in the image are obtained.
For example, the original point cloud and the image can be obtained by scanning the same object with different sensors. The sensors can be a lidar, a monocular camera, a binocular camera, and the like.
In some embodiments, the original point cloud of the object to be detected is obtained by scanning the object to be detected with a lidar, and the image of the object to be detected is obtained by photographing the object to be detected with a camera.
In some embodiments, the semantic label of a pixel in the image is the category of each pixel generated by performing semantic segmentation on the image.
For example, a two-dimensional semantic segmentation model is used to perform pixel-level semantic segmentation on the two-dimensional image. The input of the model is the color information of the red, green, and blue channels (RGB) of the image, and the output is the semantic category of each pixel. Pixels belonging to the same class can thus be grouped together, and the label of the category to which each pixel belongs can be obtained; for example, pixels belonging to a person are grouped into one category and pixels belonging to a car into another. Semantic labels can be, for example, "obstacle", "non-obstacle bicycle", "pedestrian", and "background".
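As a rough illustration of this step (the disclosure does not name a specific segmentation network, so the torchvision DeepLabV3 model below is only a stand-in), the per-pixel labels could be obtained as follows:

```python
# Rough illustration only: the disclosure does not name a segmentation network,
# so a torchvision DeepLabV3 model is used here purely as a stand-in.
import torch
import torchvision.transforms.functional as TF
from torchvision.models.segmentation import deeplabv3_resnet50
from PIL import Image

model = deeplabv3_resnet50(weights="DEFAULT").eval()

image = Image.open("frame.png").convert("RGB")                 # H x W RGB image
x = TF.normalize(TF.to_tensor(image),                          # 3 x H x W in [0, 1]
                 mean=[0.485, 0.456, 0.406],
                 std=[0.229, 0.224, 0.225]).unsqueeze(0)       # 1 x 3 x H x W

with torch.no_grad():
    logits = model(x)["out"]                                   # 1 x num_classes x H x W
labels = logits.argmax(dim=1).squeeze(0)                       # H x W map of per-pixel class ids

# labels[v, u] is the semantic label C of the pixel at row v, column u.
```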
In step S2, candidate detection frames of the object to be detected are generated based on the original point cloud.
In some embodiments, generating the candidate detection frames of the object to be detected based on the original point cloud includes: using a second feature extraction network to extract features of the original point cloud; and using a second detection network to generate the candidate detection frames of the object to be detected based on the features of the original point cloud.
For example, candidate detection frames are first generated from the point cloud. Specifically, the original point cloud is voxelized using a method such as PointPillar (point pillar) or VoxelNet (voxel network), and the result of the voxelization is fed into a three-dimensional backbone network for feature extraction. Finally, a detection head is used to generate proposals (three-dimensional detection frames) from the extracted features.
For example, a three-dimensional object detection method based on BEV (bird's eye view) can be used to project the lidar point cloud onto the X-Y coordinate plane and obtain a BEV feature map after discretization. The BEV image presents the point cloud in the form of an image while preserving the spatial relationships of obstacles in the three-dimensional world. Based on this feature map, a detector is then used to generate candidate detection frames.
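A minimal sketch of such a BEV discretization is given below; the detection range and cell size are illustrative assumptions, not values taken from the disclosure.

```python
# Illustrative BEV discretization: the detection range and cell size are
# assumptions, not values taken from the disclosure.
import numpy as np

def points_to_bev(points, x_range=(0.0, 70.4), y_range=(-40.0, 40.0), cell=0.1):
    """points: (N, 3+) lidar points. Returns a (2, H, W) BEV map with an
    occupancy channel and a max-height channel."""
    h = int((y_range[1] - y_range[0]) / cell)
    w = int((x_range[1] - x_range[0]) / cell)
    bev = np.zeros((2, h, w), dtype=np.float32)

    xs, ys, zs = points[:, 0], points[:, 1], points[:, 2]
    keep = (xs >= x_range[0]) & (xs < x_range[1]) & (ys >= y_range[0]) & (ys < y_range[1])
    xs, ys, zs = xs[keep], ys[keep], zs[keep]

    col = ((xs - x_range[0]) / cell).astype(np.int64)   # grid index along X
    row = ((ys - y_range[0]) / cell).astype(np.int64)   # grid index along Y

    bev[0, row, col] = 1.0                              # occupancy
    np.maximum.at(bev[1], (row, col), zs)               # max height per cell (empty cells stay 0)
    return bev
```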
In addition, a non-maximum suppression (NMS) method can be used to remove redundant three-dimensional detection frames to obtain the candidate detection frames.
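A minimal sketch of such non-maximum suppression, simplified to axis-aligned boxes in the BEV plane (real 3D detectors typically use rotated-box IoU):

```python
# Simplified NMS over axis-aligned boxes in the BEV plane; real 3D detectors
# usually use rotated-box IoU, which is omitted here for brevity.
import numpy as np

def nms_bev(boxes, scores, iou_thr=0.5):
    """boxes: (N, 4) [x1, y1, x2, y2] in BEV; scores: (N,). Returns indices kept."""
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        if order.size == 1:
            break
        rest = boxes[order[1:]]
        x1 = np.maximum(boxes[i, 0], rest[:, 0])
        y1 = np.maximum(boxes[i, 1], rest[:, 1])
        x2 = np.minimum(boxes[i, 2], rest[:, 2])
        y2 = np.minimum(boxes[i, 3], rest[:, 3])
        inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (rest[:, 2] - rest[:, 0]) * (rest[:, 3] - rest[:, 1])
        iou = inter / (area_i + area_r - inter + 1e-9)
        order = order[1:][iou < iou_thr]             # drop boxes overlapping the kept one
    return keep
```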
The present disclosure uses the point cloud data to extract foreground points of possible candidate frames. In this way, the information of the candidate frames is fused with the image data before being used for three-dimensional object detection, which can reduce detection errors caused by inaccurate calibration parameters between different sensors.
In step S3, a virtual point cloud is generated based on the candidate detection frames.
In some embodiments, generating the virtual point cloud based on the candidate detection frames includes: generating a grid of the candidate detection frames; and generating the virtual point cloud based on the grid of the candidate detection frames.
For example, the virtual point cloud can be obtained by rasterizing the candidate detection frames.
In some embodiments, each point in the virtual point cloud corresponds to one cell in the grid of the candidate detection frames, and the density of the virtual point cloud is greater than the density of the original point cloud.
Figures 2A and 2B show schematic diagrams of a virtual point cloud generation method according to some embodiments of the present disclosure.
After the candidate detection frames are obtained, each candidate detection frame is rasterized. As shown in Figure 2A, the candidate detection frame is divided into cells of equal size, and the coordinates of each cell are then taken as a virtual point; the resulting virtual points are shown as dots in Figure 2B. Moreover, the density of the virtual point cloud can be made greater than that of the original point cloud by adjusting the size of the cells.
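A minimal sketch of this rasterization, assuming for brevity a candidate frame parameterized as an axis-aligned box (cx, cy, cz, l, w, h); handling the yaw angle of a real candidate frame would additionally rotate the generated grid:

```python
# Sketch of rasterizing one candidate detection frame into virtual points.
# The frame is treated as axis-aligned (cx, cy, cz, l, w, h); handling the yaw
# of a real 3D frame would additionally rotate the generated grid.
import numpy as np

def box_to_virtual_points(box, cell=0.2):
    """box: (cx, cy, cz, l, w, h). Returns (M, 3) virtual points, one per grid cell."""
    cx, cy, cz, l, w, h = box
    xs = np.arange(cx - l / 2 + cell / 2, cx + l / 2, cell)
    ys = np.arange(cy - w / 2 + cell / 2, cy + w / 2, cell)
    zs = np.arange(cz - h / 2 + cell / 2, cz + h / 2, cell)
    gx, gy, gz = np.meshgrid(xs, ys, zs, indexing="ij")
    return np.stack([gx.ravel(), gy.ravel(), gz.ravel()], axis=1)

# Shrinking `cell` makes the virtual point cloud denser than the original lidar
# point cloud, which is how the density condition above is satisfied.
```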
Since the density of the virtual point cloud is greater than the density of the original point cloud, the point cloud becomes dense, which solves the problem of the excessive density gap between the image and the point cloud and improves the accuracy of three-dimensional object detection.
In step S4, the original point cloud and the virtual point cloud are fused to obtain a fused point cloud.
In some embodiments, fusing the original point cloud and the virtual point cloud to obtain the fused point cloud includes: superimposing the original point cloud and the virtual point cloud to obtain the fused point cloud.
For example, since a point cloud is represented by coordinates, the coordinate points of the original point cloud and the coordinate points of the virtual point cloud are merged together to obtain the fused point cloud, whose density is greater than that of the original point cloud or the virtual point cloud, and the fused point cloud is generated without coincident points. The fused point cloud is "original point cloud + virtual point cloud". By fusing the original point cloud with the virtual point cloud, the density of the fused point cloud becomes greater than the density of the original point cloud, which solves the problem of the excessive density gap between the image and the point cloud. In addition, the virtual point cloud contains the information of the candidate detection frames, which is equivalent to providing supervision information in the subsequent three-dimensional detection process and improves the accuracy of three-dimensional object detection.
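A minimal sketch of this superposition (the deduplication step is an assumption reflecting the statement that the fused point cloud has no coincident points):

```python
# Sketch of the superposition in step S4; the deduplication reflects the
# statement that the fused point cloud contains no coincident points.
import numpy as np

def fuse_point_clouds(original_pts, virtual_pts):
    """original_pts: (N, 3), virtual_pts: (M, 3). Returns the fused (K, 3) cloud."""
    fused = np.vstack([original_pts, virtual_pts])
    return np.unique(fused, axis=0)
```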
In step S5, the semantic label corresponding to each point in the fused point cloud is determined by projecting the fused point cloud into the image coordinate system.
In some embodiments, determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system includes: determining the correspondence between each point in the fused point cloud and each pixel in the image by projecting the fused point cloud into the image coordinate system; and determining the semantic label corresponding to each point in the fused point cloud according to the correspondence between each point in the fused point cloud and each pixel in the image, and the semantic label of each pixel in the image.
Figure 3 shows a schematic diagram of determining semantic labels of points in a fused point cloud according to some embodiments of the present disclosure.
As shown in Figure 3, the fused point cloud is first projected into the image coordinate system using the calibration information of the camera. The projection formula is as follows:
z_c · [u, v, 1]^T = K · [R | T] · [P_x, P_y, P_z, 1]^T
where P_x, P_y, and P_z are the coordinates of a point in the fused point cloud, K is the intrinsic parameter matrix of the camera, R and T are the rotation matrix and the translation matrix from the point cloud coordinate system to the image coordinate system (both determined from the extrinsic calibration of the camera and the point cloud), z_c is the image depth, and u and v are the horizontal and vertical coordinates of the pixel obtained by projecting the point of the fused point cloud into the image coordinate system.
From the coordinates (u, v) obtained by projecting the points of the fused point cloud into the image coordinate system, the correspondence between the three-dimensional fused point cloud and the two-dimensional image is obtained. The points in the fused point cloud are thus matched to the pixels in the image; since the semantic labels of the pixels were already obtained in the earlier image semantic segmentation step, the correspondence between the points in the fused point cloud and the semantic labels C of the image pixels can be derived, so that every point in the fused point cloud has a corresponding semantic label.
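A numpy sketch of this projection and label lookup, assuming the extrinsics are given as a rotation matrix R and a translation vector T from the point cloud frame to the camera frame:

```python
# Numpy sketch of step S5, assuming the extrinsics are given as a rotation
# matrix R and translation vector T from the point cloud frame to the camera
# frame; `labels` is the H x W per-pixel class map from the segmentation step.
import numpy as np

def attach_semantic_labels(points, labels, K, R, T):
    """points: (N, 3) fused point cloud; K: (3, 3); R: (3, 3); T: (3,).
    Returns (M, 4) rows of [P_x, P_y, P_z, C] for points visible in the image."""
    cam = points @ R.T + T                        # point cloud frame -> camera frame
    in_front = cam[:, 2] > 0                      # keep points with positive depth z_c
    cam = cam[in_front]
    uvw = cam @ K.T                               # pinhole projection
    u = (uvw[:, 0] / uvw[:, 2]).astype(np.int64)
    v = (uvw[:, 1] / uvw[:, 2]).astype(np.int64)

    h, w = labels.shape
    inside = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    pts = points[in_front][inside]
    c = labels[v[inside], u[inside]].astype(np.float32)
    # Fusion information P_i = (P_x, P_y, P_z, C): the 3D coordinates are kept,
    # only the label is read from the image.
    return np.concatenate([pts, c[:, None]], axis=1)
```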
The present disclosure uses the calibration equation to project the fused point cloud onto the image; from the projection of the fused point cloud in the image coordinate system, the semantic label of every point of the fused point cloud is obtained, so that the points of the three-dimensional point cloud are aligned with the semantic segmentation result of the two-dimensional image. This avoids the data alignment problem caused by directly fusing the raw point cloud data with the raw RGB information of the two-dimensional image.
In addition, compared with methods that paint the image onto the point cloud, the present disclosure projects the point cloud into the image coordinate system, avoiding projection errors caused by missing image depth information. The purpose of projecting the point cloud onto the image in the present disclosure is to obtain the correspondence between the point cloud and the semantic labels; what is subsequently fed into the three-dimensional detection model is the semantic labels together with the point cloud coordinates in three-dimensional space, rather than the projected coordinates of the point cloud in the image coordinate system. The data finally used for three-dimensional detection therefore includes the depth information of the point cloud, which improves detection accuracy.
In step S6, a three-dimensional detection model is used to generate the three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
In some embodiments, using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud includes: splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain fusion information of the point cloud and the image; and using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image.
In some embodiments, splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain the fusion information of the point cloud and the image includes: concatenating the coordinates of each point in the fused point cloud and the semantic label of that point into an array as the fusion information of the point cloud and the image.
For example, the fusion information can be obtained by concatenating (cat) the coordinates of the points of the fused point cloud in the spatial coordinate system with the semantic labels obtained from the image semantic segmentation. Splicing here means concatenating along an existing dimension of the data; the total number of dimensions of the data remains unchanged after the operation. For example, the coordinates of each point in the fused point cloud are spliced with the semantic label of that point according to the following formula:
P_i = (P_x, P_y, P_z, C)
where P_i is the fusion information of each point, (P_x, P_y, P_z) are the coordinates of that point in the spatial coordinate system, and C is the semantic label of that point.
In some embodiments, the three-dimensional detection model includes a first feature extraction network and a detection network, and using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image includes: using the first feature extraction network to extract features of the fusion information of the point cloud and the image; and using the first detection network to generate the three-dimensional detection frame of the object to be detected based on the features of the fusion information of the point cloud and the image.
The three-dimensional detection model includes a backbone network for feature extraction and a detection network for generating detection frames. A common point cloud representation method (such as PointPillar or VoxelNet) is used to process the fusion information of the point cloud and the image into a data structure that the network can learn from; a feature extraction network (backbone network) then learns the processed fusion information of the two sensors' point cloud and image; finally, a detection head for the detection task is attached to generate the three-dimensional detection frame and obtain the final 3D detection result.
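For illustration only, a backbone-plus-detection-head model of this kind, operating on a BEV encoding of the (x, y, z, C) fusion information (for example, occupancy, height, and label channels), might be sketched as follows; the channel counts, strides, and the 7-parameter box encoding are assumptions, not the architecture of the disclosure.

```python
# Illustrative sketch only: channel counts, strides, and the 7-parameter box
# encoding (x, y, z, l, w, h, yaw) are assumptions, not the architecture of the
# disclosure. The input is assumed to be a BEV encoding of the (x, y, z, C)
# fusion information, e.g. occupancy, height, and label channels.
import torch
import torch.nn as nn

class TinyBEVDetector(nn.Module):
    def __init__(self, in_channels=3, num_anchors=2):
        super().__init__()
        self.backbone = nn.Sequential(                          # feature extraction network
            nn.Conv2d(in_channels, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
        )
        self.cls_head = nn.Conv2d(128, num_anchors, 1)          # objectness per anchor
        self.box_head = nn.Conv2d(128, num_anchors * 7, 1)      # (x, y, z, l, w, h, yaw) per anchor

    def forward(self, bev):                                     # bev: (B, in_channels, H, W)
        feat = self.backbone(bev)                               # detection head on BEV features
        return self.cls_head(feat), self.box_head(feat)
```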
Figure 4 shows a flowchart of generating a three-dimensional detection frame according to other embodiments of the present disclosure.
As shown in Figure 4, the steps of generating a three-dimensional detection frame are as follows: first, the original point cloud is voxelized and the result of the voxelization is fed into a three-dimensional backbone network for feature extraction, and a detection head is used to generate proposals (three-dimensional detection frames) from the extracted features; non-maximum suppression is used to remove redundant three-dimensional detection frames to obtain the candidate detection frames; the candidate detection frames are rasterized to obtain a virtual point cloud, and the original point cloud is fused with the virtual point cloud to obtain a fused point cloud; semantic segmentation is performed on the two-dimensional image to obtain semantic labels; the semantic label corresponding to each point in the fused point cloud is determined by projecting the fused point cloud into the image coordinate system; the coordinates of each point in the fused point cloud are spliced with the semantic label of that point to obtain the fusion information of the point cloud and the image; a common point cloud representation method is used to process the fusion information of the point cloud and the image into a data structure that the network can learn from; a feature extraction network (backbone network) learns the processed fusion information of the two sensors' point cloud and image; and finally a detection head for the detection task is attached to generate the three-dimensional detection frame and obtain the final 3D detection result.
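The pipeline can be summarized by the following skeleton; every injected callable and helper name below is hypothetical glue standing in for the networks the disclosure describes, and nms_bev, box_to_virtual_points, fuse_point_clouds, and attach_semantic_labels refer to the sketches given earlier in this description.

```python
# Hypothetical end-to-end skeleton of the Figure 4 pipeline. The injected
# callables (segment_image, propose_boxes, encode_bev, detector_3d) stand in
# for the 2D segmentation model, the proposal network, the point cloud
# representation step, and the 3D detection model; nms_bev,
# box_to_virtual_points, fuse_point_clouds, and attach_semantic_labels refer to
# the sketches given earlier in this description.
import numpy as np

def detect_objects(lidar_points, rgb_image, K, R, T,
                   segment_image, propose_boxes, encode_bev, detector_3d):
    labels_2d = segment_image(rgb_image)                        # per-pixel semantic labels
    bev_boxes, boxes_3d, scores = propose_boxes(lidar_points)   # voxelize + backbone + head
    keep = nms_bev(bev_boxes, scores)                           # remove redundant proposals

    virtual = np.vstack([box_to_virtual_points(b) for b in boxes_3d[keep]])
    fused = fuse_point_clouds(lidar_points[:, :3], virtual)           # step S4
    fusion_info = attach_semantic_labels(fused, labels_2d, K, R, T)   # step S5: (x, y, z, C)
    return detector_3d(encode_bev(fusion_info))                       # step S6: 3D detection frames
```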
The present disclosure combines the fused point cloud with the semantic information (semantic labels) of the image, performs basic encoding operations, and feeds the result into a 3D object detector to obtain the final detection result. In the final detection stage, only one 3D detection model is used to obtain the 3D detection result; using a single model reduces the complexity of the model and facilitates its deployment on the vehicle.
In addition, the data finally used for object detection in the present disclosure retains both the depth information of the three-dimensional point cloud and the semantic label information of the two-dimensional image, providing more information for the three-dimensional detection model and improving the accuracy of object detection.
Figure 5 shows a block diagram of an object detection device according to some embodiments of the present disclosure.
As shown in Figure 5, the object detection device 5 includes an acquisition module 51, a candidate detection frame generation module 52, a virtual point cloud generation module 53, a point cloud fusion module 54, a determination module 55, and a three-dimensional detection frame generation module 56.
The acquisition module 51 is configured to obtain the original point cloud of the object to be detected, the image of the object to be detected, and the semantic labels of the pixels in the image, for example, performing step S1 shown in Figure 1.
The candidate detection frame generation module 52 is configured to generate candidate detection frames of the object to be detected based on the original point cloud, for example, performing step S2 shown in Figure 1.
The virtual point cloud generation module 53 is configured to generate a virtual point cloud based on the candidate detection frames, for example, performing step S3 shown in Figure 1.
The point cloud fusion module 54 is configured to fuse the original point cloud and the virtual point cloud to obtain a fused point cloud, for example, performing step S4 shown in Figure 1.
The determination module 55 is configured to determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system, for example, performing step S5 shown in Figure 1.
The three-dimensional detection frame generation module 56 is configured to use a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud, for example, performing step S6 shown in Figure 1.
According to the object detection device of the present disclosure, by projecting the point cloud into the image coordinate system, the correspondence between the points in the point cloud and the semantic label of each point in the two-dimensional image coordinate system is determined, which can solve the problem of aligning the three-dimensional point cloud with the two-dimensional image.
In addition, the object detection device of the present disclosure uses only one three-dimensional detection model, which reduces the complexity of the model and facilitates deployment on the vehicle. Moreover, the data finally used for object detection retains both the depth information of the three-dimensional point cloud and the semantic label information of the two-dimensional image, providing more information for the three-dimensional detection model and improving the accuracy of object detection.
Figure 6 shows a block diagram of an object detection device according to other embodiments of the present disclosure.
As shown in Figure 6, the object detection device 6 includes a memory 61 and a processor 62 coupled to the memory 61. The memory 61 is used to store instructions for executing the corresponding embodiments of the object detection method. The processor 62 is configured to execute the object detection method in any of the embodiments of the present disclosure based on the instructions stored in the memory 61.
An embodiment of the present disclosure provides an unmanned vehicle equipped with the object detection device 5 or the object detection device 6. According to the present disclosure, the accuracy of object detection can be improved, so that the unmanned vehicle can avoid obstacles based on the detected objects, which improves the safety of unmanned vehicle driving.
Figure 7 shows a block diagram of a computer system for implementing some embodiments of the present disclosure.
As shown in Figure 7, the computer system 70 may be embodied in the form of a general-purpose computing device. The computer system 70 includes a memory 710, a processor 720, and a bus 700 connecting the various system components.
The memory 710 may include, for example, a system memory, a non-volatile storage medium, and the like. The system memory stores, for example, an operating system, applications, a boot loader, and other programs. The system memory may include volatile storage media, such as random access memory (RAM) and/or cache memory. The non-volatile storage medium stores, for example, instructions for executing the corresponding embodiments of at least one of the object detection methods. Non-volatile storage media include, but are not limited to, magnetic disk storage, optical storage, flash memory, and the like.
The processor 720 may be implemented as a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, or discrete hardware components such as discrete gates or transistors. Accordingly, each module, such as the judgment module and the determination module, may be implemented by a central processing unit (CPU) executing instructions stored in memory that perform the corresponding steps, or by a dedicated circuit that performs the corresponding steps.
The bus 700 may use any of a variety of bus structures, including, but not limited to, an Industry Standard Architecture (ISA) bus, a Micro Channel Architecture (MCA) bus, and a Peripheral Component Interconnect (PCI) bus.
The computer system 70 may also include an input/output interface 730, a network interface 740, a storage interface 750, and the like. These interfaces 730, 740, and 750, as well as the memory 710 and the processor 720, may be connected through the bus 700. The input/output interface 730 provides a connection interface for input/output devices such as a display, a mouse, and a keyboard. The network interface 740 provides a connection interface for various networked devices. The storage interface 750 provides a connection interface for external storage devices such as floppy disks, USB drives, and SD cards.
Here, various aspects of the present disclosure are described with reference to flowcharts and/or block diagrams of methods, devices, and computer program products according to embodiments of the present disclosure. It should be understood that each block of the flowcharts and/or block diagrams, and combinations of blocks, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable device to produce a machine, so that execution of the instructions by the processor produces a device that implements the functions specified in one or more blocks of the flowcharts and/or block diagrams.
These computer-readable program instructions may also be stored in a computer-readable memory; these instructions cause a computer to operate in a specific manner, thereby producing an article of manufacture that includes instructions implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams.
The present disclosure may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects.
The object detection method and device, the computer-readable storage medium, and the unmanned vehicle of the above embodiments improve the accuracy of object detection.
The object detection method and device, the computer-readable storage medium, and the unmanned vehicle according to the present disclosure have been described in detail above. Some details well known in the art are not described in order to avoid obscuring the concept of the present disclosure. Based on the above description, those skilled in the art can fully understand how to implement the technical solutions disclosed herein.

Claims (17)

  1. An object detection method, comprising:
    obtaining an original point cloud of an object to be detected, an image of the object to be detected, and semantic labels of pixels in the image;
    generating candidate detection frames of the object to be detected based on the original point cloud;
    generating a virtual point cloud based on the candidate detection frames;
    fusing the original point cloud and the virtual point cloud to obtain a fused point cloud;
    determining a semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into an image coordinate system;
    using a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  2. The object detection method according to claim 1, wherein generating the virtual point cloud based on the candidate detection frames comprises:
    generating a grid of the candidate detection frames;
    generating the virtual point cloud based on the grid of the candidate detection frames.
  3. The object detection method according to claim 2, wherein each point in the virtual point cloud corresponds to one cell in the grid of the candidate detection frames, and the density of the virtual point cloud is greater than the density of the original point cloud.
  4. The object detection method according to claim 1, wherein determining the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system comprises:
    determining the correspondence between each point in the fused point cloud and each pixel in the image by projecting the fused point cloud into the image coordinate system;
    determining the semantic label corresponding to each point in the fused point cloud according to the correspondence between each point in the fused point cloud and each pixel in the image, and the semantic label of each pixel in the image.
  5. The object detection method according to claim 1, wherein fusing the original point cloud and the virtual point cloud to obtain the fused point cloud comprises:
    superimposing the original point cloud and the virtual point cloud to obtain the fused point cloud.
  6. The object detection method according to claim 1, wherein using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud comprises:
    splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain fusion information of the point cloud and the image;
    using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image.
  7. The object detection method according to claim 6, wherein splicing the coordinates of each point in the fused point cloud with the semantic label of that point to obtain the fusion information of the point cloud and the image comprises:
    concatenating the coordinates of each point in the fused point cloud and the semantic label of that point into an array as the fusion information of the point cloud and the image.
  8. The object detection method according to claim 6, wherein the three-dimensional detection model comprises a first feature extraction network and a detection network, and using the three-dimensional detection model to generate the three-dimensional detection frame of the object to be detected based on the fusion information of the point cloud and the image comprises:
    using the first feature extraction network to extract features of the fusion information of the point cloud and the image;
    using the first detection network to generate the three-dimensional detection frame of the object to be detected based on the features of the fusion information of the point cloud and the image.
  9. The object detection method according to claim 1, wherein the semantic label of a pixel in the image is the category of each pixel generated by performing semantic segmentation on the image.
  10. The object detection method according to claim 1, wherein the original point cloud of the object to be detected is obtained by scanning the object to be detected with a lidar, and the image of the object to be detected is obtained by photographing the object to be detected with a camera.
  11. The object detection method according to claim 1, wherein generating the candidate detection frames of the object to be detected based on the original point cloud comprises:
    using a second feature extraction network to extract features of the original point cloud;
    using a second detection network to generate the candidate detection frames of the object to be detected based on the features of the original point cloud.
  12. An object detection device, comprising:
    an acquisition module configured to obtain an original point cloud of an object to be detected, an image of the object to be detected, and semantic labels of pixels in the image;
    a candidate detection frame generation module configured to generate candidate detection frames of the object to be detected based on the original point cloud;
    a virtual point cloud generation module configured to generate a virtual point cloud based on the candidate detection frames;
    a point cloud fusion module configured to fuse the original point cloud and the virtual point cloud to obtain a fused point cloud;
    a determination module configured to determine the semantic label corresponding to each point in the fused point cloud by projecting the fused point cloud into the image coordinate system;
    a three-dimensional detection frame generation module configured to use a three-dimensional detection model to generate a three-dimensional detection frame of the object to be detected based on the fused point cloud and the semantic label corresponding to each point in the fused point cloud.
  13. An object detection device, comprising:
    a memory; and
    a processor coupled to the memory, the processor being configured to execute the object detection method according to any one of claims 1 to 11 based on instructions stored in the memory.
  14. A computer-readable storage medium on which computer program instructions are stored, wherein the instructions, when executed by a processor, implement the object detection method according to any one of claims 1 to 11.
  15. An unmanned vehicle configured with the object detection device according to claim 12 or 13.
  16. The unmanned vehicle according to claim 15, further comprising at least one of a lidar and a camera, wherein the lidar is configured to scan an object to be detected to obtain an original point cloud of the object to be detected, and the camera is configured to photograph the object to be detected to obtain an image of the object to be detected.
  17. A computer program, comprising:
    instructions which, when executed by a processor, cause the processor to perform the object detection method according to any one of claims 1 to 11.
PCT/CN2022/136555 2022-04-01 2022-12-05 Object detection method and device, computer-readable storage medium, and unmanned vehicle WO2023185069A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210339110.5 2022-04-01
CN202210339110.5A CN114648758A (zh) 2022-04-01 2022-04-01 Object detection method and device, computer-readable storage medium, and unmanned vehicle

Publications (1)

Publication Number Publication Date
WO2023185069A1 true WO2023185069A1 (zh) 2023-10-05

Family

ID=81994705

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/136555 WO2023185069A1 (zh) 2022-04-01 2022-12-05 Object detection method and device, computer-readable storage medium, and unmanned vehicle

Country Status (2)

Country Link
CN (1) CN114648758A (zh)
WO (1) WO2023185069A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114842456A (zh) * 2022-06-29 2022-08-02 北京科技大学 Logistics distribution method based on an unmanned express delivery vehicle
CN116416223B (zh) * 2023-03-20 2024-01-09 北京国信会视科技有限公司 Complex equipment debugging method and system, electronic device, and storage medium
CN116778262B (zh) * 2023-08-21 2023-11-10 江苏源驶科技有限公司 Three-dimensional target detection method and system based on a virtual point cloud

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
No relevant documents disclosed *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117475110A (zh) * 2023-12-27 2024-01-30 北京市农林科学院信息技术研究中心 Semantic three-dimensional reconstruction method and device for leaves, electronic device, and storage medium
CN117475110B (zh) * 2023-12-27 2024-04-05 北京市农林科学院信息技术研究中心 Semantic three-dimensional reconstruction method and device for leaves, electronic device, and storage medium
CN117740186A (zh) * 2024-02-21 2024-03-22 微牌科技(浙江)有限公司 Tunnel equipment temperature detection method and device, and computer device
CN117740186B (zh) * 2024-02-21 2024-05-10 微牌科技(浙江)有限公司 Tunnel equipment temperature detection method and device, and computer device

Also Published As

Publication number Publication date
CN114648758A (zh) 2022-06-21

Similar Documents

Publication Publication Date Title
WO2023185069A1 (zh) Object detection method and device, computer-readable storage medium, and unmanned vehicle
CN110264416B (zh) Sparse point cloud segmentation method and device
WO2020135446A1 (zh) Target positioning method and device, and unmanned aerial vehicle
JP6031554B2 (ja) Obstacle detection method and device based on a monocular camera
JP7205613B2 (ja) Image processing device, image processing method, and program
WO2020206708A1 (zh) Obstacle recognition method and device, computer equipment, and storage medium
CN108648194B (zh) Three-dimensional target recognition, segmentation, and pose measurement method and device based on a CAD model
Bruls et al. The right (angled) perspective: Improving the understanding of road scenes using boosted inverse perspective mapping
CN110706269B (zh) Dense modeling method for dynamic scenes based on binocular vision SLAM
CN110570457A (zh) Three-dimensional object detection and tracking method based on streaming data
WO2021098567A1 (zh) Method, device, and storage medium for generating a panorama with depth information
WO2021114776A1 (en) Object detection method, object detection device, terminal device, and medium
CN113888458A (zh) Method and system for object detection
CN112097732A (zh) Binocular-camera-based three-dimensional ranging method, system, and device, and readable storage medium
US20160275359A1 (en) Information processing apparatus, information processing method, and computer readable medium storing a program
US20240029303A1 (en) Three-dimensional target detection method and apparatus
CN114761997A (zh) Target detection method, terminal device, and medium
CN115171096A (zh) 3D target detection method based on fusion of RGB images and laser point clouds
TWI716874B (zh) Image processing device, image processing method, and image processing program
CN114662587A (zh) Lidar-based three-dimensional target perception method, device, and system
Beltrán et al. A method for synthetic LiDAR generation to create annotated datasets for autonomous vehicles perception
KR20220043458A (ko) Automated safety inspection system for three-dimensional tunnel facilities based on explainable artificial intelligence and point clouds
Xiao et al. Research on uav multi-obstacle detection algorithm based on stereo vision
CN115236693A (zh) Track intrusion detection method and device, electronic device, and storage medium
CN114648639A (zh) Target vehicle detection method, system, and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22934887

Country of ref document: EP

Kind code of ref document: A1