WO2022141720A1 - Three-dimensional point cloud object detection method and device based on a three-dimensional heat map - Google Patents

Three-dimensional point cloud object detection method and device based on a three-dimensional heat map Download PDF

Info

Publication number
WO2022141720A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
dimensional
feature map
predicted
point
Prior art date
Application number
PCT/CN2021/074231
Other languages
English (en)
French (fr)
Inventor
陈延艺
夏启明
杜静
黄尚锋
陈延行
江文涛
Original Assignee
罗普特科技集团股份有限公司
罗普特(厦门)系统集成有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 罗普特科技集团股份有限公司, 罗普特(厦门)系统集成有限公司 filed Critical 罗普特科技集团股份有限公司
Publication of WO2022141720A1 publication Critical patent/WO2022141720A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T3/00Geometric image transformations in the plane of the image
    • G06T3/40Scaling of whole images or parts thereof, e.g. expanding or contracting
    • G06T3/4007Scaling of whole images or parts thereof, e.g. expanding or contracting based on interpolation, e.g. bilinear interpolation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00Indexing scheme relating to image or video recognition or understanding
    • G06V2201/07Target detection

Definitions

  • The present disclosure relates to the field of object detection, and in particular to a three-dimensional point cloud object detection method and device based on a three-dimensional heat map.
  • Object detection algorithms based on 3D point clouds are important tools for exploring autonomous driving.
  • The task of vehicle-front object detection is to detect three classes: cars, pedestrians, and bicycles.
  • Neural-network-based deep learning has achieved fruitful results in object detection research.
  • Most traditional recognition algorithms, however, target image datasets with rich semantic information, and the 3D spatial structure of a scene is discarded in such datasets. Models trained on 2D datasets therefore cannot detect efficiently when applied to real autonomous-driving scenes.
  • The 3D point cloud data scanned by LiDAR meets the needs of vehicle-front object detection.
  • Vehicle-mounted LiDAR laser-scans the space in front of the vehicle to obtain 3D point clouds of objects, which provides data support for stable detection algorithms. The disorder and sparsity of point clouds, however, bring new challenges to point cloud processing.
  • For point-cloud-based vehicle detection, mainstream algorithms are point-based, voxel-based, or combined point-and-voxel algorithms.
  • Point-based algorithms use PointNet to learn point-wise features of the point cloud: they obtain a feature representation for each point and use convolutional neural networks to obtain richer semantic features.
  • Voxel-based algorithms divide the point cloud into voxels and replace the point representation with them, which addresses the disorder and sparsity problems.
  • Combined point-and-voxel algorithms merge the advantages of both families to form fast and effective object detectors.
  • Currently, popular object detection algorithms fall roughly into two categories: single-stage detectors and two-stage detectors.
  • In 2019, Shi Shaoshuai et al. proposed PointRCNN, the first two-stage 3D point cloud object detection network.
  • PointRCNN is a point-based algorithm that uses a bottom-up detection approach: it obtains candidate regions by segmenting foreground points, then expands those candidates into regions of interest. Detection boxes are refined inside each region of interest, and the final prediction boxes are obtained with an anchor-free method. However, the result PointRCNN obtains after the second-stage pooling over the region of interest is ambiguous. Shi Shaoshuai et al. found that the point cloud itself carries supervisory information, so they further proposed a new two-stage network: the Part-A2 network.
  • The first stage of the Part-A2 network performs semantic segmentation on the raw point cloud with a V-Net-like structure to obtain coarse candidate boxes.
  • In the second stage, a local pooling method is used to resolve the ambiguity caused by PointRCNN's pooling.
  • In 2020, Shi Shaoshuai et al. proposed PV-RCNN.
  • The network combines the advantages of point-based and voxel-based methods, which put its detection results at the top of the KITTI benchmark at the time.
  • PV-RCNN takes full advantage of voxel-based efficiency and speed in its first-stage network and uses 3D sparse convolution as the backbone to generate candidate boxes.
  • In 2017, VoxelNet, an end-to-end object detection network based on traditional 3D convolution, was proposed. VoxelNet first voxelizes the raw point cloud to convert the disordered, sparse points into regular voxels that can be learned. VoxelNet pioneered single-stage object detection on point clouds but still could not locate objects quickly: because of 3D convolution, the speed of most such algorithms is severely limited, and the sparsity of point clouds produces large numbers of zero values that waste computation.
  • SECOND proposed a sparse convolution algorithm to address this problem efficiently.
  • Sparse convolution, however, is still a special type of 3D convolution and cannot overcome the bottleneck of slow 3D convolution.
  • Using 2D convolution to solve 3D point cloud object detection is a new challenge.
  • PointPillars proposed a different approach to point cloud processing: the whole point cloud scene is compressed onto the x-y plane to obtain a pseudo-image that represents the semantic information of the entire scene. PointPillars applies 2D convolutions to the pseudo-image and achieves fast, effective detection of objects in front of the vehicle.
  • SASSD proposed another new idea.
  • During training, an auxiliary network converts the voxel features of the single-stage detector into point-level features under additional supervision signals; the auxiliary network takes no part in computation at inference time, achieving both speed and precision.
  • In view of the problems noted above — in existing 3D object detection algorithms based on 3D sparse convolution, passing the raw data through multiple sparse convolution layers loses spatial structure information and lowers detection accuracy — the purpose of the embodiments of the present application is to propose a three-dimensional point cloud object detection method and device based on a three-dimensional heat map to solve the technical problems mentioned in the background section.
  • In a first aspect, an embodiment of the present application provides a method for detecting 3D point cloud objects based on a 3D heat map, including the following steps:
  • A point cloud processing step: input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud with bilinear interpolation to obtain second feature maps,
  • splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
  • A first prediction step: input the N*C-dimensional feature map into the first fully connected neural network and output an N*4-dimensional feature map.
  • The output data comprises the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
  • A second prediction step: input the N*C-dimensional feature map into the second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
  • A center coordinate determination step: regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
  • An output step: merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
  • In some embodiments, in the world coordinate system with the radar scanner at the origin, let the center coordinates of the object be u(a, b, c) and the coordinates of any point of the object be q(m, n, t); the thermal response value of that point is then given by formula (1), which is rendered as an image in the original document.
  • The formula turns the thermal response value into a quantitative index that is convenient to compute.
  • In some embodiments, in the first prediction step the true thermal response value of each point is computed from the spatial position of the point cloud, and the first fully connected neural network is trained with these true values.
  • The backbone of 3D sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points.
  • In some embodiments, the center coordinate determination step specifically comprises: let the predicted thermal response value of the i-th point be Y_i and its coordinates be q_i(m_i, n_i, t_i); then, under the three-dimensional Gaussian model, the mean of the three-dimensional normal distribution the point belongs to is μ(μ_1, μ_2, μ_3), the covariance matrix of the normal distribution is taken to be diagonal, and the diagonal entries are set by hyperparameters σ_1, σ_2, σ_3. (The regression equations and their least-squares solution are rendered as images in the original document.)
  • The three-dimensional normal distribution is regressed from the thermal response values, and its mean is the position of the object's center point.
  • In some embodiments, splicing the second feature maps in the point cloud processing step specifically comprises adding the feature values at corresponding positions of the second feature maps.
  • In some embodiments, the first fully connected neural network comprises a first input layer, a first hidden layer, and a first output layer, and is fitted to convergence with a smooth L1 loss function.
  • The predicted three-dimensional point coordinates and the predicted thermal response values can thus be computed accurately and quickly through the first fully connected neural network.
  • In some embodiments, the second fully connected neural network comprises a second input layer, a second hidden layer, and a second output layer;
  • the size of the prediction box is obtained with the smooth L1 loss function,
  • and the category is predicted with the focal loss.
  • The second fully connected neural network predicts the object's length, width, height, category, and deflection angle, and computes quickly.
  • In a second aspect, an embodiment of the present application proposes a three-dimensional point cloud object detection device based on a three-dimensional heat map, comprising:
  • a point cloud processing module configured to input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
  • a first prediction module configured to input the N*C-dimensional feature map into the first fully connected layer and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
  • a second prediction module configured to input the N*C-dimensional feature map into the second fully connected layer and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
  • a center coordinate determination module configured to regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
  • an object detection result output module configured to merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
  • In a third aspect, embodiments of the present application provide an electronic device comprising one or more processors and a storage device storing one or more programs that, when executed by the one or more processors, cause them to implement the method described in any implementation of the first aspect.
  • In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method described in any implementation of the first aspect.
  • The present disclosure proposes a three-dimensional point cloud object detection method and device based on a three-dimensional heat map.
  • The disclosure innovates on and extends the idea of heat maps, combining, for the first time, a heat map formed from three-dimensional thermal response values with three-dimensional object detection.
  • A backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points, and a three-dimensional normal distribution is then regressed from those values.
  • The mean of that normal distribution is the position of the object's center point.
  • Combined with the object length, width, height, category, and deflection angle predicted from the N*C-dimensional feature map, the position and category of the object are detected.
  • The disclosure retains the spatial structure information of the raw point cloud, and experiments show improvements in detection accuracy and classification ability over current high-speed object detection methods.
  • FIG. 1 is an exemplary device architecture diagram to which an embodiment of the present application may be applied;
  • FIG. 2 is a schematic flowchart of a method for detecting a 3D point cloud target based on a 3D heat map according to an embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of a backbone network of a three-dimensional heatmap-based 3D point cloud target detection method according to an embodiment of the present disclosure
  • FIG. 4 is a schematic diagram of a heat map of an object of a method for detecting a 3D point cloud target based on a 3D heat map according to an embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of a network structure of a second fully connected neural network of a method for detecting a 3D point cloud target based on a 3D heat map according to an embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of a three-dimensional point cloud target detection device based on a three-dimensional heat map according to an embodiment of the present disclosure
  • FIG. 7 is a schematic structural diagram of a computer device suitable for implementing the electronic device according to the embodiment of the present application.
  • FIG. 1 shows an exemplary apparatus architecture 100 to which the 3D heat-map-based 3D point cloud object detection method or device of the embodiments of the present application can be applied.
  • The apparatus architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105.
  • The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105.
  • The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
  • A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like.
  • Various applications may be installed on the terminal devices 101, 102, and 103, such as data processing applications and file processing applications.
  • The terminal devices 101, 102, and 103 may be hardware or software.
  • When they are hardware, they can be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
  • When they are software, they can be installed in the electronic devices listed above and implemented either as multiple software programs or modules (for example, to provide distributed services) or as a single program or module; no specific limitation is made here.
  • The server 105 may be a server providing various services, for example a background data processing server that processes files or data uploaded by the terminal devices 101, 102, and 103.
  • The background data processing server can process the acquired files or data and generate processing results.
  • It should be noted that the 3D point cloud object detection method based on the 3D heat map provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, and 103; correspondingly, the detection device may be installed in the server 105 or in the terminal devices 101, 102, and 103.
  • It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of each, as implementation requires.
  • When the processed data does not need to be fetched remotely, the apparatus architecture may omit the network and consist only of a server or a terminal device.
  • FIG. 2 shows a method disclosed by an embodiment of the present application for detecting 3D point cloud objects based on a 3D heat map, comprising the following steps:
  • Step S1: input the point cloud into multi-layer sparse convolution to obtain first feature maps, and interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps.
  • The second feature maps are spliced to output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels.
  • In a specific embodiment, the backbone network formed by the multi-layer sparse convolutions is shown in FIG. 3. In the backbone, the point cloud data passes through multi-layer sparse convolution to produce first feature maps at different scales; bilinear interpolation then interpolates each first feature map back to the spatial positions of the original point cloud, i.e., the positions of the points in the world coordinate system with the radar scanner at the origin. The second feature maps obtained by interpolation are spliced to yield the N*C-dimensional feature map.
  • In a preferred embodiment, splicing specifically comprises adding the feature values at corresponding positions of the second feature maps.
  • In the network design, C is preferably 64.
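As an editorial illustration of step S1, the following minimal sketch shows the interpolate-and-splice operation in Python. It is not the patent's implementation: the sparse-convolution backbone is omitted, the dense feature volumes and grid ranges are assumed inputs, and PyTorch's grid_sample in "bilinear" mode performs trilinear interpolation on 3D volumes, which is the closest available match to the bilinear interpolation named in the text.

```python
import torch
import torch.nn.functional as F

def splice_point_features(feature_grids, points, xyz_min, xyz_max):
    """Interpolate multi-scale voxel feature volumes back to the raw point
    positions and splice them by element-wise addition (step S1).

    feature_grids: list of (1, C, D, H, W) dense feature volumes, one per
                   sparse-convolution scale, all projected to the same C
                   (C = 64 in the preferred embodiment); dims ordered (z, y, x).
    points:        (N, 3) xyz coordinates in the world frame with the radar
                   scanner at the origin.
    """
    pts = torch.as_tensor(points, dtype=torch.float32)
    lo = torch.as_tensor(xyz_min, dtype=torch.float32)
    hi = torch.as_tensor(xyz_max, dtype=torch.float32)
    # grid_sample expects coordinates in [-1, 1], ordered (x, y, z).
    grid = (2.0 * (pts - lo) / (hi - lo) - 1.0).view(1, 1, 1, -1, 3)
    out = 0.0
    for vol in feature_grids:
        # (1, C, 1, 1, N): interpolated features at each point position.
        sampled = F.grid_sample(vol, grid, mode="bilinear", align_corners=True)
        out = out + sampled[0, :, 0, 0, :].t()   # splice by addition -> (N, C)
    return out                                    # the N*C feature map
```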
  • Step S2: input the N*C-dimensional feature map into the first fully connected neural network and output an N*4-dimensional feature map; the output data comprises the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center.
  • In a specific embodiment, in the world coordinate system with the radar scanner at the origin, let the center coordinates of the object be u(a, b, c) and the coordinates of any point of the object be q(m, n, t); the thermal response value of that point is then given by formula (1), which is rendered as an image in the original document.
  • The formula turns the thermal response value into a quantitative index that is convenient to compute.
  • In a specific embodiment, in step S2 the true thermal response value of each point is computed from the spatial position of the point cloud, and the first fully connected neural network is trained with these true values.
  • The thermal response value of each point reflects the point's position within the object it belongs to.
  • The value range of the thermal response is (0, 1]; a larger value means the point lies closer to the center of its object. When the thermal response value is 1, the point's coordinates are the object's center coordinates, so a heat map of the object can be formed from the thermal response value at each point position, as shown in FIG. 4. The true thermal response value of a point is computed with formula (1).
  • During prediction, the predicted thermal response values are supervised by the computed true values; after training, the first fully connected neural network therefore yields accurate predicted thermal response values.
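Formula (1) appears only as an image in the source and cannot be recovered here, but the stated properties — values in (0, 1], exactly 1 at the object center, increasing as a point approaches the center — are all satisfied by a Gaussian of the distance to the center. The sketch below assumes that form; the patent's exact expression, in particular how the spread parameter is set, may differ.

```python
import numpy as np

def thermal_response(points, center, sigma=1.0):
    """Assumed reconstruction of formula (1): a Gaussian of the squared
    distance from each point to the object center.

    points: (N, 3) coordinates q(m, n, t) of points inside the object.
    center: (3,) object center u(a, b, c).
    Returns values in (0, 1]; exactly 1 when a point coincides with the center.
    """
    d2 = ((np.asarray(points, float) - np.asarray(center, float)) ** 2).sum(axis=1)
    return np.exp(-d2 / (2.0 * sigma ** 2))
```

During training, such values computed from ground-truth box centers would supervise the heat values predicted by the first fully connected network.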
  • The raw point cloud is processed by the backbone of three-dimensional sparse convolutions, and the thermal response values of the original scene's points are predicted.
  • In a specific embodiment, the input dimension of the first fully connected network is set to C and the output dimension to 4, which yields the N*4-dimensional feature map; the output is the predicted three-dimensional point coordinates (x, y, z) and the corresponding predicted thermal response values.
  • In a specific embodiment, the first fully connected neural network comprises a first input layer, a first hidden layer, and a first output layer, and is fitted to convergence with a smooth L1 loss function. Through this network, the predicted point coordinates and thermal response values can be computed accurately and quickly.
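A minimal sketch of the first fully connected network, for illustration only: the layer structure (input layer, one hidden layer, a 4-dimensional output) and the smooth L1 loss come from the text, while the hidden width and the sigmoid that keeps the predicted thermal response in (0, 1) are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FirstHead(nn.Module):
    """First fully connected network: per-point (x, y, z, heat) prediction."""
    def __init__(self, c_in=64, c_hidden=64):
        super().__init__()
        self.fc1 = nn.Linear(c_in, c_hidden)   # first input -> first hidden layer
        self.fc2 = nn.Linear(c_hidden, 4)      # first hidden -> first output layer

    def forward(self, feats):                  # feats: (N, C)
        h = F.relu(self.fc1(feats))
        out = self.fc2(h)
        xyz = out[:, :3]                        # predicted point coordinates
        heat = torch.sigmoid(out[:, 3])         # predicted thermal response
        return xyz, heat

# Training fits with smooth L1 against the true coordinates and heat values:
# loss = F.smooth_l1_loss(xyz, gt_xyz) + F.smooth_l1_loss(heat, gt_heat)
```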
  • Step S3: input the N*C-dimensional feature map into the second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle.
  • In a specific embodiment, the second fully connected neural network comprises a second input layer, a second hidden layer, and a second output layer; the smooth L1 loss function is used to obtain the size of the prediction box, and the focal loss is used to predict the category.
  • The network structure of the second fully connected neural network is shown in FIG. 5; the second hidden layer has only one layer. The network predicts the object's length, width, height, category, and deflection angle, and computes quickly.
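A sketch of the second fully connected network under stated assumptions: the text describes a five-dimensional output (length, width, height, category, deflection angle) with a single hidden layer; to make the focal loss concrete for the three classes, this sketch gives the category slot per-class logits, which is one plausible reading rather than the patent's exact head layout.

```python
import torch.nn as nn
import torch.nn.functional as F

class SecondHead(nn.Module):
    """Second fully connected network: box size, class, and yaw per point."""
    def __init__(self, c_in=64, c_hidden=64, num_classes=3):
        super().__init__()
        self.fc1 = nn.Linear(c_in, c_hidden)           # the single hidden layer
        self.reg = nn.Linear(c_hidden, 4)              # length, width, height, yaw
        self.cls = nn.Linear(c_hidden, num_classes)    # category logits

    def forward(self, feats):                          # feats: (N, C)
        h = F.relu(self.fc1(feats))
        return self.reg(h), self.cls(h)

def focal_loss(logits, target, gamma=2.0, alpha=0.25):
    """Multi-class focal loss used for the category prediction."""
    logp = F.log_softmax(logits, dim=-1)
    logp_t = logp.gather(-1, target.unsqueeze(-1)).squeeze(-1)
    p_t = logp_t.exp()
    return (-alpha * (1.0 - p_t) ** gamma * logp_t).mean()

# Box size and yaw use smooth L1, the category uses focal loss:
# loss = F.smooth_l1_loss(reg, gt_boxes) + focal_loss(cls, gt_labels)
```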
  • Step S4: regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object.
  • In a specific embodiment, step S4 specifically comprises: let the predicted thermal response value of the i-th point be Y_i and its coordinates be q_i(m_i, n_i, t_i); then, under the three-dimensional Gaussian model, the mean of the three-dimensional normal distribution the point belongs to is μ(μ_1, μ_2, μ_3), the covariance matrix of the normal distribution is taken to be diagonal, and the diagonal entries are set by hyperparameters σ_1, σ_2, σ_3. (The intermediate equations and the least-squares solution appear as images in the original document.)
  • The three-dimensional normal distribution is regressed from the heat map, and its mean is the position of the object's center point.
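The regression equations are images in the source, but the described setup (a 3D Gaussian with diagonal covariance set by hyperparameters σ_1, σ_2, σ_3, solved by least squares) admits one natural derivation: taking logarithms of Y_i = exp(-Σ_d (q_id − μ_d)²/(2σ_d²)) makes the unknown mean enter linearly, so μ can be recovered by ordinary least squares. The sketch below implements that assumed derivation; the patent's own equations may arrange the terms differently.

```python
import numpy as np

def regress_center(points, heat, sigma=(1.0, 1.0, 1.0), eps=1e-6):
    """Least-squares estimate of the Gaussian mean (the object center).

    Model: Y_i = exp(-sum_d (q_id - mu_d)^2 / (2 * sigma_d^2)). With
    z_i = -2*ln(Y_i) and s_d = sigma_d^2, expansion gives the linear relation
        z_i - sum_d q_id^2 / s_d = -2 * sum_d (q_id / s_d) * mu_d + k,
    where k = sum_d mu_d^2 / s_d is an extra unknown absorbed by the fit.
    """
    q = np.asarray(points, dtype=np.float64)                 # (N, 3)
    y = np.clip(np.asarray(heat, dtype=np.float64), eps, 1.0)
    s = np.asarray(sigma, dtype=np.float64) ** 2             # diagonal variances
    z = -2.0 * np.log(y)
    lhs = z - (q ** 2 / s).sum(axis=1)                       # known terms
    A = np.concatenate([-2.0 * q / s, np.ones((len(q), 1))], axis=1)  # (N, 4)
    sol, *_ = np.linalg.lstsq(A, lhs, rcond=None)
    return sol[:3]                                           # mu = predicted center
```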
  • Step S5: merge the predicted length, width, height, category, deflection angle, and center coordinates of the object to obtain the object detection result.
  • Combining the length, width, height, category, and deflection angle predicted in step S3 with the center coordinates predicted in step S4 yields the object detection result.
  • With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a three-dimensional point cloud object detection device based on a three-dimensional heat map; this device embodiment corresponds to the method embodiment shown in FIG. 2.
  • The device can be applied to various electronic devices.
  • The 3D point cloud object detection device based on a 3D heat map proposed by an embodiment of the present application comprises:
  • a point cloud processing module 1 configured to input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
  • a first prediction module 2 configured to input the N*C-dimensional feature map into the first fully connected layer and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
  • a second prediction module 3 configured to input the N*C-dimensional feature map into the second fully connected layer and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
  • a center coordinate determination module 4 configured to regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
  • an object detection result output module 5 configured to merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
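Under the same assumptions as the sketches above, the five modules can be wired together as follows. The module boundaries follow FIG. 6; every internal detail (the helper names splice_point_features, FirstHead, SecondHead, and regress_center from the earlier sketches, and the per-point aggregation) is illustrative rather than the patent's implementation.

```python
import torch.nn as nn

class HeatmapDetector(nn.Module):
    """Illustrative composition of the five modules of FIG. 6, relying on the
    hypothetical helpers sketched earlier in this document."""
    def __init__(self, c=64):
        super().__init__()
        self.first_head = FirstHead(c_in=c)      # module 2: coordinates + heat
        self.second_head = SecondHead(c_in=c)    # module 3: size, class, yaw

    def forward(self, feature_grids, points, xyz_min, xyz_max):
        # Module 1: N*C feature map from the backbone outputs.
        feats = splice_point_features(feature_grids, points, xyz_min, xyz_max)
        xyz, heat = self.first_head(feats)                       # module 2
        size_yaw, cls_logits = self.second_head(feats)           # module 3
        # Module 4: Gaussian-mean regression from predicted coords and heat.
        center = regress_center(xyz.detach().numpy(), heat.detach().numpy())
        # Module 5: merge into one result (mean over points is an illustrative
        # aggregation for a single-object scene).
        return {"center": center,
                "size_yaw": size_yaw.mean(0),
                "class": cls_logits.mean(0).argmax(),
                "heat": heat}
```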
  • The present disclosure proposes a three-dimensional point cloud object detection method and device based on a three-dimensional heat map.
  • The disclosure innovates on and extends the idea of heat maps, combining, for the first time, a heat map formed from three-dimensional thermal response values with three-dimensional object detection.
  • A backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points, and a three-dimensional normal distribution is then regressed from those values.
  • The mean of that normal distribution is the position of the object's center point.
  • Combined with the object length, width, height, category, and deflection angle predicted from the N*C-dimensional feature map, the position and category of the object are detected.
  • The disclosure retains the spatial structure information of the raw point cloud, and experiments show improvements in detection accuracy and classification ability over current high-speed object detection methods.
  • Referring now to FIG. 7, it shows a schematic structural diagram of a computer apparatus 700 suitable for implementing an electronic device (for example, the server or terminal device shown in FIG. 1) according to an embodiment of the present application.
  • The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
  • The computer apparatus 700 includes a central processing unit (CPU) 701 and a graphics processing unit (GPU) 702, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 703 or a program loaded from a storage section 709 into random access memory (RAM) 704.
  • The RAM 704 also stores various programs and data required for the operation of the apparatus 700.
  • The CPU 701, GPU 702, ROM 703, and RAM 704 are connected to each other through a bus 705.
  • An input/output (I/O) interface 706 is also connected to the bus 705.
  • The following components are connected to the I/O interface 706: an input section 707 including a keyboard, a mouse, and the like; an output section 708 including a liquid crystal display (LCD), a speaker, and the like; a storage section 709 including a hard disk and the like; and a communication section 710 including a network interface card such as a LAN card or a modem. The communication section 710 performs communication processing via a network such as the Internet.
  • A drive 711 may also be connected to the I/O interface 706 as needed.
  • A removable medium 712, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 711 as needed so that a computer program read from it can be installed into the storage section 709 as needed.
  • In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart.
  • In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 710 and/or installed from the removable medium 712.
  • When the computer program is executed by the central processing unit (CPU) 701 and the graphics processing unit (GPU) 702, the above-described functions defined in the method of the present application are performed.
  • It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two.
  • A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples may include, but are not limited to, an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
  • In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device.
  • A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing.
  • A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device.
  • Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
  • Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages.
  • The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server.
  • Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
  • Each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functions involved.
  • Each block of the block diagrams and/or flowcharts, and combinations of blocks in the block diagrams and/or flowcharts, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
  • The modules described in the embodiments of the present application may be implemented in software or in hardware.
  • The described modules may also be provided in a processor.
  • As another aspect, the present application also provides a computer-readable medium.
  • The computer-readable medium may be included in the electronic device described in the above embodiments, or it may exist separately without being assembled into the electronic device.
  • The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
  • input the N*C-dimensional feature map into the first fully connected neural network and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
  • input the N*C-dimensional feature map into the second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle; regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

Disclosed are a three-dimensional point cloud object detection method and device based on a three-dimensional heat map. A point cloud is input into multi-layer sparse convolution to obtain first feature maps; bilinear interpolation interpolates the first feature maps in turn back to the original spatial positions of the point cloud to obtain second feature maps, which are spliced to output an N*C-dimensional feature map. The N*C-dimensional feature map is input into a first fully connected neural network, which outputs an N*4-dimensional feature map comprising the predicted point cloud coordinates and the predicted thermal response values. The N*C-dimensional feature map is also input into a second fully connected neural network, which outputs a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle. The predicted thermal response values are regressed through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object. The length, width, height, category, deflection angle, and center coordinates are merged to obtain the object detection result. The present disclosure improves object detection accuracy and classification ability.

Description

Three-dimensional point cloud object detection method and device based on a three-dimensional heat map
RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 202011633077.4, filed on December 31, 2020, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
The present disclosure relates to the field of object detection, and in particular to a three-dimensional point cloud object detection method and device based on a three-dimensional heat map.
BACKGROUND
Object detection algorithms based on 3D point clouds are important tools for exploring autonomous driving. The task of vehicle-front object detection is to detect three classes: cars, pedestrians, and bicycles. Neural-network-based deep learning has achieved fruitful results in object detection research. Most traditional recognition algorithms target image datasets with rich semantic information, but the three-dimensional spatial structure of a scene is discarded in such datasets. Models trained on two-dimensional datasets therefore cannot detect efficiently when applied to real autonomous-driving scenes. The three-dimensional point cloud data scanned by LiDAR meets the needs of vehicle-front object detection.
Vehicle-mounted LiDAR laser-scans the space in front of the vehicle to obtain three-dimensional point clouds of objects, which provides data support for stable detection algorithms. The disorder and sparsity of point clouds, however, bring new challenges to point cloud processing. For point-cloud-based vehicle detection, mainstream algorithms are point-based, voxel-based, or combined point-and-voxel algorithms. Point-based algorithms use PointNet to learn point-wise features of the point cloud: they obtain a feature representation for each point and use convolutional neural networks to obtain richer semantic features. Voxel-based algorithms divide the point cloud into voxels and replace the point representation with them, which addresses the disorder and sparsity problems. Combined point-and-voxel algorithms merge the advantages of both families to form fast and effective object detectors.
Currently, popular object detection algorithms fall roughly into two categories: single-stage detectors and two-stage detectors.
In 2019, Shi Shaoshuai et al. proposed PointRCNN, the first two-stage 3D point cloud object detection network. PointRCNN is a point-based algorithm that uses a bottom-up detection approach: it obtains candidate regions by segmenting foreground points, then expands those candidates into regions of interest. Detection boxes are refined inside each region of interest, and the final prediction boxes are obtained with an anchor-free method. However, the result PointRCNN obtains after the second-stage pooling over the region of interest is ambiguous. Shi Shaoshuai et al. found that the point cloud itself carries supervisory information, so they further proposed a new two-stage network: the Part-A2 network. The first stage of the Part-A2 network performs semantic segmentation on the raw point cloud with a V-Net-like structure to obtain coarse candidate boxes. In the second stage, a local pooling method is used to resolve the ambiguity caused by PointRCNN's pooling. In 2020, Shi Shaoshuai et al. proposed PV-RCNN. The network combines the advantages of point-based and voxel-based methods, which put its detection results at the top of the KITTI benchmark at the time. PV-RCNN takes full advantage of voxel-based efficiency and speed in its first-stage network and uses 3D sparse convolution as the backbone to generate candidate boxes. In the second stage, super points are used in place of the raw point cloud, which greatly reduces computation while preserving the spatial structure of the raw point cloud; the candidate boxes are refined by learning the features of the super points. Finally, an anchor-free method generates high-quality predictions.
Comparatively, two-stage networks detect more accurately than single-stage networks but run more slowly, which makes real-time detection difficult; research into high-precision single-stage networks is therefore crucial. In 2017, VoxelNet, an end-to-end object detection network based on traditional 3D convolution, was proposed. VoxelNet first voxelizes the raw point cloud to convert the disordered, sparse points into regular voxels that can be learned. VoxelNet pioneered single-stage object detection but still could not locate objects quickly. Because of 3D convolution, the speed of most such algorithms is severely limited, and the sparsity of point clouds produces large numbers of zero values that generate much unnecessary computation. SECOND proposed a sparse convolution algorithm to address this problem efficiently, but sparse convolution is still a special type of 3D convolution and cannot overcome the bottleneck of slow 3D convolution. Using 2D convolution to solve 3D point cloud object detection is a new challenge. PointPillars proposed a different approach: according to the meaning of the points, the whole point cloud scene is compressed onto the x-y plane to obtain a pseudo-image that represents the semantic information of the entire scene. PointPillars applies 2D convolutions to the pseudo-image and achieves fast, effective detection of objects in front of the vehicle. SASSD proposed another new idea: during training, an auxiliary network converts the voxel features of the single-stage detector into point-level features under additional supervision signals; the auxiliary network takes no part in computation at inference time, achieving both speed and precision.
SUMMARY
In the prior art described above, three-dimensional object detection algorithms based on 3D sparse convolution lose spatial structure information after the raw data passes through multiple sparse convolution layers, and their detection accuracy is low. The purpose of the embodiments of the present application is to propose a three-dimensional point cloud object detection method and device based on a three-dimensional heat map to solve the technical problems mentioned in the background section above.
In a first aspect, an embodiment of the present application provides a three-dimensional point cloud object detection method based on a three-dimensional heat map, comprising the following steps:
a point cloud processing step: input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
a first prediction step: input the N*C-dimensional feature map into a first fully connected neural network and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
a second prediction step: input the N*C-dimensional feature map into a second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
a center coordinate determination step: regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
an object detection result output step: merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
In some embodiments, in the world coordinate system with the radar scanner at the origin, let the center coordinates of the object be u(a, b, c) and the coordinates of any point of the object be q(m, n, t); the thermal response value of that point is then:
[Formula (1), rendered as an image in the original: Figure PCTCN2021074231-appb-000001]
The above formula turns the thermal response value into a quantitative index that is convenient to compute.
In some embodiments, in the first prediction step the true thermal response value of each point is computed from the spatial position of the point cloud, and the first fully connected neural network is trained with these true values. The backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points.
In some embodiments, the center coordinate determination step specifically comprises: let the predicted thermal response value of the i-th point be Y_i and its coordinates be q_i(m_i, n_i, t_i); then, under the three-dimensional Gaussian model, the mean of the three-dimensional normal distribution the point belongs to is μ(μ_1, μ_2, μ_3), where the covariance matrix of the normal distribution is taken to be diagonal and the diagonal entries are set by hyperparameters σ_1, σ_2, σ_3.
[The resulting system of equations, its expansion, and the least-squares solution appear as images in the original document: Figures PCTCN2021074231-appb-000002 through appb-000009.]
Applying the least-squares method regresses the mean of the Gaussian distribution, i.e., the predicted center coordinates of the object.
A three-dimensional normal distribution is regressed from the thermal response values, and its mean is the position of the object's center point.
In some embodiments, splicing the second feature maps in the point cloud processing step specifically comprises: adding the feature values at corresponding positions of the second feature maps.
In some embodiments, the first fully connected neural network comprises a first input layer, a first hidden layer, and a first output layer, and uses a smooth L1 loss function for fitting and convergence. Through this network, the predicted three-dimensional point coordinates and the predicted thermal response values can be computed accurately and quickly.
In some embodiments, the second fully connected neural network comprises a second input layer, a second hidden layer, and a second output layer; the smooth L1 loss function is used to obtain the size of the prediction box, and the focal loss is used to predict the category. The network predicts the object's length, width, height, category, and deflection angle, and computes quickly.
In a second aspect, an embodiment of the present application proposes a three-dimensional point cloud object detection device based on a three-dimensional heat map, comprising:
a point cloud processing module configured to input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
a first prediction module configured to input the N*C-dimensional feature map into the first fully connected layer and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
a second prediction module configured to input the N*C-dimensional feature map into the second fully connected layer and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
a center coordinate determination module configured to regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
an object detection result output module configured to merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
In a third aspect, an embodiment of the present application provides an electronic device comprising: one or more processors; and a storage device for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the method described in any implementation of the first aspect.
In a fourth aspect, an embodiment of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the program implements the method described in any implementation of the first aspect.
The present disclosure proposes a three-dimensional point cloud object detection method and device based on a three-dimensional heat map. The disclosure innovates on and extends the idea of heat maps, combining, for the first time, a heat map formed from three-dimensional thermal response values with three-dimensional object detection. A backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points; a three-dimensional normal distribution is then regressed from those values, and its mean is the position of the object's center point. Combined with the object length, width, height, category, and deflection angle predicted from the N*C-dimensional feature map, the position and category of the object are detected. The disclosure retains the spatial structure information of the raw point cloud, and experiments show improvements in detection accuracy and classification ability over current high-speed object detection methods.
BRIEF DESCRIPTION OF THE DRAWINGS
To explain the technical solutions in the embodiments of the present disclosure more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present disclosure; for those of ordinary skill in the art, other drawings can be obtained from them without creative effort.
FIG. 1 is an exemplary apparatus architecture diagram to which an embodiment of the present application can be applied;
FIG. 2 is a schematic flowchart of the three-dimensional point cloud object detection method based on a three-dimensional heat map according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of the backbone network of the method according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of the heat map of an object in the method according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram of the network structure of the second fully connected neural network of the method according to an embodiment of the present disclosure;
FIG. 6 is a schematic diagram of the three-dimensional point cloud object detection device based on a three-dimensional heat map according to an embodiment of the present disclosure;
FIG. 7 is a schematic structural diagram of a computer apparatus suitable for implementing the electronic device according to an embodiment of the present application.
DETAILED DESCRIPTION
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to the drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present disclosure. Based on these embodiments, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present disclosure.
FIG. 1 shows an exemplary apparatus architecture 100 to which the three-dimensional point cloud object detection method or device based on a three-dimensional heat map of the embodiments of the present application can be applied.
As shown in FIG. 1, the apparatus architecture 100 may include terminal devices 101, 102, and 103, a network 104, and a server 105. The network 104 is the medium providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber-optic cables.
A user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and the like. Various applications may be installed on the terminal devices 101, 102, 103, such as data processing applications and file processing applications.
The terminal devices 101, 102, 103 may be hardware or software. When they are hardware, they can be various electronic devices, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers. When they are software, they can be installed in the electronic devices listed above and implemented either as multiple software programs or modules (for example, to provide distributed services) or as a single program or module; no specific limitation is made here.
The server 105 may be a server providing various services, for example a background data processing server that processes files or data uploaded by the terminal devices 101, 102, 103. The background data processing server can process the acquired files or data and generate processing results.
It should be noted that the three-dimensional point cloud object detection method based on a three-dimensional heat map provided by the embodiments of the present application may be executed by the server 105 or by the terminal devices 101, 102, 103; correspondingly, the detection device may be arranged in the server 105 or in the terminal devices 101, 102, 103.
It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative; there may be any number of each, as implementation requires. When the processed data does not need to be fetched remotely, the apparatus architecture may omit the network and consist only of a server or a terminal device.
FIG. 2 shows a three-dimensional point cloud object detection method based on a three-dimensional heat map disclosed by an embodiment of the present application, comprising the following steps:
Step S1: input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels.
In a specific embodiment, the backbone network formed by the multi-layer sparse convolutions is shown in FIG. 3. In the backbone, the point cloud data passes through multi-layer sparse convolution to produce first feature maps at different scales; bilinear interpolation then interpolates each first feature map back to the spatial positions of the original point cloud, i.e., the positions of the points in the world coordinate system with the radar scanner at the origin. The second feature maps obtained by interpolation are spliced to yield the N*C-dimensional feature map. In a preferred embodiment, splicing specifically comprises adding the feature values at corresponding positions of the second feature maps. In the network design, C is preferably 64.
Step S2: input the N*C-dimensional feature map into the first fully connected neural network and output an N*4-dimensional feature map; the output data comprises the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center.
In a specific embodiment, in the world coordinate system with the radar scanner at the origin, let the center coordinates of the object be u(a, b, c) and the coordinates of any point of the object be q(m, n, t); the thermal response value of that point is then:
[Formula (1), rendered as an image in the original: Figure PCTCN2021074231-appb-000010]
The above formula turns the thermal response value into a quantitative index that is convenient to compute.
In a specific embodiment, in step S2 the true thermal response value of each point is computed from the spatial position of the point cloud, and the first fully connected neural network is trained with these true values. The thermal response value of each point reflects the point's position within the object it belongs to; its value range is (0, 1], and a larger value means the point lies closer to the center of its object. When the thermal response value is 1, the point's coordinates are the object's center coordinates, so a heat map of the object can be formed from the thermal response value at each point position, as shown in FIG. 4. The true thermal response value of a point is computed with formula (1); during prediction, the predicted values are supervised by the computed true values, so after training the first fully connected neural network yields accurate predicted thermal response values. The backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points.
In a specific embodiment, the input dimension of the first fully connected network is set to C and the output dimension to 4, yielding the N*4-dimensional feature map; the output is the predicted three-dimensional point coordinates (x, y, z) and the corresponding predicted thermal response values. In a specific embodiment, the first fully connected neural network comprises a first input layer, a first hidden layer, and a first output layer, and is fitted to convergence with a smooth L1 loss function. Through this network, the predicted point coordinates and thermal response values can be computed accurately and quickly.
Step S3: input the N*C-dimensional feature map into the second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle.
In a specific embodiment, the second fully connected neural network comprises a second input layer, a second hidden layer, and a second output layer; the smooth L1 loss function is used to obtain the size of the prediction box, and the focal loss is used to predict the category. The network structure of the second fully connected neural network is shown in FIG. 5; the second hidden layer has only one layer. The network predicts the object's length, width, height, category, and deflection angle, and computes quickly.
Step S4: regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object.
In a specific embodiment, step S4 specifically comprises: let the predicted thermal response value of the i-th point be Y_i and its coordinates be q_i(m_i, n_i, t_i); then, under the three-dimensional Gaussian model, the mean of the three-dimensional normal distribution the point belongs to is μ(μ_1, μ_2, μ_3), where the covariance matrix of the normal distribution is taken to be diagonal and the diagonal entries are set by hyperparameters σ_1, σ_2, σ_3.
[The resulting system of equations, its expansion, and the least-squares solution appear as images in the original document: Figures PCTCN2021074231-appb-000011 through appb-000018.]
Applying the least-squares method regresses the mean of the Gaussian distribution, i.e., the predicted center coordinates of the object. The three-dimensional normal distribution is regressed from the heat map, and its mean is the position of the object's center point.
Step S5: merge the predicted length, width, height, category, deflection angle, and center coordinates of the object to obtain the object detection result.
Combining the length, width, height, category, and deflection angle predicted in step S3 with the center coordinates predicted in step S4 yields the object detection result.
With further reference to FIG. 6, as an implementation of the methods shown in the above figures, the present application provides an embodiment of a three-dimensional point cloud object detection device based on a three-dimensional heat map; this device embodiment corresponds to the method embodiment shown in FIG. 2, and the device can be applied to various electronic devices.
The three-dimensional point cloud object detection device based on a three-dimensional heat map proposed by an embodiment of the present application comprises:
a point cloud processing module 1 configured to input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
a first prediction module 2 configured to input the N*C-dimensional feature map into the first fully connected layer and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center;
a second prediction module 3 configured to input the N*C-dimensional feature map into the second fully connected layer and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
a center coordinate determination module 4 configured to regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
an object detection result output module 5 configured to merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
The functions of the above modules correspond to the method steps and are not repeated here.
The present disclosure proposes a three-dimensional point cloud object detection method and device based on a three-dimensional heat map. The disclosure innovates on and extends the idea of heat maps, combining, for the first time, a heat map formed from three-dimensional thermal response values with three-dimensional object detection. A backbone of three-dimensional sparse convolutions processes the raw point cloud and predicts the thermal response values of the original scene's points; a three-dimensional normal distribution is then regressed from those values, and its mean is the position of the object's center point. Combined with the object length, width, height, category, and deflection angle predicted from the N*C-dimensional feature map, the position and category of the object are detected. The disclosure retains the spatial structure information of the raw point cloud, and experiments show improvements in detection accuracy and classification ability over current high-speed object detection methods.
Referring now to FIG. 7, it shows a schematic structural diagram of a computer apparatus 700 suitable for implementing an electronic device (for example, the server or terminal device shown in FIG. 1) of the embodiments of the present application. The electronic device shown in FIG. 7 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.
As shown in FIG. 7, the computer apparatus 700 includes a central processing unit (CPU) 701 and a graphics processing unit (GPU) 702, which can perform various appropriate actions and processes according to a program stored in read-only memory (ROM) 703 or a program loaded from a storage section 709 into random access memory (RAM) 704. The RAM 704 also stores various programs and data required for the operation of the apparatus 700. The CPU 701, GPU 702, ROM 703, and RAM 704 are connected to each other through a bus 705. An input/output (I/O) interface 706 is also connected to the bus 705.
The following components are connected to the I/O interface 706: an input section 707 including a keyboard, a mouse, and the like; an output section 708 including a liquid crystal display (LCD), a speaker, and the like; a storage section 709 including a hard disk and the like; and a communication section 710 including a network interface card such as a LAN card or a modem. The communication section 710 performs communication processing via a network such as the Internet. A drive 711 may also be connected to the I/O interface 706 as needed. A removable medium 712, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 711 as needed so that a computer program read from it can be installed into the storage section 709 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 710 and/or installed from the removable medium 712. When the computer program is executed by the central processing unit (CPU) 701 and the graphics processing unit (GPU) 702, the above-described functions defined in the method of the present application are performed.
It should be noted that the computer-readable medium described in this application may be a computer-readable signal medium or a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example, but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples may include, but are not limited to, an electrical connection having one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this application, a computer-readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code; such a propagated signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the foregoing. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer-readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, or any suitable combination of the foregoing.
Computer program code for performing the operations of the present application may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions, and operations of the devices, methods, and computer program products according to various embodiments of the present application. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which contains one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur out of the order noted in the drawings; for example, two blocks shown in succession may in fact be executed substantially concurrently, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams and/or flowcharts, and combinations of blocks, can be implemented by dedicated hardware-based systems that perform the specified functions or operations, or by combinations of dedicated hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented in software or in hardware; the described modules may also be provided in a processor.
As another aspect, the present application also provides a computer-readable medium, which may be included in the electronic device described in the above embodiments or may exist separately without being assembled into the electronic device. The above computer-readable medium carries one or more programs that, when executed by the electronic device, cause the electronic device to: input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels; input the N*C-dimensional feature map into the first fully connected neural network and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, where the thermal response value of each point represents the relation between the point's position within its object and the object's center; input the N*C-dimensional feature map into the second fully connected neural network and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle; regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
The above description is only a preferred embodiment of the present application and an explanation of the technical principles applied. Those skilled in the art should understand that the scope of disclosure involved in the present application is not limited to technical solutions formed by the specific combination of the above technical features; it should also cover other technical solutions formed by any combination of the above technical features or their equivalents without departing from the disclosed concept, for example technical solutions formed by replacing the above features with technical features of similar function disclosed in (but not limited to) the present application.

Claims (10)

  1. A three-dimensional point cloud object detection method based on a three-dimensional heat map, characterized by comprising the following steps:
    a point cloud processing step: inputting the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolating the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splicing the second feature maps, and outputting an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
    a first prediction step: inputting the N*C-dimensional feature map into a first fully connected neural network and outputting an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, wherein the thermal response value of each point represents the relation between the point's position within its object and the object's center;
    a second prediction step: inputting the N*C-dimensional feature map into a second fully connected neural network and outputting a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
    a center coordinate determination step: regressing the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
    an object detection result output step: merging the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
  2. The method according to claim 1, characterized in that, in the world coordinate system with the radar scanner at the origin, letting the center coordinates of the object be u(a, b, c) and the coordinates of any point of the object be q(m, n, t), the thermal response value of that point is:
    [formula rendered as an image in the original: Figure PCTCN2021074231-appb-100001]
  3. The method according to claim 2, characterized in that, in the first prediction step, the true thermal response value of each point is computed from the spatial position of the point cloud, and the first fully connected neural network is trained with the true thermal response values.
  4. The method according to claim 2, characterized in that the center coordinate determination step specifically comprises: letting the predicted thermal response value of the i-th point be Y_i and its coordinates be q_i(m_i, n_i, t_i), then, under the three-dimensional Gaussian model, the mean of the three-dimensional normal distribution the point belongs to is μ(μ_1, μ_2, μ_3), wherein the covariance matrix of the normal distribution is taken to be diagonal and the diagonal entries are set by hyperparameters σ_1, σ_2, σ_3;
    [the resulting system of equations, its expansion, and the least-squares solution appear as images in the original: Figures PCTCN2021074231-appb-100002 through appb-100009;]
    applying the least-squares method regresses the mean of the Gaussian distribution, i.e., the predicted center coordinates of the object.
  5. The method according to any one of claims 1-4, characterized in that splicing the second feature maps in the point cloud processing step specifically comprises: adding the feature values at corresponding positions of the second feature maps.
  6. The method according to any one of claims 1-4, characterized in that the first fully connected neural network comprises a first input layer, a first hidden layer, and a first output layer, and uses a smooth L1 loss function for fitting and convergence.
  7. The method according to any one of claims 1-4, characterized in that the second fully connected neural network comprises a second input layer, a second hidden layer, and a second output layer; the smooth L1 loss function is used to obtain the size of the prediction box, and the focal loss is used to predict the category.
  8. A three-dimensional point cloud object detection device based on a three-dimensional heat map, characterized by comprising:
    a point cloud processing module configured to input the point cloud into multi-layer sparse convolution to obtain first feature maps, interpolate the first feature maps back to the original spatial positions of the point cloud in turn with bilinear interpolation to obtain second feature maps, splice the second feature maps, and output an N*C-dimensional feature map, where N is the number of points in the point cloud and C the number of channels;
    a first prediction module configured to input the N*C-dimensional feature map into the first fully connected layer and output an N*4-dimensional feature map, the output data comprising the predicted point cloud coordinates and the predicted thermal response values, wherein the thermal response value of each point represents the relation between the point's position within its object and the object's center;
    a second prediction module configured to input the N*C-dimensional feature map into the second fully connected layer and output a five-dimensional vector representing the predicted object's length, width, height, category, and deflection angle;
    a center coordinate determination module configured to regress the predicted thermal response values through a three-dimensional Gaussian distribution to obtain the mean of that Gaussian, i.e., the predicted center coordinates of the object; and
    an object detection result output module configured to merge the predicted length, width, height, category, deflection angle, and center coordinates to obtain the object detection result.
  9. An electronic device, comprising:
    one or more processors; and
    a storage device for storing one or more programs,
    wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-7.
  10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, the method according to any one of claims 1-7 is implemented.
PCT/CN2021/074231 2020-12-31 2021-01-28 Three-dimensional point cloud object detection method and device based on a three-dimensional heat map WO2022141720A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011633077.4 2020-12-31
CN202011633077.4A CN112699806A (zh) 2020-12-31 2020-12-31 Three-dimensional point cloud object detection method and device based on a three-dimensional heat map

Publications (1)

Publication Number Publication Date
WO2022141720A1 true WO2022141720A1 (zh) 2022-07-07

Family

ID=75513621

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/074231 WO2022141720A1 (zh) 2020-12-31 2021-01-28 Three-dimensional point cloud object detection method and device based on a three-dimensional heat map

Country Status (2)

Country Link
CN (1) CN112699806A (zh)
WO (1) WO2022141720A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115330753A (zh) * 2022-10-10 2022-11-11 博志生物科技(深圳)有限公司 Vertebra identification method, apparatus, device, and storage medium
CN115345908A (zh) * 2022-10-18 2022-11-15 四川启睿克科技有限公司 Human posture recognition method based on millimeter-wave radar
CN116664874A (zh) * 2023-08-02 2023-08-29 安徽大学 Single-stage fine-grained lightweight point cloud 3D object detection system and method

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113657925B (zh) * 2021-07-28 2023-08-22 黄淮学院 Civil engineering cost management method based on artificial intelligence
CN113807184A (zh) * 2021-08-17 2021-12-17 北京百度网讯科技有限公司 Obstacle detection method and apparatus, electronic device, and autonomous driving vehicle
CN114998890B (zh) * 2022-05-27 2023-03-10 长春大学 Three-dimensional point cloud object detection algorithm based on graph neural network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124476A1 (en) * 2015-11-04 2017-05-04 Zoox, Inc. Automated extraction of semantic information to enhance incremental mapping modifications for robotic vehicles
CN111681212A (zh) * 2020-05-21 2020-09-18 中山大学 Three-dimensional object detection method based on LiDAR point cloud data

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170124476A1 (en) * 2015-11-04 2017-05-04 Zoox, Inc. Automated extraction of semantic information to enhance incremental mapping modifications for robotic vehicles
CN111681212A (zh) * 2020-05-21 2020-09-18 中山大学 Three-dimensional object detection method based on LiDAR point cloud data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
SHIH-HUNG LIU; SHANG-YI YU; SHAO-CHI WU; HWANN-TZONG CHEN; TYNG-LUH LIU: "Learning Gaussian Instance Segmentation in Point Clouds", ARXIV.ORG, 20 July 2020 (2020-07-20), pages 1 - 22, XP081723863 *

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115330753A (zh) * 2022-10-10 2022-11-11 博志生物科技(深圳)有限公司 Vertebra identification method, apparatus, device, and storage medium
CN115330753B (zh) * 2022-10-10 2022-12-20 博志生物科技(深圳)有限公司 Vertebra identification method, apparatus, device, and storage medium
CN115345908A (zh) * 2022-10-18 2022-11-15 四川启睿克科技有限公司 Human posture recognition method based on millimeter-wave radar
CN115345908B (zh) * 2022-10-18 2023-03-07 四川启睿克科技有限公司 Human posture recognition method based on millimeter-wave radar
CN116664874A (zh) * 2023-08-02 2023-08-29 安徽大学 Single-stage fine-grained lightweight point cloud 3D object detection system and method
CN116664874B (zh) * 2023-08-02 2023-10-20 安徽大学 Single-stage fine-grained lightweight point cloud 3D object detection system and method

Also Published As

Publication number Publication date
CN112699806A (zh) 2021-04-23

Similar Documents

Publication Publication Date Title
WO2022141720A1 (zh) 2022-07-07 Three-dimensional point cloud object detection method and device based on a three-dimensional heat map
US11610115B2 (en) Learning to generate synthetic datasets for training neural networks
CN110363058B (zh) 使用单触发卷积神经网络的用于避障的三维对象定位
CN109902806B (zh) 基于卷积神经网络的噪声图像目标边界框确定方法
US11734851B2 (en) Face key point detection method and apparatus, storage medium, and electronic device
WO2021190451A1 (zh) 训练图像处理模型的方法和装置
US11967152B2 (en) Video classification model construction method and apparatus, video classification method and apparatus, device, and medium
US11940803B2 (en) Method, apparatus and computer storage medium for training trajectory planning model
JP7273129B2 (ja) 車線検出方法、装置、電子機器、記憶媒体及び車両
WO2022012179A1 (zh) 生成特征提取网络的方法、装置、设备和计算机可读介质
CN112734931B (zh) 一种辅助点云目标检测的方法及系统
JP7226696B2 (ja) 機械学習方法、機械学習システム及び非一時的コンピュータ可読記憶媒体
US20220277581A1 (en) Hand pose estimation method, device and storage medium
CN114037985A (zh) 信息提取方法、装置、设备、介质及产品
US20180165539A1 (en) Visual-saliency driven scene description
CN113536920B (zh) 一种半监督三维点云目标检测方法
Cao et al. QuasiVSD: efficient dual-frame smoke detection
EP4207072A1 (en) Three-dimensional data augmentation method, model training and detection method, device, and autonomous vehicle
Chen et al. Research on object detection algorithm based on improved Yolov5
Tan et al. 3D detection transformer: Set prediction of objects using point clouds
CN115457365A (zh) 一种模型的解释方法、装置、电子设备及存储介质
CN115170662A (zh) 基于yolov3和卷积神经网络的多目标定位方法
Vu et al. Scalable SoftGroup for 3D Instance Segmentation on Point Clouds
Zhang et al. An Improved Detection Algorithm For Pre-processing Problem Based On PointPillars
US20230229916A1 (en) Scalable tensor network contraction using reinforcement learning

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21912500

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21912500

Country of ref document: EP

Kind code of ref document: A1