WO2020253121A1 - Target detection method and apparatus, intelligent driving method, device, and storage medium - Google Patents

Target detection method and apparatus, intelligent driving method, device, and storage medium

Info

Publication number
WO2020253121A1
Authority
WO
WIPO (PCT)
Prior art keywords
point cloud
frame
initial
foreground point
grid
Prior art date
Application number
PCT/CN2019/121774
Other languages
English (en)
French (fr)
Inventor
史少帅
王哲
王晓刚
李鸿升
Original Assignee
商汤集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 商汤集团有限公司
Priority to JP2020567923A priority Critical patent/JP7033373B2/ja
Priority to SG11202011959SA priority patent/SG11202011959SA/en
Priority to KR1020207035715A priority patent/KR20210008083A/ko
Priority to US17/106,826 priority patent/US20210082181A1/en
Publication of WO2020253121A1 publication Critical patent/WO2020253121A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • G06T17/20Finite element generation, e.g. wire-frame surface description, tesselation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T17/00Three dimensional [3D] modelling, e.g. data description of 3D objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/11Region-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/194Segmentation; Edge detection involving foreground-background segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion by performing operations on regions, e.g. growing, shrinking or watersheds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/56Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/64Three-dimensional objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10028Range image; Depth image; 3D point clouds
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/12Bounding box
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2210/00Indexing scheme for image generation or computer graphics
    • G06T2210/56Particle system, point based geometry or rendering

Definitions

  • the present disclosure relates to target detection technology, and in particular to a target detection method, intelligent driving method, target detection device, electronic equipment, and computer storage medium.
  • a core issue is how to perceive surrounding objects; in the related technologies, the collected point cloud data can be projected onto a top view, and the frame of the top view can be obtained using two-dimensional (2D) detection technology; however, the original information of the point cloud is lost during quantization, and it is difficult to detect occluded objects when detecting from 2D images.
  • the embodiments of the present disclosure expect to provide a technical solution for target detection.
  • the embodiment of the present disclosure provides a target detection method, the method includes:
  • the foreground point represents the point cloud data belonging to the target in the point cloud data, and the location information of the foreground point is used to characterize the relative position of the foreground point within the target;
  • the embodiment of the present disclosure also proposes an intelligent driving method, which is applied to an intelligent driving device, and the intelligent driving method includes:
  • a driving strategy is generated.
  • the embodiment of the present disclosure also provides a target detection device, the device includes an acquisition module, a first processing module, and a second processing module, wherein:
  • An acquiring module configured to acquire 3D point cloud data; determine the semantic feature of the point cloud corresponding to the 3D point cloud data according to the 3D point cloud data;
  • the first processing module is configured to determine the location information of the foreground points based on the point cloud semantic features, where a foreground point represents point cloud data belonging to the target in the point cloud data, and the location information of the foreground point is used to characterize the relative position of the foreground point within the target; and to extract at least one initial 3D frame based on the point cloud data;
  • the second processing module is configured to determine the 3D detection frame of the target according to the point cloud semantic features corresponding to the point cloud data, the location information of the foreground points, and the at least one initial 3D frame, where the target exists in the area within the 3D detection frame.
  • the embodiment of the present disclosure also provides an electronic device including a processor and a memory configured to store a computer program that can run on the processor, wherein the processor is configured to execute any one of the aforementioned target detection methods when running the computer program.
  • the embodiment of the present disclosure also proposes a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing target detection methods is implemented.
  • the embodiments of the present disclosure also provide a computer program product; the computer program product includes computer-executable instructions, and after the computer-executable instructions are executed, any target detection method provided in the embodiments of the present disclosure can be implemented.
  • according to the target detection method, intelligent driving method, target detection device, electronic device, and computer storage medium proposed by the embodiments of the present disclosure: 3D point cloud data is acquired; the point cloud semantic feature corresponding to the 3D point cloud data is determined according to the 3D point cloud data; the location information of the foreground points is determined based on the point cloud semantic feature, where a foreground point represents point cloud data belonging to the target in the point cloud data, and the location information of the foreground point is used to characterize the relative position of the foreground point within the target; at least one initial 3D frame is extracted based on the point cloud data; and the 3D detection frame of the target is determined according to the point cloud semantic feature corresponding to the point cloud data, the location information of the foreground points, and the at least one initial 3D frame, where the target exists in the area within the detection frame.
  • in this way, the target detection method provided by the embodiments of the present disclosure can directly obtain point cloud semantic features from the 3D point cloud data to determine the location information of the foreground points, and then determine the 3D detection frame of the target according to the point cloud semantic features, the location information of the foreground points, and at least one initial 3D frame, without projecting the 3D point cloud data onto a top view and using 2D detection technology to obtain a top-view frame; this avoids the loss of the original point cloud information during quantization, and also avoids the difficulty of detecting occluded objects that arises when projecting onto the top view.
  • FIG. 1 is a flowchart of a target detection method according to an embodiment of the disclosure
  • FIG. 2 is a schematic diagram of a comprehensive framework of 3D part perception and aggregation neural network in the application embodiment of the disclosure
  • FIG. 3 is a block diagram of modules for sparse upsampling and feature correction in an application embodiment of the disclosure
  • FIG. 4 is a diagram of detailed error statistics of the target positions obtained on the val split of the KITTI dataset at different difficulty levels in the application embodiment of the present disclosure
  • FIG. 5 is a schematic diagram of the composition structure of a target detection device according to an embodiment of the disclosure.
  • FIG. 6 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the disclosure.
  • the terms "including”, “including” or any other variations thereof are intended to cover non-exclusive inclusion, so that a method or device including a series of elements not only includes what is clearly stated Elements, but also include other elements not explicitly listed, or elements inherent to the implementation of the method or device. Without more restrictions, the element defined by the sentence “including a" does not exclude the existence of other related elements (such as steps or steps in the method) in the method or device that includes the element.
  • the unit in the device for example, the unit may be part of a circuit, part of a processor, part of a program or software, etc.).
  • the target detection method or smart driving method provided by the embodiment of the present disclosure includes a series of steps, but the target detection method or smart driving method provided by the embodiment of the present disclosure is not limited to the recorded steps.
  • similarly, the target detection device provided by the embodiments of the present disclosure includes a series of modules, but the device provided by the embodiments of the present disclosure is not limited to the explicitly recited modules, and may also include modules that need to be provided for acquiring relevant information or performing processing based on the information.
  • the embodiments of the present disclosure can be applied to a computer system composed of a terminal and a server, and can operate with many other general-purpose or special-purpose computing system environments or configurations.
  • the terminal can be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, a programmable consumer electronic product, a network personal computer, a small computer system, etc.
  • the server can be a server computer system, a small computer system, a large computer system, a distributed cloud computing environment including any of the above systems, etc.
  • Electronic devices such as terminals and servers can be described in the general context of computer system executable instructions (such as program modules) executed by a computer system.
  • program modules may include routines, programs, object programs, components, logic, data structures, etc., which perform specific tasks or implement specific abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment. In the distributed cloud computing environment, tasks are executed by remote processing equipment linked through a communication network.
  • program modules may be located on a storage medium of a local or remote computing system including a storage device.
  • 3D target detection technology based on point cloud data has attracted more and more attention.
  • point cloud data can be obtained based on radar sensors; although significant achievements have been made in 2D target detection from images, it is not straightforward to extend these 2D detection methods to three-dimensional (3D) target detection based on point clouds; this is mainly because the point cloud data generated by LiDAR sensors is sparse and irregular; how to extract and recognize point cloud semantic features from the irregular points, and segment the foreground and background based on the extracted features to determine the 3D detection frame, is still a challenging problem.
  • 3D target detection is a very important research direction; for example, through 3D target detection, it is possible to determine information such as the specific position, shape, and movement direction of surrounding vehicles and pedestrians in 3D space, so as to help autonomous vehicles or robots make decisions about actions.
  • the point cloud is often projected onto the top view
  • 2D detection technology is used to obtain the top view frame
  • alternatively, the 2D image is directly used to first select candidate frames, and then the corresponding 3D frame is regressed on the point cloud of a specific area.
  • the frame of the top view obtained by the 2D detection technology is a 2D frame
  • the 2D frame represents a frame of a two-dimensional plane used to identify the point cloud data of the target
  • the 2D frame may be a rectangle or other two-dimensional planar shapes.
  • the original information of the point cloud is lost when projecting onto the top view, while it is difficult to detect the occluded target when detecting from the 2D image.
  • moreover, in the related technologies, the part position information inside the target is not separately considered; for example, for a car, the position information of parts such as the front, the rear, and the wheels of the car is helpful for 3D detection of the target.
  • a target detection method is proposed.
  • the embodiments of the present disclosure can be implemented in scenarios such as automatic driving and robot navigation.
  • FIG. 1 is a flowchart of a target detection method according to an embodiment of the disclosure. As shown in FIG. 1, the process may include:
  • Step 101 Obtain 3D point cloud data.
  • point cloud data can be collected based on radar sensors.
  • Step 102 Determine the semantic feature of the point cloud corresponding to the 3D point cloud data according to the 3D point cloud data.
  • in order to segment the foreground and background of the point cloud data and predict the intra-object part location information of the foreground points, it is necessary to learn discriminative point-by-point features from the point cloud data; for the implementation of determining the point cloud semantic features corresponding to the point cloud data, exemplarily, the entire point cloud may be subjected to 3D gridding processing to obtain a 3D grid, and the point cloud semantic features corresponding to the 3D point cloud data are extracted from the non-empty grids of the 3D grid; the point cloud semantic features corresponding to the 3D point cloud data can represent the coordinate information of the 3D point cloud data, and so on.
  • the center of each grid can be regarded as a new point to obtain a gridded point cloud approximately equivalent to the initial point cloud; the aforementioned gridded point cloud is usually sparse.
  • the point-by-point feature of the grid point cloud can be extracted based on the sparse convolution operation.
  • the point-by-point feature of the grid point cloud here is the semantic feature of each point of the gridded point cloud.
  • the foreground and background can be segmented to obtain foreground points and background points; a foreground point represents point cloud data belonging to the target, and a background point represents point cloud data not belonging to the target; the target can be a vehicle, a human body, or another object that needs to be recognized;
  • foreground and background segmentation methods include, but are not limited to, threshold-based segmentation methods, region-based segmentation methods, edge-based segmentation methods, and segmentation methods based on specific theories.
  • the non-empty grid in the aforementioned 3D grid represents a grid that contains point cloud data
  • the empty grid in the aforementioned 3D grid represents a grid that does not contain point cloud data
  • exemplarily, the size of the entire 3D space is 70m*80m*4m, and the size of each grid is 5cm*5cm*10cm; on the KITTI dataset, each 3D scene generally has about 16000 non-empty grids.
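  • As an illustration of the gridding described above, the following is a minimal sketch (not the patent's implementation; the x/y/z range is an assumed example consistent with a 70m*80m*4m space) of voxelizing a point cloud and keeping the centers of the non-empty grids:

```python
# Illustrative sketch: voxelize a point cloud into a 3D grid and keep the centers of the
# non-empty grids as an approximation of the original point cloud (assumed range/voxel size).
import numpy as np

def voxelize(points: np.ndarray,
             pc_range=(0.0, -40.0, -3.0, 70.0, 40.0, 1.0),   # assumed x/y/z bounds (70m x 80m x 4m)
             voxel_size=(0.05, 0.05, 0.10)):                 # 5cm x 5cm x 10cm grids
    """points: (N, 3) array of x, y, z coordinates; returns the centers of non-empty grids."""
    mins = np.array(pc_range[:3])
    maxs = np.array(pc_range[3:])
    size = np.array(voxel_size)
    # Discard points outside the detection range.
    mask = np.all((points[:, :3] >= mins) & (points[:, :3] < maxs), axis=1)
    pts = points[mask, :3]
    # Integer grid index of each point, then the set of unique non-empty grids.
    idx = np.floor((pts - mins) / size).astype(np.int64)
    uniq = np.unique(idx, axis=0)
    # Grid centers approximately stand in for the original points.
    return mins + (uniq + 0.5) * size

# Usage: centers = voxelize(np.random.rand(1000, 3) * [70, 80, 4] + [0, -40, -3])
```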
  • Step 103 Determine the location information of the foreground points based on the point cloud semantic features; a foreground point represents the point cloud data belonging to the target in the point cloud data, and the location information of the foreground point is used to represent the relative position of the foreground point within the target.
  • in the embodiments of the present disclosure, the foreground and background of the point cloud data can be segmented according to the above-mentioned point cloud semantic features to determine the foreground points;
  • exemplarily, the segmentation of the foreground and background and the prediction of the location information of the foreground points can be implemented by a neural network based on the point cloud data.
  • the aforementioned neural network is obtained by training with a training data set that includes the annotation information of 3D frames, and the annotation information of the 3D frames includes at least the location information of the foreground points of the point cloud data of the training data set.
  • the method for segmenting the foreground and the background is not limited.
  • a focal loss method may be used to achieve the segmentation of the foreground and the background.
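  • As an illustration of the focal-loss option mentioned above, the following is a minimal sketch of a per-point foreground/background segmentation loss (not the patent's implementation; the hyperparameters alpha and gamma are assumed values):

```python
# Minimal focal-loss sketch for foreground/background segmentation of points.
import torch

def focal_loss(logits: torch.Tensor, labels: torch.Tensor,
               alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """logits: (N,) raw scores per point; labels: (N,) 1 for foreground, 0 for background."""
    p = torch.sigmoid(logits)
    pt = torch.where(labels == 1, p, 1 - p)                              # probability of the true class
    at = torch.where(labels == 1, torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    loss = -at * (1 - pt) ** gamma * torch.log(pt.clamp(min=1e-6))       # down-weight easy points
    return loss.mean()

# Usage: loss = focal_loss(torch.randn(1000), (torch.rand(1000) > 0.9).long())
```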
  • the training data set can be a pre-acquired data set.
  • exemplarily, radar sensors can be used to obtain point cloud data in advance; then the point cloud data can be segmented, 3D boxes can be delineated, and annotation information can be added to the 3D boxes to obtain the training data set; the annotation information can represent the location information of the foreground points in the 3D boxes.
  • the 3D box in the training data set can be recorded as a ground-truth box.
  • the 3D box represents a three-dimensional box used to identify the point cloud data of the target, and the 3D box may be a cuboid or other three-dimensional boxes.
  • in the embodiments of the present disclosure, the annotation information of the 3D frames of the training data set may be used, and the binary cross-entropy loss may be used as the part regression loss, to predict the location information of the foreground points.
  • exemplarily, all points inside the ground-truth boxes are used as positive samples and all points outside the ground-truth boxes are used as negative samples for training.
  • the annotation information of the 3D frames mentioned above includes accurate position information, is rich in information, and can be obtained for free; that is, the technical solution of the embodiments of the present disclosure can predict the target-internal part position information of the foreground points based on the free supervision information inferred from the annotation information of the 3D frames.
  • in the embodiments of the present disclosure, the information of the original point cloud data can be directly extracted based on the sparse convolution operation, which can be used to segment the foreground and background and predict the part position information of each foreground point (that is, its position information within the target 3D frame), and then the information representing which part of the target each point belongs to can be quantified.
  • This avoids the quantization loss caused by projecting the point cloud onto the top view and the occlusion problem of 2D image detection in the related technology, so that the point cloud semantic feature extraction process can be more natural and efficient.
  • Step 104 Extract at least one initial 3D box based on the point cloud data.
  • exemplarily, a Region Proposal Network (RPN) may be used to extract at least one 3D candidate box, and each 3D candidate box is an initial 3D box.
  • the position information of the various points of the initial 3D frame can be aggregated to help the generation of the final 3D frame; that is, the predicted position information of each foreground point can help the generation of the final 3D frame.
  • Step 105 Determine a 3D detection frame of the target according to the point cloud semantic features corresponding to the point cloud data, the location information of the foreground points, and the at least one initial 3D frame, where the target exists in the area within the detection frame.
  • exemplarily, for each initial 3D frame, a pooling operation may be performed on the location information of the foreground points and the point cloud semantic features to obtain the pooled location information and point cloud semantic features of each initial 3D frame; according to the pooled location information and point cloud semantic features of each initial 3D frame, each initial 3D frame is corrected and/or the confidence of each initial 3D frame is determined, so as to determine the 3D detection frame of the target.
  • in this way, the final 3D frame can be obtained, which can be used to realize the detection of the target; the confidence of an initial 3D frame can be used to represent the confidence of the location information of the foreground points in the initial 3D frame, and further, determining the confidence of the initial 3D frame is beneficial for correcting the initial 3D frame to obtain the final 3D detection frame.
  • the 3D detection frame of the target may represent the 3D frame used for target detection.
  • in practical applications, the information of the target in the image can be determined according to the 3D detection frame of the target; for example, the position and size of the target in the image can be determined according to the 3D detection frame of the target.
  • in a first example, the features of all points in the initial 3D box can be directly acquired and aggregated for confidence scoring and correction of the 3D box; that is, the position information and point cloud semantic features of the points in the initial 3D box are directly pooled to achieve the confidence scoring and/or correction of the initial 3D box; however, due to the sparsity of the point cloud, this first example cannot recover the shape of the initial 3D box from the pooled features, so the shape information of the original 3D box is lost.
  • in another example, each of the above-mentioned initial 3D boxes can be evenly divided into multiple grids, and for each grid, a pooling operation is performed on the location information of the foreground points and the point cloud semantic features, so as to obtain the pooled position information and point cloud semantic features of each initial 3D box.
  • each initial 3D frame may be uniformly meshed in the 3D space according to the set resolution, and the set resolution is recorded as the pooling resolution.
  • when any one of the grids does not contain a foreground point, that grid is an empty grid; in this case, the position information of that grid can be marked as empty to obtain the pooled location information of the foreground points of that grid, and the point cloud semantic feature of that grid is set to zero to obtain the pooled point cloud semantic feature of that grid.
  • in response to a grid containing foreground points, the location information of the foreground points of the grid may be uniformly pooled to obtain the pooled location information of the foreground points of the grid, and the point cloud semantic features of the foreground points of the grid are max-pooled to obtain the pooled point cloud semantic feature of the grid.
  • here, uniform pooling can refer to taking the average of the part position information of the foreground points in the neighborhood as the pooled part position information of the foreground points of the grid; max pooling can refer to taking the maximum value of the corresponding features of the foreground points in the neighborhood as the pooled feature of the grid.
  • the pooled location information can approximately represent the center location information of each grid.
  • after performing the pooling operation for each grid, the pooled position information and point cloud semantic features of each initial 3D frame can be obtained; here, the pooled position information of each initial 3D frame includes the pooled position information of the foreground points of each grid corresponding to that initial 3D frame, and the pooled point cloud semantic features of each initial 3D frame include the pooled point cloud semantic features of each grid corresponding to that initial 3D frame.
  • since the empty grids are also processed accordingly, the pooled part position information and point cloud semantic features of the initial 3D box can better encode the geometric information of the initial 3D frame; furthermore, it can be considered that the embodiments of the present disclosure propose a pooling operation that is sensitive to the initial 3D frame.
  • the pooling operation that is sensitive to the initial 3D frame proposed in the embodiments of the present disclosure can obtain pooled features of the same resolution from initial 3D frames of different sizes, and can restore the shape of the 3D initial frame from the pooled features; in addition, The pooled features can facilitate the integration of position information within the initial 3D frame, and in turn, facilitate the confidence score of the initial 3D frame and the correction of the initial 3D frame.
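  • A hedged sketch of such a box-sensitive (RoI-aware) pooling operation is given below (not the patent's code; the 14*14*14 resolution follows the later description, while the tensor layout and the use of an occupancy channel in place of the foreground segmentation score are assumptions):

```python
# Illustrative sketch of RoI-aware pooling: each initial 3D box is divided into a fixed grid;
# per grid cell, foreground-point part locations are average-pooled and semantic features are
# max-pooled; empty cells are kept (marked empty / zeroed) so the box shape is preserved.
import numpy as np

def roi_aware_pool(box_pts, part_loc, sem_feat, res=14):
    """box_pts: (N, 3) foreground points already transformed into the box's canonical
    coordinates and normalized to [0, 1); part_loc: (N, 3); sem_feat: (N, C)."""
    C = sem_feat.shape[1]
    pooled_part = np.zeros((res, res, res, 4), dtype=np.float32)  # 3 part dims + occupancy flag
    pooled_feat = np.zeros((res, res, res, C), dtype=np.float32)  # empty cells stay zero
    counts = np.zeros((res, res, res), dtype=np.int32)
    cell = np.clip((box_pts * res).astype(int), 0, res - 1)       # grid index of each point
    for (i, j, k), p, f in zip(cell, part_loc, sem_feat):
        pooled_part[i, j, k, :3] += p                             # accumulate for averaging
        if counts[i, j, k] == 0:
            pooled_feat[i, j, k] = f
        else:
            pooled_feat[i, j, k] = np.maximum(pooled_feat[i, j, k], f)   # max-pool semantic features
        counts[i, j, k] += 1
    nonempty = counts > 0
    pooled_part[nonempty, :3] /= counts[nonempty, None]           # average part locations
    pooled_part[..., 3] = nonempty                                # mark non-empty vs empty cells
    return pooled_part, pooled_feat
```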
  • for the implementation of correcting each initial 3D frame and/or determining the confidence of each initial 3D frame according to the pooled position information and point cloud semantic features of each initial 3D frame, for example, the pooled position information and point cloud semantic features of each initial 3D frame are merged, and each initial 3D frame is corrected and/or the confidence of each initial 3D frame is determined according to the merged features.
  • exemplarily, the pooled position information and point cloud semantic features of each initial 3D frame can be converted into the same feature dimension, and then the position information and the point cloud semantic features of the same feature dimension can be concatenated, so as to achieve the merging of the position information and the point cloud semantic features of the same feature dimension.
  • the position information and point cloud semantic features of each initial 3D frame after pooling can be represented by feature maps.
  • exemplarily, the pooled feature maps can be converted to the same feature dimension, and then these two feature maps are merged.
  • the merged feature may be a matrix of m*n*k, where m, n, and k are all positive integers; the merged feature may be used for the subsequent integration of the part position information in the 3D frame, and then, based on the integration of the position information inside the initial 3D frame, the confidence prediction of the part position information in the 3D frame and the correction of the 3D frame can be performed.
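  • A minimal sketch of converting the two pooled feature maps to the same feature dimension and concatenating them along the channel direction (assumed PyTorch layout and channel sizes, not the patent's code):

```python
# Illustrative sketch: bring the pooled part-location map and the pooled semantic-feature map
# to the same channel dimension, then concatenate them along the channel axis.
import torch
import torch.nn as nn

C_part, C_sem, C_merge = 4, 128, 128                 # assumed channel sizes
part_branch = nn.Conv3d(C_part, C_merge, kernel_size=3, padding=1)
sem_branch = nn.Conv3d(C_sem, C_merge, kernel_size=3, padding=1)

pooled_part = torch.randn(1, C_part, 14, 14, 14)     # (batch, channels, H, W, L)
pooled_sem = torch.randn(1, C_sem, 14, 14, 14)

merged = torch.cat([part_branch(pooled_part), sem_branch(pooled_sem)], dim=1)
print(merged.shape)                                  # torch.Size([1, 256, 14, 14, 14])
```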
  • in the related technologies, PointNet is usually used to integrate the point cloud information; due to the sparsity of the point cloud, this operation loses the information of the initial 3D frame, which is not conducive to the integration of the 3D part position information.
  • the process of correcting each initial 3D frame and/or determining the confidence of each initial 3D frame according to the merged features can be implemented in the following manners.
  • in a first manner, the merged feature may be vectorized into a feature vector, and each initial 3D box may be corrected and/or the confidence of each initial 3D box may be determined according to the feature vector, for example, by means of fully connected (FC) layers; here, a fully connected layer is a basic unit in a neural network, which can integrate the local, category-discriminative information in a convolutional layer or a pooling layer.
  • in a second manner, a sparse convolution operation can be performed on the merged feature to obtain a feature map after the sparse convolution operation; according to the feature map after the sparse convolution operation, each initial 3D box is corrected and/or the confidence of each initial 3D box is determined; in this way, the convolution operation can be used to gradually aggregate the features from the local scale to the global scale, so as to correct each initial 3D box and/or determine the confidence of each initial 3D box.
  • the second method can be used to correct each initial 3D frame and/or determine the confidence of each initial 3D frame.
  • exemplarily, the sparse convolution operation is performed to obtain the feature map after the sparse convolution operation; the feature map after the sparse convolution operation is down-sampled, and according to the down-sampled feature map, each initial 3D box is corrected and/or the confidence of each initial 3D box is determined.
  • each initial 3D box can be corrected more effectively and/or the confidence of each initial 3D box can be determined, and computing resources can be saved.
  • the feature map after the sparse convolution operation can be down-sampled through a pooling operation; for example, the pooling operation used here for the feature map after the sparse convolution operation is a sparse max-pooling operation.
  • a feature vector is obtained for integration of the position information of the part.
  • in this way, the gridded feature can be gradually down-sampled into an encoded feature vector, which is used to integrate the 3D part position information; then, the encoded feature vector can be used to correct each initial 3D frame and/or determine the confidence of each initial 3D frame.
  • it can be seen that the embodiments of the present disclosure propose an integration operation for 3D part position information based on the sparse convolution operation, which can encode the pooled 3D part position features of each initial 3D frame layer by layer; the combination of this operation with the pooling operation that is sensitive to the initial 3D frame can better aggregate the 3D part position information for the final confidence prediction of the initial 3D frame and/or the correction of the initial 3D frame, so as to obtain the 3D detection frame of the target.
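  • A hedged sketch of such a part aggregation head is given below, using dense 3D convolution and max pooling as stand-ins for the sparse operations (channel sizes and the 7-value box encoding are assumptions, not the patent's code):

```python
# Illustrative sketch: aggregate the merged 14x14x14 feature volume with 3D convolutions and
# max-pooling (kernel 2, stride 2 as mentioned in the text), vectorize it, and feed two fully
# connected branches for confidence scoring and box correction.
import torch
import torch.nn as nn

class PartAggregationHead(nn.Module):
    def __init__(self, in_ch=256, box_code_size=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv3d(in_ch, 128, 3, padding=1), nn.ReLU(),
            nn.MaxPool3d(kernel_size=2, stride=2),               # 14^3 -> 7^3
            nn.Conv3d(128, 128, 3, padding=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(nn.Flatten(), nn.Linear(128 * 7 * 7 * 7, 256), nn.ReLU())
        self.score_branch = nn.Linear(256, 1)                    # confidence of the initial 3D box
        self.refine_branch = nn.Linear(256, box_code_size)       # residuals for box correction

    def forward(self, merged):                                   # merged: (B, in_ch, 14, 14, 14)
        feat = self.fc(self.conv(merged))
        return self.score_branch(feat), self.refine_branch(feat)

head = PartAggregationHead()
score, refine = head(torch.randn(2, 256, 14, 14, 14))
print(score.shape, refine.shape)  # torch.Size([2, 1]) torch.Size([2, 7])
```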
  • steps 101 to 103 can be implemented based on the processor of the electronic device.
  • the aforementioned processor can be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller, and a microprocessor. It is understandable that, for different electronic devices, the electronic components used to implement the above-mentioned processor functions may also be other components, which are not specifically limited in the embodiments of the present disclosure.
  • the target detection method provided by the embodiments of the present disclosure can directly obtain point cloud semantic features from 3D point cloud data to determine the location information of the foreground points, and then determine the 3D detection frame of the target according to the point cloud semantic features, the location information of the foreground points, and at least one initial 3D frame, without projecting the 3D point cloud data onto a top view and using 2D detection technology to obtain a top-view frame; this avoids the loss of the original point cloud information during quantization, and also avoids the difficulty of detecting occluded objects caused by projecting onto the top view.
  • an embodiment of the present disclosure also proposes an intelligent driving method, which is applied to an intelligent driving device.
  • the intelligent driving method includes: obtaining the 3D detection frame of a target around the intelligent driving device according to any one of the foregoing target detection methods; and generating a driving strategy according to the 3D detection frame of the target.
  • in one example, the smart driving device includes autonomously driving vehicles, robots, blind-guiding devices, and the like; in this case, the smart driving device can control its driving according to the generated driving strategy; in another example, the smart driving device includes a device installed on a vehicle driven by a driver; in this case, the generated driving strategy can be used to guide the driver to control the driving of the vehicle.
  • a 3D part perception and aggregation neural network (which can be named the Part-A² network) for target detection from the original point cloud is proposed.
  • the framework of this network is a new point-cloud-based two-stage framework for three-dimensional target detection, which can be composed of the following two stages: the first stage is the part perception stage, and the second stage is the part aggregation stage.
  • free supervision information can be inferred based on the annotation information of the 3D frame, and the initial 3D frame and accurate part location information (intra-object part locations) can be predicted at the same time;
  • in the part aggregation stage, the part position information is aggregated to realize an effective encoded representation of the 3D frame features; specifically, the spatial relationship of the pooled part position information is considered for re-scoring (confidence scoring) the 3D frames and correcting their positions; a large number of experiments carried out on the KITTI dataset prove that the predicted part position information of the foreground points is conducive to 3D target detection, and that the above-mentioned target detection method based on the 3D part perception and aggregation neural network is superior to the target detection methods in the related technologies that take a point cloud as input.
  • the segmentation label is directly derived from the annotation information of the 3D box in the training data set; however, the annotation information of the 3D box not only provides the segmentation mask, but also provides the precise position of all points in the 3D box.
  • the Part-A² network described above is proposed in some embodiments; specifically, in the first stage, i.e., the part perception stage, the network learns to estimate the target part position information of all foreground points.
  • the labeling information and segmentation mask can be directly generated from the real information manually labeled.
  • the real information manually labeled can be recorded as Ground-truth.
  • the real information manually labeled can be a three-dimensional frame manually labeled.
  • the motivation of the part aggregation stage is that, given a set of points in a 3D candidate frame, the Part-A² network should be able to evaluate the quality of the candidate frame and optimize the candidate frame by learning the spatial relationship of the predicted target part positions of all these points.
  • in some embodiments, a novel RoI-aware point cloud pooling module can be proposed; the RoI-aware point cloud pooling module can eliminate the ambiguity in region pooling on the point cloud through a new pooling operation; unlike the pooling operation schemes in the related technologies, which pool over all points or non-empty voxels, the RoI-aware point cloud pooling module pools over all grids in the 3D box (including non-empty grids and empty grids), which is the key to generating an effective representation for 3D box scoring and position correction, because the empty grids also encode 3D box information.
  • the above-mentioned network can use sparse convolution and pooling operations to aggregate the part location information; experimental results show that the aggregated part location features can significantly improve the quality of the candidate frames and achieve state-of-the-art performance on the three-dimensional detection benchmark.
  • the 3D part perception and aggregation neural network only uses point cloud data as input, and can obtain 3D detection results that are similar to or even better than those of the related technologies; further, in the framework of the above-mentioned 3D part perception and aggregation neural network, the rich information provided by the annotation information of the 3D boxes is further explored, and accurate target part position information is learned and predicted to improve the performance of 3D target detection; further, the application embodiment of the present disclosure proposes a backbone network with a U-shaped network structure, which can use sparse convolution and deconvolution to extract and recognize point cloud features for predicting target part location information and for three-dimensional target detection.
  • Fig. 2 is a schematic diagram of the comprehensive framework of 3D part perception and aggregation neural network in the application embodiment of the present disclosure.
  • the framework of the 3D part perception and aggregation neural network includes a part perception stage and a part aggregation stage.
  • in the part perception stage, by inputting the original point cloud data into the backbone network with the newly designed U-shaped network structure, the target part positions can be accurately estimated and 3D candidate frames can be generated; in the part aggregation stage, the pooling operation of the proposed RoI-aware point cloud pooling module is carried out; specifically, the part position information within each 3D candidate frame is grouped, and then the part aggregation network is used to consider the spatial relationship between the parts, so as to score the 3D frames and correct their positions.
  • the ground-truth boxes of 3D target detection automatically provide an accurate target part position and a segmentation mask for each 3D point; this is very different from 2D target detection, where a 2D target frame may only contain part of the target due to occlusion, so it cannot provide an accurate target part position for each 2D pixel.
  • the target detection method in the embodiments of the present disclosure can be applied to a variety of scenarios; in one scenario, the above-mentioned target detection method can be used to perform 3D target detection in an autonomous driving scene, where detecting the location, size, and moving direction of surrounding targets provides information that helps autonomous driving decision-making;
  • in another scenario, the above-mentioned target detection method can be used to achieve 3D target tracking; specifically, the above-mentioned target detection method can be used to achieve 3D target detection at every moment, and the detection results can be used as the input of 3D target tracking.
  • in yet another scenario, the above-mentioned target detection method can be used to pool the point cloud in 3D frames; for example, the sparse point clouds in different 3D frames can be pooled into 3D box features with a fixed resolution.
  • the Part-A² network is proposed in the application embodiment of the present disclosure for 3D target detection from the point cloud; specifically, 3D part position labels and segmentation labels are introduced as additional supervision information to facilitate the generation of 3D candidate frames; in the part aggregation stage, the predicted 3D target part position information in each 3D candidate frame is aggregated to score the candidate frame and correct its position.
  • the application embodiment of the present disclosure designs a U-shaped network structure, which can learn a point-by-point feature representation of the foreground points by performing sparse convolution and sparse deconvolution on the obtained sparse grid; in Figure 2, three sparse convolution operations with a stride of 2 can be performed on the point cloud data, so that the spatial resolution of the point cloud data is reduced by downsampling to 1/8 of the initial spatial resolution, and each sparse convolution operation is followed by several submanifold sparse convolutions; here, the stride of the sparse convolution operations can be determined according to the spatial resolution of the point cloud data, and for a larger spatial resolution the stride of the sparse convolution operation needs to be set larger; after performing the three sparse convolution operations on the point cloud data, sparse upsampling and feature correction are performed on the features obtained after the three sparse convolution operations; in the implementation of the present disclosure, an upsampling block based on sparse operations (used to perform the sparse upsampling operation) can be used to refine the fused features and save computing resources.
  • FIG. 3 is a block diagram of the sparse upsampling and feature correction module in the application embodiment of the disclosure; the module is applied in the decoder of the backbone network with the U-shaped network structure based on sparse convolution; referring to Figure 3, the lateral features and the bottom features are first fused through sparse convolution, and then the fused features are upsampled through sparse deconvolution.
  • in Figure 3, "sparse convolution 3×3×3" means a sparse convolution with a convolution kernel size of 3×3×3; "channel concatenation" means the concatenation of feature vectors in the channel direction; "channel reduction" means the reduction of feature vectors in the channel direction, that is, addition in the channel direction according to the feature vectors; it can be seen from Figure 3 that operations such as sparse convolution, channel concatenation, channel reduction, and sparse deconvolution can be performed on the lateral features and the bottom features, so as to realize feature correction of the lateral features and the bottom features.
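  • A hedged sketch of such an upsampling and feature correction block is given below, using dense convolutions as stand-ins for the sparse convolution/deconvolution, an assumed channel size, and a 1×1×1 convolution as one possible realization of the channel reduction (not the patent's code):

```python
# Illustrative sketch of the Figure-3 block: fuse lateral and bottom features, concatenate and
# reduce along the channel direction, then upsample the fused features by a deconvolution.
import torch
import torch.nn as nn

class SparseUpBlockSketch(nn.Module):
    def __init__(self, ch=64):
        super().__init__()
        self.lateral = nn.Conv3d(ch, ch, 3, padding=1)          # process lateral (encoder) features
        self.bottom = nn.Conv3d(ch, ch, 3, padding=1)           # process bottom (deeper) features
        self.reduce = nn.Conv3d(2 * ch, ch, 1)                  # channel reduction after concatenation
        self.deconv = nn.ConvTranspose3d(ch, ch, 2, stride=2)   # upsample the fused features

    def forward(self, lateral_feat, bottom_feat):
        fused = torch.cat([self.lateral(lateral_feat), self.bottom(bottom_feat)], dim=1)
        return self.deconv(self.reduce(fused))

block = SparseUpBlockSketch()
out = block(torch.randn(1, 64, 10, 10, 10), torch.randn(1, 64, 10, 10, 10))
print(out.shape)  # torch.Size([1, 64, 20, 20, 20])
```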
  • semantic segmentation and target location prediction can also be performed on the features after performing sparse upsampling and feature correction.
  • the internal position information of the target is essential; for example, the side of the vehicle is also a plane perpendicular to the ground, and the two wheels are always close to the ground.
  • the neural network has developed the ability to infer the shape and posture of the object, which is conducive to 3D target detection.
  • in the part perception stage, two branches can be added, which are respectively used to segment the foreground points and to predict their intra-object part positions; when predicting the intra-object part position of a foreground point, the prediction can be made based on the annotation information of the 3D boxes of the training data set; in the training data set, all points inside the ground-truth boxes are used as positive samples and all points outside the ground-truth boxes are used as negative samples for training.
  • the 3D ground-truth boxes automatically provide 3D part location labels; the coordinates (p_x, p_y, p_z) of a foreground point are known parameters, and (p_x, p_y, p_z) can be converted into the part location label (O_x, O_y, O_z) to indicate the relative position of the point within the corresponding target; a 3D box is represented by (C_x, C_y, C_z, h, w, l, θ), where (C_x, C_y, C_z) represents the center position of the 3D box, (h, w, l) represents the size of the 3D box, and θ represents the orientation of the 3D box in the corresponding bird's-eye view, that is, the angle between the orientation of the 3D box in the bird's-eye view and the X-axis direction of the bird's-eye view.
  • the part location labels satisfy O_x, O_y, O_z ∈ [0, 1], and the part location of the target center is (0.5, 0.5, 0.5); here, the coordinates involved in formula (1) are all expressed in KITTI's lidar coordinate system, in which the z direction is perpendicular to the ground and the x and y directions lie in the horizontal plane.
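  • Formula (1) itself is not reproduced in this text; one consistent reconstruction from the definitions above (a hedged sketch, assuming l lies along the box heading, w across it, and rotation about the z axis) is:

```latex
\begin{aligned}
t_x &= (p_x - C_x)\cos\theta + (p_y - C_y)\sin\theta, \\
t_y &= -(p_x - C_x)\sin\theta + (p_y - C_y)\cos\theta, \\
t_z &= p_z - C_z, \\
(O_x, O_y, O_z) &= \left(\frac{t_x}{l} + 0.5,\;\; \frac{t_y}{w} + 0.5,\;\; \frac{t_z}{h} + 0.5\right),
\qquad O_x, O_y, O_z \in [0, 1].
\end{aligned}
```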
  • the binary cross-entropy loss can be used as the part regression loss to learn the part locations of the foreground points along the three dimensions; in the corresponding expression, P_u denotes the predicted intra-object part location output by the sigmoid layer, and L_part(P_u) denotes the part regression loss for the predicted part location of the 3D point.
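  • The expression of the loss is likewise not reproduced here; a standard binary cross-entropy form consistent with the description (a sketch, with O_u the part location label from formula (1) and P_u the sigmoid output) would be:

```latex
L_{\mathrm{part}}(P_u) = -\big(O_u \log P_u + (1 - O_u)\log(1 - P_u)\big), \qquad u \in \{x, y, z\},
```

  with the total part regression loss summed over the three dimensions for all foreground points.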
  • in the part perception stage, not only can the part locations of the foreground points be predicted, but 3D candidate frames can also be generated.
  • 3D candidate frames need to be generated in order to aggregate the predicted target part location information of the foreground points belonging to the same target; in actual implementation, as shown in Figure 2, an RPN head is appended to the feature map generated by the sparse convolutional encoder (that is, the feature map obtained after the three sparse convolution operations on the point cloud data); in order to generate 3D candidate frames, the feature map is downsampled by 8 times, and the features at different heights of the same bird's-eye-view position are aggregated to generate a 2D bird's-eye-view feature map for 3D candidate frame generation.
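  • A minimal sketch of aggregating features at different heights of the same bird's-eye-view position into a 2D BEV feature map (assumed dense tensor layout and sizes, not the patent's code):

```python
# Illustrative sketch: stack the features along the height dimension into the channel
# dimension to form a 2D bird's-eye-view feature map for the RPN head.
import torch

volume = torch.randn(1, 64, 5, 200, 176)   # (batch, C, height bins D, BEV H, BEV W), assumed sizes
B, C, D, H, W = volume.shape
bev = volume.reshape(B, C * D, H, W)       # aggregate the height dimension into channels
print(bev.shape)                           # torch.Size([1, 320, 200, 176])
```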
  • the pooling operation can be performed in the part aggregation stage.
  • in the related technologies, a point cloud region pooling operation is proposed, in which a pooling operation is performed on the point-by-point features inside a 3D candidate frame, and then the 3D candidate frame is corrected based on the pooled feature map; however, this pooling operation loses the 3D candidate frame information, because the points in the 3D candidate frame are not regularly distributed and there is ambiguity in recovering the 3D box from the pooled points.
  • Figure 4 is a schematic diagram of the point cloud pooling operation in the application embodiment of the disclosure.
  • in Figure 4, the previous point cloud pooling operation refers to the point cloud region pooling operation described above, and the circles represent the points after pooling; it can be seen that, if the point cloud region pooling operation described above is used, different 3D candidate frames will result in the same pooled points; that is, the point cloud region pooling operation described above is ambiguous and cannot recover the initial shape of the 3D candidate frame from the pooled points, which will have a negative impact on the subsequent candidate frame correction.
  • the ROI-aware point cloud pooling operation is proposed.
  • the specific process of the RoI-aware point cloud pooling operation is: evenly dividing each 3D candidate frame into multiple grids; when any one of the grids does not contain a foreground point, that grid is an empty grid, and in this case the position information of that grid can be marked as empty and its point cloud semantic feature is set to zero; uniform pooling is performed on the part position information of the foreground points of each grid, and max pooling is performed on the point cloud semantic features of the foreground points of each grid, so as to obtain the pooled position information and point cloud semantic features of each 3D candidate frame.
  • the RoI-aware point cloud pooling operation can encode the shape of the 3D candidate frame by preserving the empty grids, while sparse convolution can effectively process the shape of the candidate frame (the empty grids).
  • in the RoI-aware point cloud pooling operation, the 3D candidate frame can be evenly divided into a regular grid with a fixed spatial shape (H*W*L), where H, W, and L respectively represent the height, width, and length hyperparameters of the pooling resolution in each dimension and are independent of the size of the 3D candidate frame.
  • in the part aggregation stage, a learning-based method is used, which can reliably aggregate the part position information for 3D candidate frame scoring (i.e., confidence) and position correction.
  • for each 3D candidate frame, the proposed RoI-aware point cloud pooling operation is applied to the part position information and the point cloud semantic features of the 3D candidate frame respectively, to generate two feature maps of sizes (14*14*14*4) and (14*14*14*C); the predicted part location information corresponds to the 4-dimensional map, in which 3 dimensions represent the X, Y, and Z part locations and 1 dimension represents the foreground segmentation score, and C represents the feature size of the point-by-point features from the part perception stage.
  • after pooling, a sparse max-pooling operation with a kernel size of 2*2*2 and a stride of 2*2*2 can be applied to down-sample the pooled feature maps.
  • the feature map obtained by the sparse convolution operation can also be vectorized (corresponding to the FC in Figure 2) to obtain a feature vector; after the feature vector is obtained, two branches can be added to perform the final 3D candidate frame scoring and 3D candidate frame position correction; exemplarily, the 3D candidate frame score represents the confidence score of the 3D candidate frame, and the confidence score of the 3D candidate frame at least represents the score of the part position information of the foreground points in the 3D candidate frame.
  • the execution process of the part aggregation stage proposed in the application embodiment of the present disclosure can effectively aggregate features from the local scale to the global scale, thereby learning the spatial distribution of the predicted part locations.
  • using sparse convolution also saves a lot of computing resources and parameters, because the pooled grids are very sparse; the related technologies cannot exploit this sparsity (that is, sparse convolution cannot be used there for part position aggregation), because in the related technologies each grid needs to be encoded as the feature of a specific position in the 3D candidate frame.
  • the position-corrected 3D frame can be obtained, that is, the final 3D frame can be obtained, which can be used to realize 3D target detection.
  • two branches can be appended to the vectorized feature vector aggregated from the predicted part information.
  • for the 3D candidate box scoring (i.e., confidence) branch, the 3D Intersection over Union (IoU) between the 3D candidate box and its corresponding ground-truth box can be used as a soft label for the quality evaluation of the 3D candidate box, and a binary cross-entropy loss is used to learn the 3D candidate frame score.
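  • A hedged sketch of such a scoring loss, with s the predicted confidence of a 3D candidate box and q the soft label derived from its 3D IoU with the ground-truth box (the exact IoU-to-q mapping is not given in this text):

```latex
L_{\mathrm{score}} = -\big(q \log s + (1 - q)\log(1 - s)\big).
```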
  • ⁇ x, ⁇ y, and ⁇ z respectively represent the offset of the center position of the 3D frame
  • ⁇ h, ⁇ w, and ⁇ l respectively represent the size offset of the bird's-eye view corresponding to the 3D frame
  • represents the direction offset of the bird's-eye view corresponding to the 3D frame amount
  • d a bird's-eye view showing the normalized center offset
  • x a, y a, and z a represents a 3D center position of the anchor point / candidate frame
  • h a, w a and l a denotes the anchor 3D / candidate block corresponding to
  • ⁇ a represents the direction of the bird's-eye view corresponding to the 3D anchor point/candidate frame
  • x g , y g and z g represent the center position of the corresponding ground-truth frame
  • h g , w g and l g Indicates
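  • The residual formulas themselves are not reproduced in this text; a standard parameterization consistent with the symbol definitions above (a hedged sketch) is:

```latex
\begin{aligned}
\Delta x &= \frac{x_g - x_a}{d_a}, \quad
\Delta y = \frac{y_g - y_a}{d_a}, \quad
\Delta z = \frac{z_g - z_a}{h_a}, \qquad d_a = \sqrt{l_a^2 + w_a^2},\\
\Delta l &= \log\frac{l_g}{l_a}, \quad
\Delta w = \log\frac{w_g}{w_a}, \quad
\Delta h = \log\frac{h_g}{h_a}, \quad
\Delta\theta = \theta_g - \theta_a .
\end{aligned}
```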
  • different from the candidate frame correction methods in the related art, the position correction of the 3D candidate frame in the application embodiment of the present disclosure can directly regress the relative offsets or size ratios according to the parameters of the 3D candidate frame, because the above-mentioned RoI-aware point cloud pooling module has already encoded all the shared information of the 3D candidate frames and transformed different 3D candidate frames into the same normalized spatial coordinate system.
  • in summary, the application embodiment of the present disclosure proposes a new 3D target detection method, that is, using the above-mentioned Part-A² network to detect three-dimensional targets from a point cloud; in the part perception stage, accurate target part positions are learned and estimated by using the part location labels derived from the 3D frames; the predicted part positions within each target are grouped through the new RoI-aware point cloud pooling module; therefore, in the part aggregation stage, the spatial relationship of the predicted intra-object part positions can be considered to score the 3D candidate frames and correct their positions.
  • the target detection method of the application embodiment of the present disclosure achieves state-of-the-art performance on the challenging KITTI three-dimensional detection benchmark, which proves the effectiveness of the method.
  • an embodiment of the present disclosure proposes a target detection device.
  • FIG. 5 is a schematic diagram of the composition structure of the target detection device according to the embodiment of the disclosure. As shown in FIG. 5, the device is located in an electronic device, and the device includes: an acquisition module 601, a first processing module 602, and a second processing module 603 , among them,
  • the obtaining module 601 is configured to obtain 3D point cloud data; according to the 3D point cloud data, determine the point cloud semantic feature corresponding to the 3D point cloud data;
  • the first processing module 602 is configured to determine the location information of the foreground points based on the point cloud semantic features, where a foreground point represents point cloud data belonging to the target in the point cloud data, and the location information of the foreground point is used to characterize the relative position of the foreground point within the target; and to extract at least one initial 3D box based on the point cloud data;
  • the second processing module 603 is configured to determine the 3D detection frame of the target according to the point cloud semantic features corresponding to the point cloud data, the location information of the foreground points, and the at least one initial 3D frame, where the target exists in the area within the 3D detection frame.
  • The second processing module 603 is configured to perform, for each initial 3D box, a pooling operation on the part location information of the foreground points and on the point cloud semantic features to obtain the pooled part location information and point cloud semantic features of each initial 3D box, and to refine each initial 3D box and/or determine the confidence of each initial 3D box according to the pooled part location information and point cloud semantic features, so as to determine the 3D detection box of the target.
  • The second processing module 603 is configured to evenly divide each initial 3D box into a plurality of grids and to perform, for each grid, the pooling operation on the part location information of the foreground points and the point cloud semantic features, thereby obtaining the pooled part location information and point cloud semantic features of each initial 3D box; according to the pooled part location information and point cloud semantic features of each initial 3D box, each initial 3D box is refined and/or the confidence of each initial 3D box is determined, so as to determine the 3D detection box of the target.
  • When performing the pooling operation on the part location information of the foreground points and the point cloud semantic features for each grid, the second processing module 603 is configured to: in response to a grid containing no foreground point, mark the part location information of the grid as empty to obtain the pooled part location information of the grid, and set the point cloud semantic feature of the grid to zero to obtain the pooled point cloud semantic feature of the grid; and in response to a grid containing foreground points, perform uniform (average) pooling on the part location information of the foreground points of the grid to obtain the pooled part location information of the grid, and perform max pooling on the point cloud semantic features of the foreground points of the grid to obtain the pooled point cloud semantic feature of the grid (see the pooling sketch below).
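The empty/non-empty handling described above can be sketched as follows; the grid resolution, tensor layouts, and the assumption that the foreground points have already been transformed into the candidate box's local frame are illustrative choices, not requirements of the disclosure.

```python
import numpy as np

def roi_aware_pool(local_xyz, part_loc, sem_feat, box_size, res=14):
    """RoI-aware pooling sketch for one initial 3D box.

    local_xyz: (N, 3) foreground-point coordinates already transformed into the
               box's canonical frame and shifted so they lie in [0, box_size).
    part_loc:  (N, 4) predicted part locations (+ foreground score) per point.
    sem_feat:  (N, C) point cloud semantic features.
    Returns pooled part locations, pooled features and an emptiness mask;
    empty cells keep zero features and are flagged as empty.
    """
    n_cells = res ** 3
    pooled_part = np.zeros((n_cells, part_loc.shape[1]), dtype=np.float32)
    pooled_feat = np.zeros((n_cells, sem_feat.shape[1]), dtype=np.float32)
    empty = np.ones(n_cells, dtype=bool)                     # "part location marked as empty"

    cell = np.floor(local_xyz / (np.asarray(box_size) / res)).astype(int)
    cell = np.clip(cell, 0, res - 1)
    flat = cell[:, 0] * res * res + cell[:, 1] * res + cell[:, 2]
    for idx in np.unique(flat):                              # only non-empty cells are visited
        pts = flat == idx
        pooled_part[idx] = part_loc[pts].mean(axis=0)        # uniform (average) pooling
        pooled_feat[idx] = sem_feat[pts].max(axis=0)         # max pooling
        empty[idx] = False
    return pooled_part, pooled_feat, empty
```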
  • The second processing module 603 is configured to perform, for each initial 3D box, the pooling operation on the part location information of the foreground points and on the point cloud semantic features to obtain the pooled part location information and point cloud semantic features of each initial 3D box, to merge the pooled part location information and point cloud semantic features of each initial 3D box, and to refine each initial 3D box and/or determine the confidence of each initial 3D box according to the merged features.
  • When refining each initial 3D box and/or determining the confidence of each initial 3D box according to the merged features, the second processing module 603 is configured to:
  • vectorize the merged features into a feature vector, and refine each initial 3D box and/or determine the confidence of each initial 3D box according to the feature vector; or
  • perform a sparse convolution operation on the merged features to obtain a feature map after the sparse convolution operation, and refine each initial 3D box and/or determine the confidence of each initial 3D box according to the feature map after the sparse convolution operation; or
  • perform a sparse convolution operation on the merged features to obtain a feature map after the sparse convolution operation, down-sample the feature map after the sparse convolution operation, and refine each initial 3D box and/or determine the confidence of each initial 3D box according to the down-sampled feature map.
  • When down-sampling the feature map after the sparse convolution operation, the second processing module 603 is configured to perform a pooling operation on the feature map after the sparse convolution operation, so as to implement the down-sampling of the feature map after the sparse convolution operation (a sketch of such a refinement head is given below).
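As a rough, dense-tensor stand-in for the sparse pipeline described above (a production implementation would use a sparse convolution library; the layer sizes, channel counts and the pooled grid resolution of 14³ → 7³ are assumptions), the refinement head can be sketched as convolutions over the merged pooled grid, a pooling-based down-sampling step, and two small heads for confidence and box refinement.

```python
import torch
import torch.nn as nn

class PartAggregationHead(nn.Module):
    """Dense stand-in for the part-aggregation head: conv over the pooled 14^3 grid,
    max-pool down-sampling to 7^3, then confidence and box-refinement branches."""

    def __init__(self, in_ch=128, mid_ch=128):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv3d(in_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv3d(mid_ch, mid_ch, kernel_size=3, padding=1), nn.ReLU(),
        )
        self.down = nn.MaxPool3d(kernel_size=2, stride=2)     # pooling-based down-sampling
        self.fc = nn.Sequential(nn.Linear(mid_ch * 7 ** 3, 256), nn.ReLU())
        self.score_head = nn.Linear(256, 1)                   # candidate-box confidence
        self.reg_head = nn.Linear(256, 7)                     # (dx, dy, dz, dh, dw, dl, dtheta)

    def forward(self, merged_grid):                           # (B, in_ch, 14, 14, 14)
        x = self.down(self.convs(merged_grid))                # (B, mid_ch, 7, 7, 7)
        x = self.fc(x.flatten(1))
        return self.score_head(x), self.reg_head(x)
```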
  • The acquisition module 601 is configured to acquire 3D point cloud data and perform 3D meshing (voxelization) on the 3D point cloud data to obtain a 3D grid, and to extract the point cloud semantic features corresponding to the 3D point cloud data from the non-empty cells of the 3D grid (see the voxelization sketch below).
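A small sketch of this 3D meshing step; the voxel size and spatial range below are illustrative values consistent with the description elsewhere in this document, and the helper name is an assumption.

```python
import numpy as np

def voxelize(points, voxel_size=(0.05, 0.05, 0.1), pc_range=((0, 70.4), (-40, 40), (-3, 1))):
    """Assign each 3D point to a grid cell and return the set of non-empty cells.

    points: (N, 3+) array of (x, y, z, ...); only points inside pc_range are kept.
    Returns the integer cell index of every kept point and the unique non-empty cells,
    from which per-voxel features can then be extracted by sparse convolution.
    """
    lo = np.array([r[0] for r in pc_range])
    hi = np.array([r[1] for r in pc_range])
    keep = np.all((points[:, :3] >= lo) & (points[:, :3] < hi), axis=1)
    idx = ((points[keep, :3] - lo) / np.array(voxel_size)).astype(np.int32)
    non_empty = np.unique(idx, axis=0)
    return idx, non_empty
```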
  • When determining the part location information of the foreground points based on the point cloud semantic features, the first processing module 602 is configured to perform foreground and background segmentation on the point cloud data according to the point cloud semantic features so as to determine the foreground points, where a foreground point is the point cloud data that belongs to the foreground; the determined foreground points are processed by a neural network used for predicting the part location information of foreground points, to obtain the part location information of the foreground points; wherein the neural network is trained with a training data set that includes annotation information of 3D boxes, and the annotation information of a 3D box includes at least the part location information of the foreground points of the point cloud data in the training data set (a sketch of generating such part-location labels is given below).
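The part-location supervision mentioned above can be generated roughly as follows; this sketch assumes that theta is the box heading in the bird's-eye view and that the box's local x axis is aligned with its length, so the exact rotation convention of a given implementation may differ.

```python
import numpy as np

def part_location_labels(points, box):
    """Compute intra-object part location labels for foreground points inside one 3D box.

    points: (N, 3) foreground points (x, y, z); box: (cx, cy, cz, h, w, l, theta).
    The offset from the box center is rotated into the box's local frame and
    normalized by the box size, so labels lie in [0, 1] with (0.5, 0.5, 0.5) at the center.
    """
    cx, cy, cz, h, w, l, theta = box
    dx, dy, dz = points[:, 0] - cx, points[:, 1] - cy, points[:, 2] - cz
    tx = dx * np.cos(theta) + dy * np.sin(theta)      # rotate offsets by -theta into the box frame
    ty = -dx * np.sin(theta) + dy * np.cos(theta)
    return np.stack([tx / l + 0.5, ty / w + 0.5, dz / h + 0.5], axis=1)
```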
  • In addition, the functional modules in this embodiment may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • The above integrated unit can be implemented in the form of hardware or in the form of a software function module.
  • If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product.
  • The computer software product is stored in a storage medium and includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute all or part of the steps of the method described in this embodiment.
  • The aforementioned storage media include: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and other media capable of storing program code.
  • Specifically, the computer program instructions corresponding to any target detection method or intelligent driving method in this embodiment can be stored on storage media such as optical discs, hard disks and USB flash drives; when the computer program instructions corresponding to any target detection method or intelligent driving method in the storage medium are read or executed by an electronic device, any target detection method or intelligent driving method of the foregoing embodiments is implemented.
  • FIG. 6 shows an electronic device 70 provided by an embodiment of the present disclosure, which may include a memory 71 and a processor 72; wherein,
  • the memory 71 is configured to store computer programs and data
  • the processor 72 is configured to execute a computer program stored in the memory to implement any target detection method or intelligent driving method in the foregoing embodiments.
  • In practical applications, the aforementioned memory 71 may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (HDD) or a solid-state drive (SSD); or a combination of the above types of memories, and it provides instructions and data to the processor 72.
  • The aforementioned processor 72 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller and a microprocessor. It can be understood that, for different devices, the electronic components used to implement the above processor functions may also be other components, which is not specifically limited in the embodiments of the present disclosure.
  • the embodiment of the present disclosure also proposes a computer storage medium on which a computer program is stored, and when the computer program is executed by a processor, any one of the foregoing target detection methods is implemented.
  • The embodiments of the present disclosure also provide a computer program product; the computer program product includes computer-executable instructions, and after the computer-executable instructions are executed, any target detection method provided in the embodiments of the present disclosure can be implemented.
  • In some embodiments, the functions or modules contained in the device provided in the embodiments of the present disclosure can be used to execute the methods described in the above method embodiments; for their specific implementation, reference may be made to the description of the above method embodiments, which is not repeated here for brevity.
  • Based on such an understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for enabling a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Computer Graphics (AREA)
  • Geometry (AREA)
  • Image Analysis (AREA)

Abstract

A target detection method and apparatus, an electronic device and a computer storage medium. The method includes: acquiring 3D point cloud data (101); determining, according to the 3D point cloud data, point cloud semantic features corresponding to the 3D point cloud data (102); determining part location information of foreground points based on the point cloud semantic features (103); extracting at least one initial 3D box based on the point cloud data (104); and determining a 3D detection box of a target according to the point cloud semantic features corresponding to the point cloud data, the part location information of the foreground points and the at least one initial 3D box (105). In this way, the point cloud semantic features are obtained directly from the 3D point cloud data to determine the part location information of the foreground points, and the 3D detection box of the target is then determined according to the point cloud semantic features, the part location information of the foreground points and the at least one 3D box, without projecting the 3D point cloud data onto a bird's-eye view and obtaining bird's-eye-view boxes by 2D detection techniques, thereby avoiding the loss of the original point cloud information caused by quantization.

Description

目标检测方法和装置及智能驾驶方法、设备和存储介质
相关申请的交叉引用
本申请要求在2019年6月18日提交中国专利局、申请号为201910523342.4、申请名称为“目标检测方法和装置及智能驾驶方法、设备和存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本公开涉及目标检测技术,尤其涉及一种目标检测方法、智能驾驶方法、目标检测装置、电子设备和计算机存储介质。
背景技术
在自动驾驶或机器人等领域,一个核心问题是如何感知周围物体;在相关技术中,可以将采集的点云数据投影到俯视图,利用二维(2D)检测技术得到俯视图的框;这样,会在量化时损失了点云的原始信息,而从2D图像上检测时很难检测到被遮挡的物体。
发明内容
本公开实施例期望提供目标检测的技术方案。
本公开实施例提供了一种目标检测方法,所述方法包括:
获取三维(3D)点云数据;
根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;
基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;
基于所述点云数据提取出至少一个初始3D框;
根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
本公开实施例还提出了一种智能驾驶方法,应用于智能驾驶设备中,所述智能驾驶方法包括:
根据上述任意一种目标检测方法得出所述智能驾驶设备周围的所述目标的3D检测框;
根据所述目标的3D检测框,生成驾驶策略。
本公开实施例还提出了一种目标检测装置,所述装置包括获取模块、第一处理模块和第二处理模块,其中,
获取模块,配置为获取3D点云数据;根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;
第一处理模块,配置为基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;基于所述点云数据提取出至少一个初始3D框;
第二处理模块,配置为根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
本公开实施例还提出了一种电子设备,包括处理器和配置为存储能够在处理器上运行的计算机程序的存储器;其中,
所述处理器配置为运行所述计算机程序时,执行上述任意一种目标检测方法。
本公开实施例还提出了一种计算机存储介质,其上存储有计算机程序,该计算机程 序被处理器执行时实现上述任意一种目标检测方法。
本公开实施例还提供一种计算机程序产品，所述计算机程序产品包括计算机可执行指令，该计算机可执行指令被执行后，能够实现本公开实施例提供的任一种目标检测方法。
本公开实施例提出的目标检测方法、智能驾驶方法、目标检测装置、电子设备和计算机存储介质中,获取3D点云数据;根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;基于所述点云数据提取出至少一个初始3D框;根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。因此,本公开实施例提供的目标检测方法可以直接从3D点云数据中获得点云语义特征,以确定前景点的部位位置信息,进而根据点云语义特征、前景点的部位位置信息和至少一个3D框确定出目标的3D检测框,而无需将3D点云数据投影到俯视图,利用2D检测技术得到俯视图的框,避免了量化时损失点云的原始信息,也避免了投影到俯视图上时导致的被遮挡物体难以检测的缺陷。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1为本公开实施例的目标检测方法的流程图;
图2为本公开应用实施例中3D部位感知和聚合神经网络的综合框架示意图;
图3为本公开应用实施例中稀疏上采样和特征修正的模块框图;
图4为本公开应用实施例中针对不同难度级别的KITTI数据集的VAL分割集得出的目标部位位置的详细误差统计图;
图5为本公开实施例的目标检测装置的组成结构示意图;
图6为本公开实施例的电子设备的硬件结构示意图。
具体实施方式
以下结合附图及实施例,对本公开进行进一步详细说明。应当理解,此处所提供的实施例仅仅用以解释本公开,并不用于限定本公开。另外,以下所提供的实施例是用于实施本公开的部分实施例,而非提供实施本公开的全部实施例,在不冲突的情况下,本公开实施例记载的技术方案可以任意组合的方式实施。
需要说明的是,在本公开实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的方法或者装置不仅包括所明确记载的要素,而且还包括没有明确列出的其他要素,或者是还包括为实施方法或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括该要素的方法或者装置中还存在另外的相关要素(例如方法中的步骤或者装置中的单元,例如的单元可以是部分电路、部分处理器、部分程序或软件等等)。
例如,本公开实施例提供的目标检测方法或智能驾驶方法包含了一系列的步骤,但是本公开实施例提供的目标检测方法或智能驾驶方法不限于所记载的步骤,同样地,本公开实施例提供的目标检测装置包括了一系列模块,但是本公开实施例提供的装置不限于包括所明确记载的模块,还可以包括为获取相关信息、或基于信息进行处理时所需要设置的模块。
本文中术语“和/或”,仅仅是一种描述关联对象的关联关系,表示可以存在三种关 系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。另外,本文中术语“至少一种”表示多种中的任意一种或多种中的至少两种的任意组合,例如,包括A、B、C中的至少一种,可以表示包括从A、B和C构成的集合中选择的任意一个或多个元素。
本公开实施例可以应用于终端和服务器组成的计算机系统中,并可以与众多其它通用或专用计算系统环境或配置一起操作。这里,终端可以是瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统,等等,服务器可以是服务器计算机系统小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端、服务器等电子设备可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
在相关技术中,随着自动驾驶和机器人技术的飞速发展,基于点云数据的3D目标检测技术,越来越受到人们的关注,其中,点云数据可以基于雷达传感器获取;尽管从图像中进行2D目标检测已经取得了重大成就,但是,直接将上述2D目标检测方法应用于基于点云的三维(3D)目标检测,仍然存在一些困难,这主要是因为基于激光雷达(LiDAR)传感器产生的点云数据稀疏不规则,如何从不规则点中提取识别点云语义特征,并根据提取到的特征进行前景和背景的分割,以进行3D检测框的确定,仍然是一个具有挑战性的问题。
而在自动驾驶和机器人等领域，3D目标检测是一个非常重要的研究方向；例如，通过3D目标检测，可以确定出周围车辆和行人在3D空间的具体位置、形状大小、移动方向等等重要信息，从而帮助自动驾驶车辆或者机器人进行动作的决策。
目前相关的3D目标检测方案中,往往将点云投影到俯视图上,利用2D检测技术去得到俯视图的框,或者直接利用2D图像先出候选框,再在特定区域的点云上去回归对应的3D框。这里,利用2D检测技术得到的俯视图的框为2D框,2D框表示用于标识目标的点云数据的二维平面的框,2D框可以是长方形或其他二维平面形状的框。
可以看出,投影到俯视图上在量化时损失了点云的原始信息,而从2D图像上检测时很难检测到被遮挡的目标。另外,在采用上述方案检测3D框时,并没有单独的去考虑目标的部位信息,如对于汽车来说,车头、车尾、车轮等部位的位置信息有助于对目标的3D检测。
针对上述技术问题,在本公开的一些实施例中,提出了一种目标检测方法,本公开实施例可以在自动驾驶、机器人导航等场景实施。
图1为本公开实施例的目标检测方法的流程图,如图1所示,该流程可以包括:
步骤101:获取3D点云数据。
在实际应用中,可以基于雷达传感器等采集点云数据。
步骤102:根据3D点云数据,确定3D点云数据对应的点云语义特征。
针对点云数据,为了分割前景和背景并预测前景点的3D目标部位位置信息,需要从点云数据中学习区别性的逐点特征;对于得到点云数据对应的点云语义特征的实现方式,示例性地,可以将整个点云进行3D网格化处理,得到3D网格;在3D网格的非空网格中提取出所述3D点云数据对应的点云语义特征;3D点云数据对应的点云语义特征可以表示3D点云数据的坐标信息等。
在实际实施时,可以将每个网格的中心当做一个新的点,则得到一个近似等价于初始点云的网格化点云;上述网格化点云通常是稀疏的,在得到上述网格化点云之后,可以基于稀疏卷积操作提取上述网格化点云的逐点特征,这里的网格化点云的逐点特征是网格化后点云的每个点的语义特征,可以作为上述点云数据对应的点云语义特征;也就是说,可以将整个3D空间作为标准化网格进行网格化处理,然后基于稀疏卷积从非空网格中提取点云语义特征。
在3D目标检测中,针对点云数据,可以通过前景和背景的分割,得到前景点和背景点;前景点表示属于目标的点云数据,背景点表示不属于目标的点云数据;目标可以是车辆、人体等需要识别出的物体;例如,前景和背景的分割方法包括但不限于基于阈值的分割方法、基于区域的分割方法、基于边缘的分割方法以及基于特定理论的分割方法等。
在上述3D网格中的非空网格表示包含点云数据的网格,上述3D网格中的空网格表示不包含点云数据的网格。
对于将整个点云数据进行3D稀疏网格化的实现方式,在一个具体的示例中,整个3D空间的尺寸为70m*80m*4m,每个网格的尺寸为5cm*5cm*10cm;对于KITTI数据集上的每个3D场景,一般有16000个非空网格。
步骤103:基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置。
对于预测前景点的部位位置信息的实现方式,示例性地,可以根据上述点云语义特征针对上述点云数据进行前景和背景的分割,以确定出前景点;前景点为所述点云数据中的属于目标的点云数据;
利用用于预测前景点的部位位置信息的神经网络对确定出的前景点进行处理,得到前景点的部位位置信息;
其中,上述神经网络采用包括有3D框的标注信息的训练数据集训练得到,3D框的标注信息至少包括所述训练数据集的点云数据的前景点的部位位置信息。
本公开实施例中,并不对前景和背景的分割方法进行限制,例如,可以采用焦点损失(focal loss)方法等来实现前景和背景的分割。
在实际应用中,训练数据集可以是预先获取的数据集,例如,针对需要进行目标检测的场景,可以预先利用雷达传感器等获取点云数据,然后,针对点云数据进行前景点分割并划分出3D框,并在3D框中添加标注信息,以得到训练数据集,该标注信息可以表示前景点在3D框内的部位位置信息。这里,训练数据集中3D框可以记为真值(ground-truth)框。
这里,3D框表示一个用于标识目标的点云数据的立体框,3D框可以是长方体或其他形状的立体框。
示例性地,在得到训练数据集后,可以基于训练数据集的3D框的标注信息,并利用二元交叉熵损失作为部位回归损失,来预测前景点的部位位置信息。可选地,ground-truth框内或外的所有点都作为正负样本进行训练。
在实际应用中,上述3D框的标注信息包括准确的部位位置信息,具有信息丰富的特点,并且可以免费获得;也就是说,本公开实施例的技术方案,可以基于上述3D候选框的标注信息推断出的免费监督信息,预测前景点的目标内部位位置信息。
可以看出,本公开实施例中,可以基于稀疏卷积操作直接提取原始点云数据的信息,将其用于前景和背景的分割并预测每个前景点的部位位置信息(即在目标3D框中的位置信息),进而可以量化表征每个点属于目标哪个部位的信息。这避免了相关技术中将 点云投影到俯视图时引起的量化损失以及2D图像检测的遮挡问题,使得点云语义特征提取过程可以更自然且高效。
步骤104:基于点云数据提取出至少一个初始3D框。
对于基于点云数据提取出至少一个初始3D框的实现方式,示例性地,可以利用区域候选网络(RegionProposal Network,RPN)提取出至少一个3D候选框,每个3D候选框为一个初始3D框。需要说明的是,以上仅仅是对提取初始3D框的方式进行了举例说明,本公开实施例并不局限于此。
本公开实施例中,可以通过聚合初始3D框的各个点的部位位置信息,来帮助最终的3D框的生成;也就是说,预测的每个前景点的部位位置信息可以帮助最终3D框生成。
步骤105:根据点云数据对应的点云语义特征、前景点的部位位置信息和上述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
对于本步骤的实现方式,示例性地,可以针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
这里,在对每个初始3D框进行修正后,可以得到最终的3D框,用于实现对目标的检测;而初始3D框的置信度可以用于表示初始3D框内前景点的部位位置信息的置信度,进而,确定初始3D框的置信度有利于对初始3D框进行修正,以得到最终的3D检测框。
这里,目标的3D检测框可以表示用于目标检测的3D框,示例性地,在确定出目标的3D检测框后,可以根据目标的3D检测框确定出目标在图像中的信息,例如可以根据目标的3D检测框确定出目标在图像中位置、尺寸等信息。
本公开实施例中,对于每个初始3D框中前景点的部位位置信息和点云语义特征,需要通过聚合同一初始3D框中所有点的部位位置信息来进行3D框的置信度打分和/或修正。
在第一个示例中,可以直接获取并聚合初始3D框内的所有点的特征,用于进行3D框的置信度打分和修正;也就是说,可以直接对初始3D框的部位位置信息和点云语义特征进行池化处理,进而实现对初始3D框的置信度打分和/或修正;由于点云的稀疏性,上述第一个示例的方法,并不能从池化后的特征恢复初始3D框的形状,损失了初始3D框的信息。
在第二个示例中,可以将上述每个初始3D框均匀地划分为多个网格,针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征。
可以看出,对于不同大小的初始3D框,将产生固定分辨率的3D网格化特征。可选地,可以在3D空间上根据设定的分辨率对每个初始3D框进行均匀的网格化处理,设定的分辨率记为池化分辨率。
可选地,当上述多个网格中任意一个网格不包含前景点时,任意一个网格为空网格,此时,可以将所述任意一个网格的部位位置信息标记为空,得到上述网格池化后的前景点的部位位置信息,并将所述网格的点云语义特征设置为零,得到所述网格池化后的点云语义特征。
当上述多个网格中任意一个网格包含前景点时,可以将所述网格的前景点的部位位置信息进行均匀池化处理,得到上述网格池化后的前景点的部位位置信息,并将所述网格的前景点的点云语义特征进行最大化池化处理,得到所述网格池化后的点云语义特 征。这里,均匀池化可以是指:取邻域内前景点的部位位置信息的平均值作为该网格池化后的前景点的部位位置信息;最大化池化可以是指:取邻域内前景点的部位位置信息的最大值作为该网格池化后的前景点的部位位置信息。
可以看出,对前景点的部位位置信息进行均匀池化处理后,池化后的部位位置信息可以近似表征每个网格的中心位置信息。
本公开实施例中,在得到上述网格池化后的前景点的部位位置信息和上述网格池化后的点云语义特征后,可以得出池化后的每个初始3D框的部位位置信息和点云语义特征;这里,池化后的每个初始3D框的部位位置信息包括对应初始3D框的各个网格池化后的前景点的部位位置信息,池化后的每个初始3D框的点云语义特征包括对应初始3D框的各个网格池化后的点云语义特征。
在对每个网格进行前景点的部位位置信息和点云语义特征的池化操作时,还对空网格进行了相应处理,因而,这样得出的池化后的每个初始3D框的部位位置信息和点云语义特征可以更好地编码3D初始框的几何信息,进而,可以认为本公开实施例提出了对初始3D框敏感的池化操作。
本公开实施例提出的对初始3D框敏感的池化操作,可以从不同大小的初始3D框得到相同分辨率的池化后特征,并且可以从池化后的特征恢复3D初始框的形状;另外,池化后的特征可以便于进行初始3D框内部位位置信息的整合,进而,有利于初始3D框的置信度打分和初始3D框的修正。
对于根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度的实现方式,示例性地,可以将上述池化后的每个初始3D框的部位位置信息和点云语义特征进行合并,根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
本公开实施例中,可以将池化后的每个初始3D框的部位位置信息和点云语义特征转换为相同的特征维度,然后,将相同的特征维度的部位位置信息和点云语义特征连接,实现相同的特征维度的部位位置信息和点云语义特征的合并。
在实际应用中,池化后的每个初始3D框的部位位置信息和点云语义特征均可以通过特征映射(feature map)表示,这样,可以将池化后得到的特征映射转换至的相同的特征维度,然后,将这两个特征映射进行合并。
本公开实施例中,合并后的特征可以是m*n*k的矩阵,m、n和k均为正整数;合并后的特征可以用于后续的3D框内的部位位置信息的整合,进而,可以基于初始3D框内部位位置信息整合,进行3D框内的部位位置信息的置信度预测与3D框的修正。
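下面给出该合并步骤的一个仅作示意的PyTorch草图（通道数、池化分辨率等均为假设值，并非本公开的具体实现）：将池化后的部位位置特征图与点云语义特征图分别经卷积映射到相同的特征维度，再在通道方向上连接。

```python
import torch
import torch.nn as nn

class MergePooledFeatures(nn.Module):
    """合并步骤的示意实现：将池化后的部位位置特征(4通道)与点云语义特征(C通道)
    分别映射到相同的特征维度，然后在通道方向上连接。"""

    def __init__(self, sem_ch=128, out_ch=64):
        super().__init__()
        self.part_proj = nn.Conv3d(4, out_ch, kernel_size=3, padding=1)      # 部位位置特征 -> out_ch
        self.sem_proj = nn.Conv3d(sem_ch, out_ch, kernel_size=3, padding=1)  # 点云语义特征 -> out_ch

    def forward(self, part_grid, sem_grid):      # 输入均为 (B, 通道数, 14, 14, 14)
        merged = torch.cat([self.part_proj(part_grid), self.sem_proj(sem_grid)], dim=1)
        return merged                            # (B, 2*out_ch, 14, 14, 14)，供后续卷积层聚合使用
```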
相关技术中,通常在得到初始3D框的点云数据后,直接使用PointNet进行点云的信息整合,由于点云的稀疏性,该操作损失了初始3D框的信息,不利于3D部位位置信息的整合。
而在本公开实施例中,对于根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度的过程,示例性地,可以采用如下几种方式实现。
第一种方式
可以将所述合并后的特征矢量化为特征向量,根据所述特征向量,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。在具体实现时,在将合并后的特征矢量化为特征向量后,然后再加上几个全连接层(Fully-Connected layers,FC layers),以对每个初始3D框进行修正和/或确定每个初始3D框的置信度;这里,全连接层属于神经网络中的一种基础单元,可以整合卷积层或者池化层中具有类别区分性的局部信息。
第二种方式
可以针对合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射; 根据所述稀疏卷积操作后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。可选地,在得到稀疏卷积操作后的特征映射,可以再通过卷积操作,逐步将局部尺度到全局尺度的特征进行聚合,以实现对每个初始3D框进行修正和/或确定每个初始3D框的置信度。在一个具体的示例中,在池化分辨率较低时,可以采用第二种方式来对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
第三种方式
针对合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;对所述稀疏卷积操作后的特征映射进行降采样,根据降采样后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。这里通过对稀疏卷积操作后的特征映射进行降采样处理,可以更有效地对每个初始3D框进行修正和/或确定每个初始3D框的置信度,并且可以节省计算资源。
可选地,在得到稀疏卷积操作后的特征映射后,可以通过池化操作,对稀疏卷积操作后的特征映射进行降采样;例如,这里的针对稀疏卷积操作后的特征映射的池化操作为稀疏最大化池化(sparse max-pooling)操作。
可选地,通过对稀疏卷积操作后的特征映射进行降采样,得到一个特征向量,以用于部位位置信息的整合。
也就是说,本公开实施例中,可以在池化后的每个初始3D框的部位位置信息和点云语义特征的基础上,将网格化后的特征逐渐降采样成一个编码后的特征向量,用于3D部位位置信息的整合;然后,可以利用这个编码后的特征向量,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
综上,本公开实施例提出了基于稀疏卷积操作的3D部位位置信息的整合操作,可以逐层编码每个初始3D框内池化后特征的3D部位位置信息;该操作与对初始3D框敏感的池化操作结合,可以更好地聚合3D部位位置信息,用于最终的初始3D框的置信度预测和/或初始3D框的修正,以得出目标的3D检测框。
在实际应用中,步骤101至步骤103可以基于电子设备的处理器实现,上述处理器可以为特定用途集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理装置(Digital Signal Processing Device,DSPD)、可编程逻辑装置(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器中的至少一种。可以理解地,对于不同的电子设备,用于实现上述处理器功能的电子器件还可以为其它,本公开实施例不作具体限定。
可以看出,本公开实施例提供的目标检测方法可以直接从3D点云数据中获得点云语义特征,以确定前景点的部位位置信息,进而根据点云语义特征、前景点的部位位置信息和至少一个3D框确定出目标的3D检测框,而无需将3D点云数据投影到俯视图,利用2D检测技术得到俯视图的框,避免了量化时损失点云的原始信息,也避免了投影到俯视图上时导致的被遮挡物体难以检测的缺陷。
基于前述记载的目标检测方法,本公开实施例还提出了一种智能驾驶方法,应用于智能驾驶设备中,该智能驾驶方法包括:根据上述任意一种目标检测方法得出所述智能驾驶设备周围的所述目标的3D检测框;根据所述目标的3D检测框,生成驾驶策略。
在一个示例中,智能驾驶设备包括自动驾驶的车辆、机器人、导盲设备等,此时,智能驾驶设备可以根据生成的驾驶策略对其进行驾驶控制;在另一个示例中,智能驾驶设备包括安装辅助驾驶系统的车辆,此时,生成的驾驶策略可以用于指导驾驶员来进行车辆的驾驶控制。
下面通过一个具体的应用实施例对本公开进行进一步说明。
在该应用实施例的方案中,提出了从原始点云进行目标检测的3D部位感知和聚合神经网络(可以命名为Part-A 2网络),该网络的框架是一种新的基于点云的三维目标检测的两阶段框架,可以由如下两个阶段组成,其中,第一个阶段为部位感知阶段,第二个阶段为部位聚合阶段。
首先,在部位感知阶段,可以根据3D框的标注信息推断出免费的监督信息,同时预测初始3D框和准确的部位位置(intra-object part locations)信息;然后,可以对相同框内前景点的部位位置信息进行聚合,从而实现对3D框特征的编码有效表示。在部位聚合阶段,考虑通过整合池化后的部位位置信息的空间关系,用于对3D框重新评分(置信度打分)和修正位置;在KITTI数据集上进行了大量实验,证明预测的前景点的部位位置信息,有利于3D目标检测,并且,上述基于3D部位感知和聚合神经网络的目标检测方法,优于相关技术中通过将点云作为输入馈送的目标检测方法。
在本公开的一些实施例中,不同于从鸟瞰图或2D图像中进行目标检测的方案,提出了通过对前景点进行分割,来直接从原始点云生成初始3D框(即3D候选框)的方案,其中,分割标签直接根据训练数据集中3D框的标注信息得出;然而3D框的标注信息不仅提供了分割掩模,而且还提供了3D框内所有点的精确框内部位位置。这与2D图像中的框标注信息完全不同,因为2D图像中的部分对象可能被遮挡;使用二维ground-truth框进行目标检测时,会为目标内的每一个像素产生不准确和带有噪声的框内部位位置;相对地,上述3D框内部位位置准确且信息丰富,并且可以免费获得,但在3D目标检测中从未被使用过。
基于这个重要发现,在一些实施例中提出了上述Part-A 2网络;具体地,在首先进行的部位感知阶段,该网络通过学习,估计所有前景点的目标部位位置信息,其中,部位位置的标注信息和分割掩模可以直接从人工标注的真实信息生成,这里,人工标注的真实信息可以记为Ground-truth,例如,人工标注的真实信息可以是人工标注的三维框,在实际实施时,可以通过将整个三维空间划分为小网格,并采用基于稀疏卷积的三维UNET-like神经网络(U型网络结构)来学习点特征;可以在U型网络结构添加一个RPN头部,以生成初始的3D候选框,进而,可以对这些部位进行聚合,以便进入部位聚合阶段。
部位聚合阶段的动机是,给定一组3D候选框中的点,上述Part-A 2网络应能够评估该候选框的质量,并通过学习所有这些点的预测的目标部位位置的空间关系来优化该候选框。因此,为了对同一3D框内的点进行分组,可以提出一种新颖的感知点云池化模块,可以记为RoI感知点云池化模块;RoI感知点云池化模块可以通过新的池化操作,消除在点云上进行区域池化时的模糊性;与相关技术中池化操作方案中在所有点云或非空体素上进行池化操作不同,RoI感知点云池化模块是在3D框中的所有网格(包括非空网格和空网格)进行池化操作,这是生成3D框评分和位置修正的有效表示的关键,因为空网格也对3D框信息进行编码。在池化操作后,上述网络可以使用稀疏卷积和池化操作聚合部位位置信息;实验结果表明,聚合部位特征能够显著提高候选框质量,在三维检测基准上达到了最先进的性能。
不同于上述通基于从多个传感器获取的数据进行3D目标检测,本公开应用实施例中,3D部位感知和聚合神经网络只使用点云数据作为输入,就可以获得与相关技术类似甚至更好的3D检测结果;进一步地,上述3D部位感知和聚合神经网络的框架中,进一步探索了3D框的标注信息提供的丰富信息,并学习预测精确的目标部位位置信息,以提高3D目标检测的性能;进一步地,本公开应用实施例提出了一个U型网络结构的主干网,可以利用稀疏卷积和反卷积提取识别点云特征,用于预测目标部位位置信息和三维目标检测。
图2为本公开应用实施例中3D部位感知和聚合神经网络的综合框架示意图,如图2所示,该3D部位感知和聚合神经网络的框架包括部位感知阶段和部位聚合阶段,其中,在部位感知阶段,通过将原始点云数据输入至新设计的U型网络结构的主干网,可以精确估计目标部位位置并生成3D候选框;在部位聚合阶段,进行了提出的基于RoI感知点云池化模块的池化操作,具体地,将每个3D候选框内部位信息进行分组,然后利用部位聚合网络来考虑各个部位之间的空间关系,以便对3D框进行评分和位置修正。
可以理解的是，由于三维空间中的对象是自然分离的，因此3D目标检测的ground-truth框自动为每个3D点提供精确的目标部位位置和分割掩膜；这与2D目标检测非常不同，2D目标框可能由于遮挡仅包含目标的一部分，因此不能为每个2D像素提供准确的目标部位位置。
本公开实施例的目标监测方法可以应用于多种场景中,在第一个示例中,可以利用上述目标检测方法进行自动驾驶场景的3D目标监测,通过检测周围目标的位置、大小、移动方向等信息帮助自动驾驶决策;在第二个示例中,可以利用上述目标检测方法实现3D目标的跟踪,具体地,可以在每个时刻利用上述目标检测方法实现3D目标检测,检测结果可以作为3D目标跟踪的依据;在第三个示例中,可以利用上述目标检测方法进行3D框内点云的池化操作,具体地,可以将不同3D框的内稀疏点云池化为一个拥有固定分辨率的3D框的特征。
基于这一重要的发现,本公开应用实施例中提出了上述Part-A 2网络,用于从点云进行3D目标检测。具体来说,我们引入3D部位位置标签和分割标签作为额外的监督信息,以利于3D候选框的生成;在部位聚合阶段,对每个3D候选框内的预测的3D目标部位位置信息进行聚合,以对该候选框进行评分并修正位置。
下面具体说明本公开应用实施例的流程。
首先可以学习估计3D点的目标部位位置信息。具体地说,如图2所示,本公开应用实施例设计了一个U型网络结构,可以通过在获得的稀疏网格上进行稀疏卷积和稀疏反卷积,来学习前景点的逐点特征表示;图2中,可以对点云数据执行3次步长为2稀疏卷积操作,如此可以将点云数据的空间分辨率通过降采样降低至初始空间分辨率的1/8,每次稀疏卷积操作都有几个子流形稀疏卷积;这里,稀疏卷积操作的步长可以根据点云数据需要达到的空间分辨率进行确定,例如,点云数据需要达到的空间分辨率越低,则稀疏卷积操作的步长需要设置得越长;在对点云数据执行3次稀疏卷积操作后,对3次稀疏卷积操作后得到的特征执行稀疏上采样和特征修正;本公开实施例中,基于稀疏操作的上采样块(用于执行稀疏上采样操作),可以用于修正融合特征和并节省计算资源。
稀疏上采样和特征修正可以基于稀疏上采样和特征修正模块实现,图3为本公开应用实施例中稀疏上采样和特征修正的模块框图,该模块应用于基于稀疏卷积的U型网络结构主干网的解码器中;参照图3,通过稀疏卷积对横向特征和底部特征首先进行融合,然后,通过稀疏反卷积对融合后的特征进行特征上采样,图3中,稀疏卷积3×3×3表示卷积核大小为3×3×3的稀疏卷积,通道连接(contcat)表示特征向量在通道方向上的连接,通道缩减(channel reduction)表示特征向量在通道方向上的缩减,
⊕表示按照特征向量在通道方向进行相加；可以看出，参照图3，可以针对横向特征和底部特征，进行了稀疏卷积、通道连接、通道缩减、稀疏反卷积等操作，实现了对横向特征和底部特征的特征修正。
参照图2,在对3次稀疏卷积操作后得到的特征执行稀疏上采样和特征修正后,还可以针对执行稀疏上采样和特征修正后的特征,进行语义分割和目标部位位置预测。
在利用神经网络识别和检测目标时,目标内部位位置信息是必不可少的;例如,车 辆的侧面也是一个垂直于地面的平面,两个车轮总是靠近地面。通过学习估计每个点的前景分割掩模和目标部位位置,神经网络发展了推断物体的形状和姿势的能力,这有利于3D目标检测。
在具体实施时,可以在上述稀疏卷积的U型网络结构主干网的基础上,附加两个分支,分别用于分割前景点和预测它们的物体部位位置;在预测前景点的物体部位位置时,可以基于训练数据集的3D框的标注信息进行预测,在训练数据集中,ground-truth框内或外的所有点都作为正负样本进行训练。
3D ground-truth框自动提供3D部位位置标签;前景点的部位标签(p x,p y,p z)是已知参数,这里,可以将(p x,p y,p z)转换为部位位置标签(O x,O y,O z),以表示其在相应目标中的相对位置;3D框由(C x,C y,C z,h,w,l,θ)表示,其中,(C x,C y,C z)表示3D框的中心位置,(h,w,l)表示3D框对应的鸟瞰图的尺寸大小,θ表示3D框在对应的的鸟瞰图中的方向,即3D框在对应的的鸟瞰图中的朝向与鸟瞰图的X轴方向的夹角。部位位置标签(O x,O y,O z)可以通过式(1)计算得出。
[t x,t y]为[p x−C x,p y−C y]绕z轴旋转−θ（即变换到3D框的局部坐标系）后的结果，t z=p z−C z；
(O x,O y,O z)=(t x/l+0.5,t y/w+0.5,t z/h+0.5)    (1)
其中,O x,O y,O z∈[0,1],目标中心的部位位置为(0.5,0.5,0.5);这里,式(1)涉及的坐标都以KITTI的激光雷达坐标系表示,其中,z方向垂直于地面,x和y方向在水平面上。
这里,可以利用二元交叉熵损失作为部位回归损失来学习前景点部位沿3维的位置,其表达式如下:
L part(P u)=−(O u·log(P u)+(1−O u)·log(1−P u))，u∈{x,y,z}    (2)
其中,P u表示在S形层(Sigmoid Layer)之后的预测的目标内部位位置,L part(P u)表示预测的3D点的部位位置信息,这里,可以只对前景点进行部位位置预测。
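下面是式(2)的一个仅作示意的PyTorch实现草图（函数名、张量形状均为假设，并非本公开的限定）：

```python
import torch
import torch.nn.functional as F

def part_regression_loss(pred_part, target_part, fg_mask):
    """式(2)的示意实现：仅对前景点计算三个维度上的二元交叉熵部位回归损失。

    pred_part:   (N, 3) 部位位置预测的logits；binary_cross_entropy_with_logits
                 等价于先经Sigmoid得到P_u，再按式(2)计算二元交叉熵。
    target_part: (N, 3) 由3D框标注信息得到的部位位置标签，取值范围[0, 1]。
    fg_mask:     (N,)   前景点掩码，只有前景点参与部位回归损失的计算。
    """
    if fg_mask.sum() == 0:
        return pred_part.sum() * 0.0            # 无前景点时返回零损失，保持计算图
    return F.binary_cross_entropy_with_logits(
        pred_part[fg_mask], target_part[fg_mask], reduction='mean')
```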
本公开应用实施例中，还可以生成3D候选框。具体地说，为了聚合3D目标检测的预测的目标内部位位置，需要生成3D候选框，将来自同一目标的估计前景点的目标部位信息聚合起来；在实际实施时，如图2所示，在稀疏卷积编码器生成的特征映射（即对点云数据通过3次稀疏卷积操作后得到的特征映射）上附加相同的RPN头；在生成3D候选框时，特征映射被降采样8倍，并且聚合相同鸟瞰位置的不同高度处的特征，以生成用于3D候选框生成的2D鸟瞰特征映射。
参照图2,针对提取出的3D候选框,可以在部位聚合阶段执行池化操作,对于池化操作的实现方式,在一些实施例中,提出了点云区域池化操作,可以将3D候选框中的逐点特征进行池化操作,然后,基于池化操作后的特征映射,对3D候选框进行修正;但是,这种池化操作会丢失3D候选框信息,因为3D候选框中的点并非规则分布,并且存在从池化后点中恢复3D框的模糊性。
图4为本公开应用实施例中点云池化操作的示意图,如图4所示,先前的点云池化操作表示上述记载的点云区域池化操作,圆圈表示池化后点,可以看出,如果采用上述记载的点云区域池化操作,则不同的3D候选框将会导致相同的池化后点,也就是说,上述记载的点云区域池化操作具有模糊性,导致无法使用先前的点云池化方法恢复初始3D候选框形状,这会对后续的候选框修正产生负面影响。
对于池化操作的实现方式,在另一些实施例中,提出了ROI感知点云池化操作,ROI感知点云池化操作的具体过程为:将所述每个3D候选框均匀地划分为多个网格, 当所述多个网格中任意一个网格不包含前景点时,所述任意一个网格为空网格,此时,可以将所述任意一个网格的部位位置信息标记为空,并将所述任意一个网格的点云语义特征设置为零;将所述每个网格的前景点的部位位置信息进行均匀池化处理,并对所述每个网格的前景点的点云语义特征进行最大化池化处理,得到池化后的每个3D候选框的部位位置信息和点云语义特征。
可以理解的是,结合图4,ROI感知点云池化操作可以通过保留空网格来对3D候选框的形状进行编码,而稀疏卷积可以有效地对候选框的形状(空网格)进行处理。
也就是说,对于RoI感知点云池化操作的具体实现方式,可以将3D候选框均匀地划分为具有固定空间形状(H*W*L)的规则网格,其中,H、W和L分别表示池化分辨率在每个维度的高度、宽度和长度超参数,并与3D候选框的大小无关。通过聚合(例如,最大化池化或均匀池化)每个网格内的点特征来计算每个网格的特征;可以看出,基于ROI感知点云池化操作,可以将不同的3D候选框规范化为相同的局部空间坐标,其中每个网格对3D候选框中相应固定位置的特征进行编码,这对3D候选框编码更有意义,并有利于后续的3D候选框评分和位置修正。
在得到池化后的3D候选框的部位位置信息和点云语义特征之后,还可以执行用于3D候选框修正的部位位置聚合。
具体地说,通过考虑一个3D候选框中所有3D点的预测的目标部位位置的空间分布,可以认为通过聚合部位位置来评价该3D候选框的质量是合理的;可以将部位位置的聚合的问题表示为优化问题,并通过拟合相应3D候选框中所有点的预测部位位置来直接求解3D边界框的参数。然而,这种数学方法对异常值和预测的部位偏移量的质量很敏感。
为了解决这一问题,在本公开应用实施例中,提出了一种基于学习的方法,可以可靠地聚合部位位置信息,以用于进行3D候选框评分(即置信度)和位置修正。对于每个3D候选框,我们分别在3D候选框的部位位置信息和点云语义特征应用提出的ROI感知点云池化操作,从而生成两个尺寸为(14*14*14*4)和(14*14*14*C)的特征映射,其中,预测的部位位置信息对应4维映射,其中,3个维度表示XYZ维度,用于表示部位位置,1个维度表示前景分割分数,C表示部位感知阶段得出的逐点特征的特征尺寸。
在池化操作之后,如图2所示,在部位聚合阶段,可以通过分层方式从预测的目标部位位置的空间分布中学习。具体来说,我们首先使用内核大小为3*3*3的稀疏卷积层将两个池化后特征映射(包括池化后的3D候选框的部位位置信息和点云语义特征)转换为相同的特征维度;然后,将这两个相同特征维度的特征映射连接起来;针对连接后的特征映射,可以使用四个内核大小为3*3*3的稀疏卷积层堆叠起来进行稀疏卷积操作,随着接收域的增加,可以逐渐聚合部位信息。在实际实施时,可以在池化后的特征映射转换为相同特征维度的特征映射之后,可以应用内核大小为2*2*2且步长为2*2*2的稀疏最大化池池化操作,以将特征映射的分辨率降采样到7*7*7,以节约计算资源和参数。在应用四个内核大小为3*3*3的稀疏卷积层堆叠起来进行稀疏卷积操作后,还可以将稀疏卷积操作得出的特征映射进行矢量化(对应图2中的FC),得到一个特征向量;在得到特征向量后,可以附加两个分支进行最后的3D候选框评分和3D候选框位置修正;示例性地,3D候选框评分表示3D候选框的置信度评分,3D候选框的置信度评分至少表示3D候选框内前景点的部位位置信息的评分。
与直接将池化的三维特征图矢量化为特征向量的方法相比,本公开应用实施例提出的部位聚合阶段的执行过程,可以有效地从局部到全局的尺度上聚合特征,从而可以学习预测部位位置的空间分布。通过使用稀疏卷积,它还节省了大量的计算资源和参数, 因为池化后的网格是非常稀疏的;而相关技术并不能忽略它(即不能采用稀疏卷积来进行部位位置聚合),这是因为,相关技术中,需要将每个网格编码为3D候选框中一个特定位置的特征。
可以理解的是,参照图2,在对3D候选框进行位置修正后,可以得到位置修正后的3D框,即,得到最终的3D框,可以用于实现3D目标检测。
本公开应用实施例中,可以将两个分支附加到从预测的部位信息聚合的矢量化特征向量。对于3D候选框评分(即置信度)分支,可以使用3D候选框与其对应的ground-truth框之间的3D交并比(Intersection Over Union,IOU)作为3D候选框质量评估的软标签,也可以根据公式(2)利用二元交叉熵损失,来学习到3D候选框评分。
对于3D候选框的生成和位置修正,我们可以采用回归目标方案,并使用平滑-L1(smooth-L1)损失对归一化框参数进行回归,具体实现过程如式(3)所示。
Δx=(x g−x a)/d a，Δy=(y g−y a)/d a，Δz=(z g−z a)/h a，
Δh=log(h g/h a)，Δw=log(w g/w a)，Δl=log(l g/l a)，
Δθ=θ g−θ a，其中d a=√(l a²+w a²)    (3)
其中,Δx、Δy和Δz分别表示3D框中心位置的偏移量,Δh、Δw和Δl分别表示3D框对应的鸟瞰图的尺寸大小偏移量,Δθ表示3D框对应的鸟瞰图的方向偏移量,d a表示标准化鸟瞰图中的中心偏移量,x a、y a和z a表示3D锚点/候选框的中心位置,h a、w a和l a表示3D锚点/候选框对应的鸟瞰图的尺寸大小,θ a表示3D锚点/候选框对应的鸟瞰图的方向;x g、y g和z g表示对应的ground-truth框的中心位置,h g、w g和l g表示该ground-truth框对应的鸟瞰图的尺寸大小,θ g表示该ground-truth框对应的鸟瞰图的方向。
在相关技术中对候选框的修正方法不同的是,本公开应用实施例中对于3D候选框的位置修正,可以直接根据3D候选框的参数回归相对偏移量或大小比率,因为上述ROI感知点云池化模块已经对3D候选框的全部共享信息进行编码,并将不同的3D候选框传输到相同的标准化空间坐标系。
可以看出,在具有相等损失权重1的部位感知阶段,存在三个损失,包括前景点分割的焦点损失、目标内部位位置的回归的二元交叉熵损失和3D候选框生成的平滑-L1损失;对于部位聚合阶段,也有两个损失,损失权重相同,包括IOU回归的二元交叉熵损失和位置修正的平滑L1损失。
综上,本公开应用实施例提出了一种新的3D目标检测方法,即利用上述Part-A 2网络,从点云检测三维目标;在部位感知阶段,通过使用来自3D框的位置标签来学习估计准确的目标部位位置;通过新的ROI感知点云池化模块对每个目标的预测的部位位置进行分组。因此,在部位聚合阶段可以考虑预测的目标内部位位置的空间关系,以对3D候选框进行评分并修正它们的位置。实验表明,该公开应用实施例的目标检测方法在具有挑战性的KITTI三维检测基准上达到了最先进的性能,证明了该方法的有效性。
本领域技术人员可以理解,在具体实施方式的上述方法中,各步骤的撰写顺序并不意味着严格的执行顺序而对实施过程构成任何限定,各步骤的具体执行顺序应当以其功能和可能的内在逻辑确定
在前述实施例提出的目标检测方法的基础上,本公开实施例提出了一种目标检测装置。
图5为本公开实施例的目标检测装置的组成结构示意图,如图5所示,所述装置位于电子设备中,所述装置包括:获取模块601、第一处理模块602和第二处理模块603, 其中,
获取模块601,配置为获取3D点云数据;根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;
第一处理模块602,配置为基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;基于所述点云数据提取出至少一个初始3D框;
第二处理模块603,配置为根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
在一实施方式中,所述第二处理模块603,配置为针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
在一实施方式中,所述第二处理模块603,配置为将所述每个初始3D框均匀地划分为多个网格,针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
在一实施方式中,所述第二处理模块603在针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作的情况下,配置为响应于一个网格中不包含前景点的情况,将所述网格的部位位置信息标记为空,得到所述网格池化后的前景点的部位位置信息,并将所述网格的点云语义特征设置为零,得到所述网格池化后的点云语义特征;响应于一个网格中包含前景点的情况,将所述网格的前景点的部位位置信息进行均匀池化处理,得到所述网格池化后的前景点的部位位置信息,并将所述网格的前景点的点云语义特征进行最大化池化处理,得到所述网格池化后的点云语义特征。
在一实施方式中,所述第二处理模块603,配置为针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;将所述池化后的每个初始3D框的部位位置信息和点云语义特征进行合并,根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
在一实施方式中,所述第二处理模块603在根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度的情况下,配置为:
将所述合并后的特征矢量化为特征向量,根据所述特征向量,对每个初始3D框进行修正和/或确定每个初始3D框的置信度;
或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;根据所述稀疏卷积操作后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度;
或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;对所述稀疏卷积操作后的特征映射进行降采样,根据降采样后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
在一实施方式中,所述第二处理模块603在对所述稀疏卷积操作后的特征映射进行降采样的情况下,配置为通过对所述稀疏卷积操作后的特征映射进行池化操作,实现对所述稀疏卷积操作后的特征映射降采样的处理。
在一实施方式中,所述获取模块601,配置为获取3D点云数据,将所述3D点云数据进行3D网格化处理,得到3D网格;在所述3D网格的非空网格中提取出所述3D点云数据对应的点云语义特征。
在一实施方式中,所述第一处理模块602在基于所述点云语义特征,确定前景点的部位位置信息的情况下,配置为根据所述点云语义特征针对所述点云数据进行前景和背景的分割,以确定出前景点;所述前景点为所述点云数据中的属于前景的点云数据;利用用于预测前景点的部位位置信息的神经网络对确定出的前景点进行处理,得到前景点的部位位置信息;其中,所述神经网络采用包括有3D框的标注信息的训练数据集训练得到,所述3D框的标注信息至少包括所述训练数据集的点云数据的前景点的部位位置信息。
另外,在本实施例中的各功能模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能模块的形式实现。
所述集成的单元如果以软件功能模块的形式实现并非作为独立的产品进行销售或使用时,可以存储在一个计算机可读取存储介质中,基于这样的理解,本实施例的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)或processor(处理器)执行本实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器(Read Only Memory,ROM)、随机存取存储器(Random Access Memory,RAM)、磁碟或者光盘等各种可以存储程序代码的介质。
具体来讲,本实施例中的任意一种目标检测方法或智能驾驶方法对应的计算机程序指令可以被存储在光盘,硬盘,U盘等存储介质上,当存储介质中的与任意一种目标检测方法或智能驾驶方法对应的计算机程序指令被一电子设备读取或被执行时,实现前述实施例的任意一种目标检测方法或智能驾驶方法。
基于前述实施例相同的技术构思,参见图6,其示出了本公开实施例提供的一种电子设备70,可以包括:存储器71和处理器72;其中,
所述存储器71,配置为存储计算机程序和数据;
所述处理器72,配置为执行所述存储器中存储的计算机程序,以实现前述实施例的任意一种目标检测方法或智能驾驶方法。
在实际应用中,上述存储器71可以是易失性存储器(volatile memory),例如RAM;或者非易失性存储器(non-volatile memory),例如ROM,快闪存储器(flash memory),硬盘(Hard Disk Drive,HDD)或固态硬盘(Solid-State Drive,SSD);或者上述种类的存储器的组合,并向处理器72提供指令和数据。
上述处理器72可以为ASIC、DSP、DSPD、PLD、FPGA、CPU、控制器、微控制器、微处理器中的至少一种。可以理解地,对于不同的设备,用于实现上述处理器功能的电子器件还可以为其它,本公开实施例不作具体限定。
本公开实施例还提出了一种计算机存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述任意一种目标检测方法。
本公开实施例还提供一种计算机程序产品，所述计算机程序产品包括计算机可执行指令，该计算机可执行指令被执行后，能够实现本公开实施例提供的任一种目标检测方法。
在一些实施例中,本公开实施例提供的装置具有的功能或包含的模块可以用于执行上文方法实施例描述的方法,其具体实现可以参照上文方法实施例的描述,为了简洁,这里不再赘述
上文对各个实施例的描述倾向于强调各个实施例之间的不同之处,其相同或相似之处可以互相参考,为了简洁,本文不再赘述
本申请所提供的各方法实施例中所揭露的方法,在不冲突的情况下可以任意组合,得到新的方法实施例。
本申请所提供的各产品实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的产品实施例。
本申请所提供的各方法或设备实施例中所揭露的特征,在不冲突的情况下可以任意组合,得到新的方法实施例或设备实施例。
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本公开的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台终端(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本公开各个实施例所述的方法。
上面结合附图对本公开的实施例进行了描述,但是本公开并不局限于上述的具体实施方式,上述的具体实施方式仅仅是示意性的,而不是限制性的,本领域的普通技术人员在本公开的启示下,在不脱离本公开宗旨和权利要求所保护的范围情况下,还可做出很多形式,这些均属于本公开的保护之内。

Claims (22)

  1. 一种目标检测方法,其中,所述方法包括:
    获取三维3D点云数据;
    根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;
    基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;
    基于所述点云数据提取出至少一个初始3D框;
    根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
  2. 根据权利要求1所述的方法,其中,所述根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,包括:
    针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;
    根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
  3. 根据权利要求2所述的方法,其中,所述针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征,包括:
    将所述每个初始3D框均匀地划分为多个网格,针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征。
  4. 根据权利要求3所述的方法,其中,所述针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作,包括:
    响应于一个网格中不包含前景点的情况,将所述网格的部位位置信息标记为空,得到所述网格池化后的前景点的部位位置信息,并将所述网格的点云语义特征设置为零,得到所述网格池化后的点云语义特征;
    响应于一个网格中包含前景点的情况,将所述网格的前景点的部位位置信息进行均匀池化处理,得到所述网格池化后的前景点的部位位置信息,并将所述网格的前景点的点云语义特征进行最大化池化处理,得到所述网格池化后的点云语义特征。
  5. 根据权利要求2所述的方法,其中,所述根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,包括:
    将所述池化后的每个初始3D框的部位位置信息和点云语义特征进行合并,根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
  6. 根据权利要求5所述的方法,其中,所述根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,包括:
    将所述合并后的特征矢量化为特征向量,根据所述特征向量,对每个初始3D框进行修正和/或确定每个初始3D框的置信度;
    或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;根据所述稀疏卷积操作后的特征映射,对每个初始3D框进行修正和/或确 定每个初始3D框的置信度;
    或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;对所述稀疏卷积操作后的特征映射进行降采样,根据降采样后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
  7. 根据权利要求6所述的方法,其中,所述对所述稀疏卷积操作后的特征映射进行降采样,包括:
    通过对所述稀疏卷积操作后的特征映射进行池化操作,实现对所述稀疏卷积操作后的特征映射降采样的处理。
  8. 根据权利要求1至7任一项所述的方法,其中,所述根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征,包括:
    将所述3D点云数据进行3D网格化处理,得到3D网格;在所述3D网格的非空网格中提取出所述3D点云数据对应的点云语义特征。
  9. 根据权利要求1至7任一项所述的方法,其中,所述基于所述点云语义特征,确定前景点的部位位置信息,包括:
    根据所述点云语义特征针对所述点云数据进行前景和背景的分割,以确定出前景点;所述前景点为所述点云数据中的属于前景的点云数据;
    利用用于预测前景点的部位位置信息的神经网络对确定出的前景点进行处理,得到前景点的部位位置信息;
    其中,所述神经网络采用包括有3D框的标注信息的训练数据集训练得到,所述3D框的标注信息至少包括所述训练数据集的点云数据的前景点的部位位置信息。
  10. 一种智能驾驶方法,其中,应用于智能驾驶设备中,所述智能驾驶方法包括:
    根据权利要求1至9任一项所述的目标检测方法得出所述智能驾驶设备周围的所述目标的3D检测框;
    根据所述目标的3D检测框,生成驾驶策略。
  11. 一种目标检测装置,其中,所述装置包括获取模块、第一处理模块和第二处理模块,其中,
    获取模块,配置为获取三维3D点云数据;根据所述3D点云数据,确定所述3D点云数据对应的点云语义特征;
    第一处理模块,配置为基于所述点云语义特征,确定前景点的部位位置信息;所述前景点表示所述点云数据中属于目标的点云数据,所述前景点的部位位置信息用于表征所述前景点在目标内的相对位置;基于所述点云数据提取出至少一个初始3D框;
    第二处理模块,配置为根据所述点云数据对应的点云语义特征、所述前景点的部位位置信息和所述至少一个初始3D框,确定目标的3D检测框,所述检测框内的区域中存在目标。
  12. 根据权利要求11所述的装置,其中,所述第二处理模块,配置为针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
  13. 根据权利要求12所述的装置,其中,所述第二处理模块,配置为将所述每个初始3D框均匀地划分为多个网格,针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;根据池化后的每个初始3D框的部位位置信息和点云语义特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度,以确定所述目标的3D检测框。
  14. 根据权利要求13所述的装置,其中,所述第二处理模块在针对每个网格进行前景点的部位位置信息和点云语义特征的池化操作的情况下,配置为:
    响应于一个网格中不包含前景点的情况,将所述网格的部位位置信息标记为空,得到所述网格池化后的前景点的部位位置信息,并将所述网格的点云语义特征设置为零,得到所述网格池化后的点云语义特征;响应于一个网格中包含前景点的情况,将所述网格的前景点的部位位置信息进行均匀池化处理,得到所述网格池化后的前景点的部位位置信息,并将所述网格的前景点的点云语义特征进行最大化池化处理,得到所述网格池化后的点云语义特征。
  15. 根据权利要求12所述的装置,其中,所述第二处理模块,配置为针对每个初始3D框,进行前景点的部位位置信息和点云语义特征的池化操作,得到池化后的每个初始3D框的部位位置信息和点云语义特征;将所述池化后的每个初始3D框的部位位置信息和点云语义特征进行合并,根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
  16. 根据权利要求15所述的装置,其中,所述第二处理模块在根据合并后的特征,对每个初始3D框进行修正和/或确定每个初始3D框的置信度的情况下,配置为:
    将所述合并后的特征矢量化为特征向量,根据所述特征向量,对每个初始3D框进行修正和/或确定每个初始3D框的置信度;
    或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;根据所述稀疏卷积操作后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度;
    或者,针对所述合并后的特征,通过进行稀疏卷积操作,得到稀疏卷积操作后的特征映射;对所述稀疏卷积操作后的特征映射进行降采样,根据降采样后的特征映射,对每个初始3D框进行修正和/或确定每个初始3D框的置信度。
  17. 根据权利要求16所述的装置,其中,所述第二处理模块在对所述稀疏卷积操作后的特征映射进行降采样的情况下,配置为:
    通过对所述稀疏卷积操作后的特征映射进行池化操作,实现对所述稀疏卷积操作后的特征映射降采样的处理。
  18. 根据权利要求11至17任一项所述的装置,其中,所述获取模块,配置为获取3D点云数据,将所述3D点云数据进行3D网格化处理,得到3D网格;在所述3D网格的非空网格中提取出所述3D点云数据对应的点云语义特征。
  19. 根据权利要求11至17任一项所述的装置,其中,所述第一处理模块在基于所述点云语义特征,确定前景点的部位位置信息的情况下,配置为:
    根据所述点云语义特征针对所述点云数据进行前景和背景的分割,以确定出前景点;所述前景点为所述点云数据中的属于前景的点云数据;利用用于预测前景点的部位位置信息的神经网络对确定出的前景点进行处理,得到前景点的部位位置信息;其中,所述神经网络采用包括有3D框的标注信息的训练数据集训练得到,所述3D框的标注信息至少包括所述训练数据集的点云数据的前景点的部位位置信息。
  20. 一种电子设备,其中,包括处理器和配置为存储能够在处理器上运行的计算机程序的存储器;其中,
    所述处理器配置为运行所述计算机程序时,执行权利要求1至10任一项所述的方法。
  21. 一种计算机存储介质,其上存储有计算机程序,其中,该计算机程序被处理器执行时实现权利要求1至10任一项所述的方法。
  22. 一种计算机程序产品,其中,所述计算机程序产品包括计算机可执行指令, 该计算机可执行指令被执行后,能够实现权利要求1至10任一项所述的方法步骤。
PCT/CN2019/121774 2019-06-17 2019-11-28 目标检测方法和装置及智能驾驶方法、设备和存储介质 WO2020253121A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
JP2020567923A JP7033373B2 (ja) 2019-06-17 2019-11-28 ターゲット検出方法及び装置、スマート運転方法、装置並びに記憶媒体
SG11202011959SA SG11202011959SA (en) 2019-06-17 2019-11-28 Method and apparatus for object detection, intelligent driving method and device, and storage medium
KR1020207035715A KR20210008083A (ko) 2019-06-17 2019-11-28 목표 검출 방법 및 장치 및 지능형 주행 방법, 기기 및 저장 매체
US17/106,826 US20210082181A1 (en) 2019-06-17 2020-11-30 Method and apparatus for object detection, intelligent driving method and device, and storage medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910523342.4A CN112101066B (zh) 2019-06-17 2019-06-17 目标检测方法和装置及智能驾驶方法、设备和存储介质
CN201910523342.4 2019-06-17

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/106,826 Continuation US20210082181A1 (en) 2019-06-17 2020-11-30 Method and apparatus for object detection, intelligent driving method and device, and storage medium

Publications (1)

Publication Number Publication Date
WO2020253121A1 true WO2020253121A1 (zh) 2020-12-24

Family

ID=73748556

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/121774 WO2020253121A1 (zh) 2019-06-17 2019-11-28 目标检测方法和装置及智能驾驶方法、设备和存储介质

Country Status (6)

Country Link
US (1) US20210082181A1 (zh)
JP (1) JP7033373B2 (zh)
KR (1) KR20210008083A (zh)
CN (1) CN112101066B (zh)
SG (1) SG11202011959SA (zh)
WO (1) WO2020253121A1 (zh)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658199A (zh) * 2021-09-02 2021-11-16 中国矿业大学 基于回归修正的染色体实例分割网络
JP7224682B1 (ja) 2021-08-17 2023-02-20 忠北大学校産学協力団 自律走行のための3次元多重客体検出装置及び方法
CN115861561A (zh) * 2023-02-24 2023-03-28 航天宏图信息技术股份有限公司 一种基于语义约束的等高线生成方法和装置
CN117475410A (zh) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 基于前景点筛选的三维目标检测方法、系统、设备、介质

Families Citing this family (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2018033137A1 (zh) * 2016-08-19 2018-02-22 北京市商汤科技开发有限公司 在视频图像中展示业务对象的方法、装置和电子设备
US12051206B2 (en) * 2019-07-25 2024-07-30 Nvidia Corporation Deep neural network for segmentation of road scenes and animate object instances for autonomous driving applications
US11885907B2 (en) 2019-11-21 2024-01-30 Nvidia Corporation Deep neural network for detecting obstacle instances using radar sensors in autonomous machine applications
US11531088B2 (en) 2019-11-21 2022-12-20 Nvidia Corporation Deep neural network for detecting obstacle instances using radar sensors in autonomous machine applications
US11532168B2 (en) 2019-11-15 2022-12-20 Nvidia Corporation Multi-view deep neural network for LiDAR perception
US12080078B2 (en) 2019-11-15 2024-09-03 Nvidia Corporation Multi-view deep neural network for LiDAR perception
US12050285B2 (en) 2019-11-21 2024-07-30 Nvidia Corporation Deep neural network for detecting obstacle instances using radar sensors in autonomous machine applications
US11277626B2 (en) 2020-02-21 2022-03-15 Alibaba Group Holding Limited Region of interest quality controllable video coding techniques
US11388423B2 (en) 2020-03-23 2022-07-12 Alibaba Group Holding Limited Region-of-interest based video encoding
TWI738367B (zh) * 2020-06-01 2021-09-01 國立中正大學 以卷積神經網路檢測物件影像之方法
US11443147B2 (en) * 2020-12-11 2022-09-13 Argo AI, LLC Systems and methods for object detection using stereovision information
CN112784691B (zh) * 2020-12-31 2023-06-02 杭州海康威视数字技术股份有限公司 一种目标检测模型训练方法、目标检测方法和装置
CN115035359A (zh) * 2021-02-24 2022-09-09 华为技术有限公司 一种点云数据处理方法、训练数据处理方法及装置
CN112801059B (zh) * 2021-04-07 2021-07-20 广东众聚人工智能科技有限公司 图卷积网络系统和基于图卷积网络系统的3d物体检测方法
CN113298840B (zh) * 2021-05-26 2022-09-16 南京邮电大学 基于带电作业场景下的多模态物体检测方法、系统、装置及存储介质
CN113283349A (zh) * 2021-05-28 2021-08-20 中国公路工程咨询集团有限公司 基于目标锚框优选策略的交通基建施工目标监测系统与方法
CN113469025B (zh) * 2021-06-29 2024-05-31 阿波罗智联(北京)科技有限公司 应用于车路协同的目标检测方法、装置、路侧设备和车辆
US20230035475A1 (en) * 2021-07-16 2023-02-02 Huawei Technologies Co., Ltd. Methods and systems for semantic segmentation of a point cloud
CN113688738B (zh) * 2021-08-25 2024-04-09 北京交通大学 一种基于激光雷达点云数据的目标识别系统及方法
WO2023036228A1 (en) * 2021-09-08 2023-03-16 Huawei Technologies Co., Ltd. System and method for proposal-free and cluster-free panoptic segmentation system of point clouds
US12008788B1 (en) * 2021-10-14 2024-06-11 Amazon Technologies, Inc. Evaluating spatial relationships using vision transformers
CN113642585B (zh) * 2021-10-14 2022-02-11 腾讯科技(深圳)有限公司 图像处理方法、装置、设备、存储介质及计算机程序产品
US12100230B2 (en) * 2021-10-28 2024-09-24 Nvidia Corporation Using neural networks for 3D surface structure estimation based on real-world data for autonomous systems and applications
US12039663B2 (en) 2021-10-28 2024-07-16 Nvidia Corporation 3D surface structure estimation using neural networks for autonomous systems and applications
CN113780257B (zh) * 2021-11-12 2022-02-22 紫东信息科技(苏州)有限公司 多模态融合弱监督车辆目标检测方法及系统
CN115249349B (zh) * 2021-11-18 2023-06-27 上海仙途智能科技有限公司 一种点云去噪方法、电子设备及存储介质
CN114298581A (zh) * 2021-12-30 2022-04-08 广州极飞科技股份有限公司 质量评估模型生成方法、质量评估方法、装置、电子设备和可读存储介质
CN114445593B (zh) * 2022-01-30 2024-05-10 重庆长安汽车股份有限公司 基于多帧语义点云拼接的鸟瞰图语义分割标签生成方法
CN114509785A (zh) * 2022-02-16 2022-05-17 中国第一汽车股份有限公司 三维物体检测方法、装置、存储介质、处理器及系统
CN114882046B (zh) * 2022-03-29 2024-08-02 驭势科技(北京)有限公司 三维点云数据的全景分割方法、装置、设备及介质
KR102708275B1 (ko) * 2022-12-07 2024-09-24 주식회사 에스더블유엠 딥러닝을 위한 폴리곤 매시 기반 3차원 객체 모델 및 주석데이터 생성장치 및 그 방법
CN115588187B (zh) * 2022-12-13 2023-04-11 华南师范大学 基于三维点云的行人检测方法、装置、设备以及存储介质
CN115937644B (zh) * 2022-12-15 2024-01-02 清华大学 一种基于全局及局部融合的点云特征提取方法及装置

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9183459B1 (en) * 2014-05-06 2015-11-10 The Boeing Company Sensor fusion using detector confidence boosting
CN108171217A (zh) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 一种基于点融合网络的三维物体检测方法
CN109188457A (zh) * 2018-09-07 2019-01-11 百度在线网络技术(北京)有限公司 物体检测框的生成方法、装置、设备、存储介质及车辆
CN109410307A (zh) * 2018-10-16 2019-03-01 大连理工大学 一种场景点云语义分割方法
CN109635685A (zh) * 2018-11-29 2019-04-16 北京市商汤科技开发有限公司 目标对象3d检测方法、装置、介质及设备

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7160257B2 (ja) * 2017-10-19 2022-10-25 日本コントロールシステム株式会社 情報処理装置、情報処理方法、およびプログラム
TWI651686B (zh) * 2017-11-30 2019-02-21 國家中山科學研究院 一種光學雷達行人偵測方法
JP7290240B2 (ja) 2018-04-27 2023-06-13 成典 田中 対象物認識装置
CN109655019B (zh) * 2018-10-29 2021-02-02 北方工业大学 一种基于深度学习和三维重建的货物体积测量方法
CN109597087B (zh) * 2018-11-15 2022-07-01 天津大学 一种基于点云数据的3d目标检测方法

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9183459B1 (en) * 2014-05-06 2015-11-10 The Boeing Company Sensor fusion using detector confidence boosting
CN108171217A (zh) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 一种基于点融合网络的三维物体检测方法
CN109188457A (zh) * 2018-09-07 2019-01-11 百度在线网络技术(北京)有限公司 物体检测框的生成方法、装置、设备、存储介质及车辆
CN109410307A (zh) * 2018-10-16 2019-03-01 大连理工大学 一种场景点云语义分割方法
CN109635685A (zh) * 2018-11-29 2019-04-16 北京市商汤科技开发有限公司 目标对象3d检测方法、装置、介质及设备

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7224682B1 (ja) 2021-08-17 2023-02-20 忠北大学校産学協力団 自律走行のための3次元多重客体検出装置及び方法
KR20230026130A (ko) * 2021-08-17 2023-02-24 충북대학교 산학협력단 자율 주행을 위한 단일 계층 3차원 다중 객체 검출 장치 및 방법
JP2023027736A (ja) * 2021-08-17 2023-03-02 忠北大学校産学協力団 自律走行のための3次元多重客体検出装置及び方法
KR102681992B1 (ko) * 2021-08-17 2024-07-04 충북대학교 산학협력단 자율 주행을 위한 단일 계층 3차원 다중 객체 검출 장치 및 방법
CN113658199A (zh) * 2021-09-02 2021-11-16 中国矿业大学 基于回归修正的染色体实例分割网络
CN113658199B (zh) * 2021-09-02 2023-11-03 中国矿业大学 基于回归修正的染色体实例分割网络
CN115861561A (zh) * 2023-02-24 2023-03-28 航天宏图信息技术股份有限公司 一种基于语义约束的等高线生成方法和装置
CN117475410A (zh) * 2023-12-27 2024-01-30 山东海润数聚科技有限公司 基于前景点筛选的三维目标检测方法、系统、设备、介质
CN117475410B (zh) * 2023-12-27 2024-03-15 山东海润数聚科技有限公司 基于前景点筛选的三维目标检测方法、系统、设备、介质

Also Published As

Publication number Publication date
JP7033373B2 (ja) 2022-03-10
JP2021532442A (ja) 2021-11-25
KR20210008083A (ko) 2021-01-20
SG11202011959SA (en) 2021-01-28
CN112101066B (zh) 2024-03-08
CN112101066A (zh) 2020-12-18
US20210082181A1 (en) 2021-03-18

Similar Documents

Publication Publication Date Title
WO2020253121A1 (zh) 目标检测方法和装置及智能驾驶方法、设备和存储介质
JP7430277B2 (ja) 障害物検出方法及び装置、コンピュータデバイス、並びにコンピュータプログラム
CN111626217B (zh) 一种基于二维图片和三维点云融合的目标检测和追踪方法
CN111666921B (zh) 车辆控制方法、装置、计算机设备和计算机可读存储介质
WO2020108311A1 (zh) 目标对象3d检测方法、装置、介质及设备
US20190147245A1 (en) Three-dimensional object detection for autonomous robotic systems using image proposals
CN114972763B (zh) 激光雷达点云分割方法、装置、设备及存储介质
CN113761999A (zh) 一种目标检测方法、装置、电子设备和存储介质
CN112446227A (zh) 物体检测方法、装置及设备
CN113284163A (zh) 基于车载激光雷达点云的三维目标自适应检测方法及系统
US20220269900A1 (en) Low level sensor fusion based on lightweight semantic segmentation of 3d point clouds
Sun et al. PointMoSeg: Sparse tensor-based end-to-end moving-obstacle segmentation in 3-D lidar point clouds for autonomous driving
CN111898659A (zh) 一种目标检测方法及系统
CN113269147B (zh) 基于空间和形状的三维检测方法、系统、存储及处理装置
Lin et al. CNN-based classification for point cloud object with bearing angle image
JP2023158638A (ja) 自律走行車両のためのライダーポイントクラウド及び周辺カメラを用いたフュージョンベースのオブジェクトトラッカー
CN116246119A (zh) 3d目标检测方法、电子设备及存储介质
CN115147328A (zh) 三维目标检测方法及装置
KR102270827B1 (ko) 360도 주변 물체 검출 및 인식 작업을 위한 다중 센서 데이터 기반의 융합 정보 생성 방법 및 장치
US12079970B2 (en) Methods and systems for semantic scene completion for sparse 3D data
Dimitrievski et al. Semantically aware multilateral filter for depth upsampling in automotive lidar point clouds
CN112699711A (zh) 车道线检测方法、装置、存储介质及电子设备
CN113420648B (zh) 一种具有旋转适应性的目标检测方法及系统
CN116778262B (zh) 一种基于虚拟点云的三维目标检测方法和系统
Berrio et al. Fusing lidar and semantic image information in octree maps

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2020567923

Country of ref document: JP

Kind code of ref document: A

ENP Entry into the national phase

Ref document number: 20207035715

Country of ref document: KR

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19933826

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21.03.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19933826

Country of ref document: EP

Kind code of ref document: A1