WO2023222062A1 - Target detection method for autonomous driving, and apparatus, medium and vehicle - Google Patents

Target detection method for autonomous driving, and apparatus, medium and vehicle

Info

Publication number: WO2023222062A1
Authority: WO (WIPO, PCT)
Prior art keywords: features, point cloud, voxel, network, cloud data
Application number: PCT/CN2023/094927
Other languages: English (en), Chinese (zh)
Inventors: 何欣栋, 任广辉, 秦欢
Original Assignee: 安徽蔚来智驾科技有限公司
Priority date: 2022-05-19 (the priority date is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed)
Filing date: 2023-05-18
Priority claimed from CN202210556974.2A (external priority, published as CN114943950A)
Priority claimed from CN202210558014.XA (external priority, published as CN114943951A)
Application filed by 安徽蔚来智驾科技有限公司
Publication of WO2023222062A1

Links

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00: Arrangements for image or video recognition or understanding
    • G06V10/40: Extraction of image or video features
    • G06V10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/50: Context or environment of the image
    • G06V20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58: Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads

Definitions

  • the present application claims priority to Chinese patent application No. 202210558014.X, filed on 2022-05-19 and entitled "Target Detection Method, Device, Medium and Vehicle for Autonomous Driving", the entire content of which is incorporated herein by reference.
  • the present invention relates to the technical field of automatic driving, and specifically provides a target detection method, device, medium and vehicle for automatic driving.
  • this field needs a new target detection solution for automatic driving to solve the above problems.
  • the present invention is proposed to solve or at least partially solve the problem of how to increase the computing speed of target detection to reduce the computing pressure of the vehicle-side chip while ensuring the automatic driving function.
  • the present invention provides a target detection method for automatic driving, including:
  • acquiring original point cloud data; voxelizing the original point cloud data to obtain voxelized point cloud features, wherein the voxelized point cloud features are M*C-dimensional point cloud features, M is the number of voxels, and C is the dimension of the feature vector mean of the point cloud data contained in each voxel;
  • the voxelized point cloud features are input into the convolutional network for feature extraction, and the extracted features are input into the target detection head network to obtain the target detection results.
  • voxelizing the original point cloud data to obtain voxelized point cloud features includes:
  • setting the voxel size and obtaining the voxel coordinates (x, y, z) of the original point cloud data;
  • deduplicating the original point cloud data in each voxel according to the voxel coordinates to obtain deduplicated point cloud data;
  • sampling the deduplicated point cloud data in each voxel according to the voxel coordinates to obtain sampled point cloud data;
  • calculating the feature vector mean of the sampled point cloud data in each voxel to obtain the voxelized point cloud features.
  • Deduplicating the original point cloud data in each voxel according to the voxel coordinates to obtain the deduplicated point cloud data includes:
  • for each voxel, performing a weighted calculation on the voxel coordinates (x, y, z) of each original point cloud data in the voxel to obtain the key code value of each original point cloud data;
  • establishing a one-dimensional hash table based on the correspondence between the key code values and the original point cloud data;
  • according to the one-dimensional hash table, averaging the original point cloud data with the same key code value in each voxel, so as to deduplicate the original point cloud data in each voxel and obtain the deduplicated point cloud data.
  • Sampling the deduplicated point cloud data in each voxel according to the voxel coordinates to obtain the sampled point cloud data includes:
  • the deduplicated point cloud data corresponding to the first N key values in each voxel is obtained as the sampled point cloud data.
  • Inputting the voxelized point cloud features into a convolutional network for feature extraction, and inputting the extracted features into a target detection head network to obtain target detection results includes:
  • inputting the voxelized point cloud features into a sparse convolutional network for downsampling to obtain sparse voxel features includes:
  • a sparse convolutional network with a convolution kernel size of 3, a stride of 3, and a padding of 0 is used for downsampling.
  • the top view convolution network has an asymmetric network structure, and the activation function used is an h-swish activation function.
  • voxelizing the original point cloud data to obtain voxelized point cloud features includes: voxelizing the original point cloud data to obtain three-dimensional voxel features; and inputting the three-dimensional voxel features into the sparse convolution network for feature extraction to obtain a first sparse voxel feature downsampled by M times and a second sparse voxel feature downsampled by N times, where M and N are natural numbers and M > N;
  • the step of inputting the voxelized point cloud features into a convolutional network for feature extraction, and inputting the extracted features into a target detection head network to obtain a target detection result, includes: inputting the first sparse voxel features into the top view convolution network for feature extraction to obtain features on the top view; inputting the features on the top view into the vehicle detection head to obtain vehicle information; inputting the second sparse voxel features into the compaction module for compaction processing to obtain compact features; inputting the features on the top view into the upsampling module for upsampling to obtain upsampled features; fusing the compact features and the upsampled features to obtain fused features; and inputting the fused features into the vulnerable road user detection head to obtain vulnerable road user information.
  • inputting the second sparse voxel features into the compaction module for compaction processing to obtain compact features includes:
  • taking the maximum of the second sparse voxel features in the depth dimension.
  • maximizing the second sparse voxel feature in the depth dimension includes:
  • the second sparse voxel features in the (H, W, D, C) dimension are stacked in the D dimension to obtain the compact features in the (H, W, C) dimension.
  • H and W respectively represent the grid length and width of the top view convolutional network
  • D represents the depth of the second voxel feature
  • C represents the number of channels of the second voxel feature.
  • inputting the three-dimensional voxel features into a sparse convolution network for feature extraction to obtain a first sparse voxel feature downsampled by M times and a second sparse voxel feature downsampled by N times includes:
  • the three-dimensional voxel features are input into a sparse convolution network for feature extraction to obtain an 8-fold down-sampled first sparse voxel feature and a 4-fold down-sampled second sparse voxel feature.
  • the features on the top view are input into an upsampling module for upsampling to obtain upsampled features, including:
  • the features on the top view are input into the upsampling module for 2 times upsampling to obtain the upsampling features.
  • the sparse convolution network, the top view convolution network and the vehicle detection head constitute a subject detection network
  • the compaction module, upsampling module and vulnerable road user detection head constitute a branch detection network
  • the method also includes: using a training set to conduct overall training on the main body detection network and the branch detection network to obtain the network parameters of the main body detection network and the network parameters of the branch detection network respectively, wherein the training set includes training data containing vehicles and data containing vulnerable road users;
  • the branch detection network is individually trained using data containing vulnerable road users in the training set.
  • the branch detection network is separately trained using the data containing vulnerable road users in the training set, including:
  • the branch detection network is trained using data containing vulnerable road users.
  • in a second aspect, a control device is provided, which includes a processor and a storage device.
  • the storage device is adapted to store a plurality of program codes.
  • the program codes are adapted to be loaded and run by the processor to execute the target detection method for automatic driving described in any one of the above technical solutions.
  • a computer-readable storage medium is provided, which stores a plurality of program codes, and the program codes are adapted to be loaded and run by a processor to perform the above-mentioned target detection method for autonomous driving.
  • a vehicle including:
  • a vehicle-mounted lidar, which is used to obtain raw point cloud data, and the control device described above.
  • the present invention can obtain the voxelized point cloud features of each voxel from the original point cloud data in the voxelization stage, input the voxelized point cloud features into the convolutional network for feature extraction, and further input the extracted features into the target detection head network to obtain the target detection results.
  • the present invention does not need to use an additional model to extract features and average the voxelized point cloud data.
  • the characteristics of the point cloud data within each voxel are integrated in the voxelization stage, so that, while ensuring the automatic driving function, the initial extraction process of the original point cloud data is simplified, the calculation speed of target detection is improved, and the calculation pressure on the vehicle-end chip is reduced.
  • the three-dimensional voxel features obtained from the original point cloud data can be input into the sparse convolution network, so that the sparse convolution network is applied for feature extraction to respectively obtain the first sparse voxel features and the second sparse voxel features.
  • the first sparse voxel feature is input to the top view convolution network for feature extraction to obtain the features on the top view.
  • the vehicle detection head can obtain vehicle information based on the features on the top view, the second sparse voxel features are compacted to obtain compact features, and the features on the top view are upsampled to obtain upsampled features.
  • the vulnerable user detection head can obtain vulnerable road user information based on the fused features.
  • the present invention can set detection heads for vehicles and vulnerable road users respectively according to their respective characteristics, and detect vehicle information and vulnerable road user information separately; while ensuring the vehicle detection effect, it achieves better detection of vulnerable road user information and improves the detection effect for vulnerable road users.
  • Figure 1 is a schematic flowchart of the main steps of a target detection method for autonomous driving according to an embodiment of the present invention
  • Figure 2 is a schematic diagram comparing the main process architecture of an automatic driving target detection method according to an embodiment of the present invention and an existing automatic driving target detection method;
  • Figure 3 is a schematic flowchart of the main steps to improve the operation speed of the target detection method according to an embodiment of the present invention
  • Figure 4 is a schematic flowchart of the main steps of obtaining voxelized point cloud features according to an embodiment of the present invention
  • Figure 5 is a schematic flowchart of the main steps of a target detection method for autonomous driving according to an embodiment of the present invention
  • Figure 6 is a block diagram of the main network structure of the main body detection network and the branch detection network according to an implementation of the embodiment of the present invention
  • Figure 7 is a schematic flowchart of the main steps of training the main body detection network and the branch detection network according to an embodiment of the present invention.
  • the terms "module" and "processor" may include hardware, software, or a combination of both.
  • a module can include hardware circuits, various suitable sensors, communication ports, and memory. It can also include software parts, such as program code, or it can be a combination of software and hardware.
  • the processor may be a central processing unit, a microprocessor, an image processor, a digital signal processor, or any other suitable processor.
  • the processor has data and/or signal processing functions.
  • the processor can be implemented in software, hardware, or a combination of both.
  • Non-transitory computer-readable storage media include any suitable media that can store program code, such as magnetic disks, hard disks, optical disks, flash memory, read-only memory, random access memory, etc.
  • a and/or B means all possible combinations of A and B, such as just A, just B, or A and B.
  • the terms "at least one A or B” or “at least one of A and B” have a similar meaning to “A and/or B” and may include just A, just B or A and B.
  • the singular forms "a", "an" and "the" may also include the plural form.
  • FIG. 1 is a schematic flowchart of the main steps of target detection for autonomous driving according to an embodiment of the present invention.
  • the automatic driving target detection method in the embodiment of the present invention mainly includes the following steps S101 to S103.
  • Step S101 Obtain original point cloud data.
  • the original point cloud data collected by the vehicle can be obtained.
  • the original point cloud data may be point cloud data collected by a vehicle-mounted lidar.
  • Step S102 Voxelize the original point cloud data to obtain voxelized point cloud features, where the voxelized point cloud features are M*C-dimensional point cloud features, M is the number of voxels, and C is the dimension of the feature vector mean of the point cloud data contained in each voxel. Voxelization converts unordered raw point cloud data into regular data on which convolution operations can easily be performed.
  • FIG. 2 is a schematic diagram comparing the main process architecture of an automatic driving target detection method according to an embodiment of the present invention and an automatic driving target detection method in the prior art.
  • the original point cloud data is A*C-dimensional point cloud data
  • A is the number of original point cloud data
  • C is the dimension of the feature vector of the original point cloud data.
  • the A*C-dimensional original point cloud data can be converted into M*C-dimensional voxelized point cloud features (M < A); that is, the feature vectors of the point cloud data contained in each voxel are averaged to obtain the feature vector mean, which is used as the voxelized point cloud feature of that voxel.
  • the method in the prior art is to first voxelize the A*C-dimensional original point cloud data to obtain M*T*C-dimensional features (M represents the number of voxels, and T represents the number of point clouds in each voxel), and then extract features from them with a VFE (Voxel Feature Encoding) network.
  • the method in the embodiment of the present invention does not need to train and use a VFE network to extract features from the original point cloud data; it only needs to take the feature mean of the point cloud data within each voxel to obtain the voxelized point cloud features, which simplifies the processing of the original point cloud data and improves the computing speed of the target detection process.
  • Step S103 Input voxelized point cloud features into the convolutional network for feature extraction, and input the extracted features into the target detection head network to obtain target detection results.
  • the voxelized point cloud features obtained in step S102 can be input into the convolutional network to perform feature extraction on the voxelized point cloud features, and the extracted features can be input into the target detection head network. , to achieve target detection for autonomous driving.
  • the target detection head network can be a region generation network RPN (Region Proposal Network).
  • the embodiment of the present invention can obtain the voxelized point cloud features of each voxel from the original point cloud data in the voxelization stage, input the voxelized point cloud features into the convolutional network for feature extraction, and further input the extracted features into the target detection head network to obtain the target detection results.
  • the embodiment of the present invention does not need to use additional models to extract features and average the voxelized point cloud data.
  • the characteristics of the point cloud data within each voxel are integrated, so that, on the premise of ensuring the autonomous driving function, the initial extraction process of the original point cloud data is simplified, the calculation speed of target detection is improved, and the calculation pressure on the vehicle-end chip is reduced.
  • Step S102 and step S103 will be further described below.
  • FIG. 4 is a schematic flowchart of the main steps of obtaining voxelized point cloud features according to an implementation of the embodiment of the present invention.
  • step S102 may include the following steps S1021 to S1024:
  • Step S1021 Set the voxel size and obtain the voxel coordinates (x, y, z) of the original point cloud data;
  • the three-dimensional space where the original point cloud data is located can be divided according to a preset size, and the three-dimensional space is divided into a plurality of uniformly sized voxels.
  • the voxel coordinates (x, y, z) of the original point cloud data can be obtained according to the relationship between the original point cloud data coordinate system and the voxel coordinate system.
  • a Voxelizer (voxel generator) can be used to set the voxel size.
  • Step S1022 Deduplicate the original point cloud data in each voxel according to the voxel coordinates, and obtain the deduplicated point cloud data.
  • step S1022 may further include the following steps S10221 to S10223:
  • Step S10221 For each voxel, perform a weighted calculation on the voxel coordinates (x, y, z) of each original point cloud data in the voxel, and obtain the key value of each original point cloud data.
  • Step S10222 Establish a one-dimensional hash table based on the corresponding relationship between the key value and the original point cloud data.
  • the hash table (Hash Table) is a data structure that is directly accessed based on the key value (Key Value).
  • Step S10223 According to the one-dimensional hash table, average the original point cloud data with the same key code value in each voxel to deduplicate the original point cloud data in each voxel and obtain the deduplicated point cloud data.
  • the voxel coordinates of each original point cloud data in the voxel can be weighted to obtain the key value of each original point cloud data.
  • a one-dimensional hash table is established based on the correspondence between the key value and the original point cloud data. That is, the correspondence between the key value and the original point cloud data can be queried through the one-dimensional hash table.
  • the original point cloud data is deduplicated according to the one-dimensional hash table. Specifically, the original point cloud data with the same key value in each voxel is averaged, and the obtained average is used as the point cloud data corresponding to the key value to obtain the point cloud data after deduplication.
  • Those skilled in the art can set the weight of the voxel coordinates (x, y, z) according to actual application needs.
  • Step S1023 Sample the deduplicated point cloud data in each voxel according to the voxel coordinates to obtain the sampled point cloud data.
  • step S1023 may further include the following steps:
  • the deduplicated point cloud data corresponding to the first N key code values in each voxel is obtained as the sampled point cloud data.
  • the deduplicated point cloud data corresponding to the first N key values in each voxel can be used as the sampled point cloud data according to the one-dimensional hash table established in step S1022.
  • N is a positive integer greater than 1.
  • Those skilled in the art can set the value of N according to the needs of practical applications.
  • Step S1024 Calculate the feature vector mean of the sampled point cloud data in each voxel to obtain voxelized point cloud features.
  • the feature vector mean can be calculated for the sampled point cloud data in each voxel, and this feature vector mean can be used as the voxelized point cloud feature in the voxel.
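As an illustration of steps S1021 to S1024, the following is a minimal NumPy sketch of the voxelization flow, not the patented implementation: the voxel size, the sub-voxel quantization used to form the key values, the coordinate weights, and the per-voxel cap N are all illustrative assumptions.

```python
import numpy as np
from collections import defaultdict

def voxelize(points, voxel_size=(0.1, 0.1, 0.2), n_max=5):
    """points: (A, C) array whose first three columns are x, y, z.
    Returns an (M, C) array: one feature-vector mean per non-empty voxel."""
    pts = np.asarray(points, dtype=np.float64)
    vsize = np.asarray(voxel_size)

    # Step S1021: voxel index of every point for the chosen voxel size.
    vox_idx = np.floor(pts[:, :3] / vsize).astype(np.int64)

    # Step S10221: a weighted combination of (quantized) coordinates gives each point a key value.
    fine = np.floor(pts[:, :3] / (vsize / 8)).astype(np.int64)   # sub-voxel quantization (assumption)
    weights = np.array([1, 1 << 21, 1 << 42], dtype=np.int64)    # assumed key weights
    keys = fine @ weights

    # Steps S10222/S10223: hash table from key to points; points sharing a key are
    # averaged, which deduplicates the data inside each voxel.
    per_voxel = defaultdict(dict)                                # voxel -> {key: [points]}
    for v, k, p in zip(map(tuple, vox_idx), keys, pts):
        per_voxel[v].setdefault(k, []).append(p)

    features = []
    for key_groups in per_voxel.values():
        dedup = [np.mean(g, axis=0) for _, g in sorted(key_groups.items())]
        sampled = dedup[:n_max]                                  # step S1023: first N key values
        features.append(np.mean(sampled, axis=0))                # step S1024: feature-vector mean
    return np.stack(features)

# Example: 1000 random (x, y, z, intensity) points -> (M, 4) voxelized features.
feats = voxelize(np.random.rand(1000, 4))
print(feats.shape)
```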
  • step S103 may further include the following steps S1031 to step S1033:
  • Step S1031 Input the voxelized point cloud features into the sparse convolution network for downsampling to obtain sparse voxel features.
  • the sparse convolution network (Sparse Conv Network) is a 3D sparse convolution network.
  • the advantage of the 3D sparse convolution network is that it not only considers the 3D spatial information of the three-dimensional voxel features, but also takes advantage of the sparsity of the three-dimensional voxel features, so that The process of feature extraction is less computationally intensive.
  • the sparse convolutional network contains 6 convolutional layers, which can greatly speed up the running speed of the sparse convolutional network.
  • step S1031 may further include the following steps:
  • Downsampling is performed in the height direction of voxelized point cloud features using a sparse convolutional network with a convolution kernel size of 3, a stride of 3, and a padding of 0.
  • the convolution kernel size (kernel size) of the sparse convolutional network is set to 3.
  • the stride is set to 3 and the padding is set to 0, which can achieve faster downsampling in the height direction and is also beneficial to the extraction of sparse point cloud features.
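For reference, the downsampling arithmetic of a kernel-3, stride-3, zero-padding convolution along the height axis can be reproduced with a dense torch.nn.Conv3d stand-in; the patent itself uses a 3D sparse convolution network, and restricting the kernel and stride to the height dimension only is an assumption about how the height-direction downsampling is realized.

```python
import torch
import torch.nn as nn

# Assumed layout: (batch, channels, height bins, H, W) voxel grid.
x = torch.randn(1, 16, 30, 100, 100)            # 30 height bins

down = nn.Conv3d(in_channels=16, out_channels=16,
                 kernel_size=(3, 1, 1), stride=(3, 1, 1), padding=0)

y = down(x)
print(y.shape)   # torch.Size([1, 16, 10, 100, 100]): height reduced 3x, H and W unchanged
```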
  • Step S1032 Input sparse voxel features into the top view convolution network for feature extraction to obtain features on the top view.
  • sparse voxel features can be input into a bird's eye view (BEV) convolutional network for feature extraction to obtain features on the bird's eye view.
  • inputting the sparse voxel features into the top view convolutional network for feature extraction can increase the receptive field of the convolutional network.
  • the top view convolutional network has an asymmetric network structure, and the activation function used is the h-swish (hard swish) activation function.
  • the top view convolution network has an asymmetric network structure, that is, the top view convolutional network is an asymmetric convolutional network.
  • the application of asymmetric convolutional networks can effectively reduce the computational complexity of the convolutional network, thereby improving the computational efficiency of the top view convolutional network.
  • the activation function used by the top view convolution network is the h-swish activation function. Applying the h-swish activation function can achieve better training results in deep networks, and the amount of calculation is small, which is more suitable for lightweight networks such as top view convolutional neural networks.
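A small sketch of what an asymmetric top-view convolution block with the h-swish activation might look like is given below. Factoring a 3x3 convolution into 1x3 and 3x1 kernels is one common way to build an asymmetric convolution; the actual layer layout of the patent's top view network is not specified here, so the block is illustrative only.

```python
import torch
import torch.nn as nn

class AsymmetricBEVBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv_1x3 = nn.Conv2d(channels, channels, kernel_size=(1, 3), padding=(0, 1))
        self.conv_3x1 = nn.Conv2d(channels, channels, kernel_size=(3, 1), padding=(1, 0))
        self.bn = nn.BatchNorm2d(channels)
        self.act = nn.Hardswish()        # h-swish: x * ReLU6(x + 3) / 6

    def forward(self, x):                # x: (N, C, H, W) bird's-eye-view feature map
        return self.act(self.bn(self.conv_3x1(self.conv_1x3(x))))

block = AsymmetricBEVBlock(128)
out = block(torch.randn(1, 128, 100, 100))   # spatial shape preserved: (1, 128, 100, 100)
```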
  • Step S1033 Input the features on the top view into the target detection head network to obtain the target detection results.
  • the features on the top view can be input into the target detection head network, and the target detection results can be obtained by performing target detection.
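As a rough illustration of feeding the top-view features into a detection head, the sketch below uses two 1x1 convolutions producing per-cell class scores and box regressions; the patent only states that an RPN-style head can be used, so this particular heatmap/box layout (and the 7-value box encoding) is an assumption.

```python
import torch
import torch.nn as nn

class BEVDetectionHead(nn.Module):
    def __init__(self, in_channels: int, num_classes: int, box_dim: int = 7):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, kernel_size=1)   # per-cell class scores
        self.box = nn.Conv2d(in_channels, box_dim, kernel_size=1)       # e.g. (x, y, z, l, w, h, yaw)

    def forward(self, bev_feat):            # bev_feat: (N, C, H, W) features on the top view
        return self.cls(bev_feat), self.box(bev_feat)

head = BEVDetectionHead(in_channels=128, num_classes=3)
scores, boxes = head(torch.randn(1, 128, 100, 100))
print(scores.shape, boxes.shape)            # (1, 3, 100, 100) (1, 7, 100, 100)
```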
  • FIG. 3 is a schematic flowchart of the main steps for improving the operation speed of a target detection method according to an embodiment of the present invention.
  • the computing speed of the target detection method can be improved through point cloud data process acceleration, network structure (convolutional network) acceleration and CUDA (Compute Unified Device Architecture) operator acceleration.
  • the point cloud data process acceleration can be achieved through the above steps S1021 to step S1024;
  • the network structure acceleration is achieved by network trimming of the sparse convolution network (reducing the number of convolutional layers, etc.);
  • the CUDA operator acceleration is achieved by optimizing the operation process of the Voxelizer module.
  • FIG. 5 is a schematic flowchart of the main steps of a target detection method for autonomous driving according to an embodiment of the present invention.
  • step S102 may further include the following steps S1025 and S1026, and step S103 may further include the following steps S1034 to S1038:
  • Step S1025 Voxelize the original point cloud data to obtain three-dimensional voxel features.
  • the original point cloud data can be voxelized to obtain three-dimensional voxel features.
  • Voxelization converts unordered raw point cloud data into regular data that is easy to perform convolution operations.
  • the original point cloud data can be obtained through a vehicle-mounted lidar installed on the vehicle.
  • the original point cloud data can be voxelized to obtain three-dimensional voxel features through the following steps: using a hash-table-based method, first allocate a tensor of a certain size and initialize it to 0, and set the input voxel size; then traverse the original point cloud data, calculate which voxel each original point cloud datum belongs to, and record the coordinates of the voxel to which it belongs and the number of original point cloud data in each voxel; finally, obtain all the voxels, the coordinates corresponding to the voxels and the maximum number of original point cloud data contained in each voxel, and use the average coordinates and channel values of the point cloud in each voxel as the three-dimensional voxel features.
  • Step S1026 Input the three-dimensional voxel features into the sparse convolution network for feature extraction to obtain the first sparse voxel feature with M times downsampling and the second sparse voxel feature with N times downsampling, where M and N are natural numbers and M > N.
  • three-dimensional voxel features can be input into a sparse convolution network (Sparse Conv Network) for high-dimensional sparse feature extraction to obtain high-dimensional sparse voxel features of different scales.
  • the first sparse voxel feature may be obtained through M times downsampling
  • the second sparse voxel feature may be obtained through N times downsampling.
  • the sparse convolution network is a 3D sparse convolution network.
  • the advantage of the 3D sparse convolution network is that it not only considers the 3D spatial information of the three-dimensional voxel features, but also takes advantage of the sparsity of the three-dimensional voxel features, so that the process of feature extraction can be calculated. The amount is smaller.
  • Those skilled in the art can set the values of M and N according to the needs of practical applications.
  • two convolution modules can be respectively set up in the sparse convolution network.
  • One of the convolution modules implements M-fold downsampling of the three-dimensional voxel features to obtain the first sparse voxel features; the other convolution module implements N-fold downsampling of the three-dimensional voxel features to obtain the second sparse voxel features.
  • the second sparse voxel feature is a feature with higher resolution than the first sparse voxel feature.
  • Step S1034 Input the first sparse voxel feature into the top view convolution network to perform feature extraction to obtain features on the top view.
  • the first sparse voxel feature can be input into a bird’s eye view (BEV, bird’s eye view) convolutional network, and feature extraction is performed to obtain features on the bird’s eye view.
  • convolutional neural networks are often used to extract target features.
  • low-level networks generally have smaller receptive fields, and the resolution of extracting target features is high, which is suitable for detecting small target objects;
  • as the network gets deeper, the receptive field gradually increases, but the resolution of the extracted target features becomes lower, making deeper layers more suitable for detecting large target objects. Therefore, in this embodiment, the first sparse voxel features with the higher downsampling multiple are sent to the top view convolution network to obtain features on the top view, so as to achieve target detection of large targets such as vehicles.
  • Step S1035 Input the features on the top view into the vehicle detection head to obtain vehicle information.
  • features on the top view can be input into the vehicle detection head to obtain vehicle information.
  • the vehicle detection head can perform feature recognition on features on the top view through a Region Proposal Network (RPN) to obtain vehicle information.
  • Step S1036 Input the second sparse voxel features into the compaction module for compaction processing to obtain compact features.
  • the obtained compact feature is a dense feature relative to the second sparse voxel feature.
  • the dimensions of the second sparse voxel features can be represented by (H, W, D, C), where H represents the grid length of the top view convolution network, W represents the grid width of the top view convolution network, D represents the depth of the second sparse voxel features, and C is the number of channels of the second sparse voxel features.
  • the second sparse voxel feature in the (H, W, D, C) dimension can be maximized in the D dimension to obtain compact features.
  • ReduceMax refers to the function that finds the maximum value in the specified dimension.
  • the second sparse voxel feature in the (H, W, D, C) dimension can be used to obtain the maximum value using the Maxpooling operation in the D dimension to obtain the compact feature in the (H, W, C) dimension.
  • Maxpooling refers to the operation of dividing the features into several small blocks of the same size (pooling size), taking only the maximum value in the specified dimension from each small block, and discarding other values.
  • the second sparse voxel features in the (H, W, D, C) dimension can also be stacked in the D dimension to obtain the compact features in the (H, W, C) dimension; that is, the second sparse voxel features can be stacked at anchor points with a fixed size in the D dimension to obtain the compact features.
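The ReduceMax/Maxpooling variant of the compaction step amounts to a single maximum over the depth dimension; a dense-tensor sketch (the real feature is sparse) is shown below.

```python
import torch

second_voxel_feat = torch.randn(100, 100, 10, 64)   # (H, W, D, C), dense stand-in for the sparse feature
compact = second_voxel_feat.amax(dim=2)             # maximum over D -> compact feature of shape (H, W, C)
print(compact.shape)                                 # torch.Size([100, 100, 64])
```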
  • Step S1037 Input the features on the top view into the upsampling module for upsampling to obtain upsampled features; fuse the compact features and the upsampled features to obtain fused features.
  • the features on the top view can be upsampled to obtain the upsampled features, and the upsampled features and the compact features can be feature fused to obtain the fused features. Since the fused features are obtained based on upsampled features and compact features, the fused features are high-resolution and contain multi-level features.
  • the features on the top view can be input into the upsampling module for 2 times upsampling to obtain the upsampled features.
  • the additive weighted average method can be applied to achieve feature fusion between the upsampled features and the compact features to obtain the fused features.
  • the multiplicative weighted average method can also be applied to implement feature fusion between upsampled features and compact features to obtain fused features.
  • feature fusion between upsampled features and compact features can also be achieved through a concat operation to obtain fused features.
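A short sketch of step S1037 follows, under the assumption of equal 0.5/0.5 weights for the additive weighted average (the weights are not specified above); the concat alternative is shown as well.

```python
import torch
import torch.nn as nn

bev_feat = torch.randn(1, 64, 50, 50)       # features on the top view (lower resolution)
compact = torch.randn(1, 64, 100, 100)      # compact features from the compaction module

up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)(bev_feat)  # 2x upsampling

fused_add = 0.5 * up + 0.5 * compact                 # additive weighted average fusion (assumed weights)
fused_cat = torch.cat([up, compact], dim=1)          # concat alternative -> (1, 128, 100, 100)
print(fused_add.shape, fused_cat.shape)
```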
  • Step S1038 Input the fusion features into the vulnerable road user detection head to obtain vulnerable road user information.
  • vulnerable road users may include pedestrians, bicycles, and small distant objects in an autonomous driving environment.
  • the three-dimensional voxel features obtained from the original point cloud data can be input into the sparse convolution network, so that the sparse convolution network is applied for feature extraction to respectively obtain the first sparse voxel features and the second sparse voxel features; the first sparse voxel features are input to the top view convolution network for feature extraction to obtain the features on the top view.
  • the vehicle detection head can obtain vehicle information based on the features on the top view; the second sparse voxel features are compacted to obtain compact features, and the features on the top view are upsampled to obtain upsampled features.
  • the compact features and upsampled features are feature fused to obtain fused features.
  • the vulnerable road user detection head can obtain vulnerable road user information based on the fused features. In this way, detection heads can be set for vehicles and vulnerable road users respectively according to their respective characteristics, and vehicle information and vulnerable road user information can be detected separately; while ensuring the vehicle detection effect, better detection of vulnerable road user information is achieved, improving the detection effect for vulnerable road users.
  • Figure 6 is a block diagram of the main network structure of the main body detection network and the branch detection network according to an implementation of the embodiment of the present invention
  • Figure 7 is a schematic flowchart of the main steps of training the main body detection network and the branch detection network according to an embodiment of the present invention.
  • the sparse convolution network, the top view convolution network and the vehicle detection head can constitute the main body detection network; the compaction module, the upsampling module and the VRU (Vulnerable Road User) detection head constitute the branch detection network.
  • the input of the sparse convolutional network is the grid feature (three-dimensional voxel feature) obtained after rasterizing (voxelizing) the original three-dimensional point cloud.
  • the compact features obtained by the compaction module and the upsampled features obtained by the upsampling module achieve feature fusion through the additive weighted average method.
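To make the data flow of Figure 6 concrete, here is a coarse, hypothetical wiring of the two networks in which every sub-module is a dense placeholder (plain 2D/3D convolutions standing in for the sparse convolution network, the top view network and the detection heads); only the connections between the main body detection network and the branch detection network follow the description above.

```python
import torch
import torch.nn as nn

class TwoHeadDetector(nn.Module):
    def __init__(self, c_in=4, c_feat=64, n_cls_vehicle=1, n_cls_vru=2):
        super().__init__()
        # Main body detection network: backbone (sparse conv stand-ins), top view network, vehicle head.
        self.coarse = nn.Conv3d(c_in, c_feat, 3, stride=2, padding=1)   # stand-in for the more downsampled branch
        self.fine = nn.Conv3d(c_in, c_feat, 3, stride=1, padding=1)     # stand-in for the higher-resolution branch
        self.bev = nn.Conv2d(c_feat, c_feat, 3, padding=1)              # top view convolution network (placeholder)
        self.vehicle_head = nn.Conv2d(c_feat, n_cls_vehicle, 1)
        # Branch detection network: compaction (amax in forward), upsampling module, VRU head.
        self.up = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.vru_head = nn.Conv2d(c_feat, n_cls_vru, 1)

    def forward(self, voxels):                       # voxels: (N, C, D, H, W) dense voxel grid
        coarse = self.coarse(voxels)                 # "first sparse voxel features" (more downsampled)
        fine = self.fine(voxels)                     # "second sparse voxel features" (higher resolution)
        bev_feat = self.bev(coarse.amax(dim=2))      # collapse depth, run the top view network
        vehicles = self.vehicle_head(bev_feat)       # vehicle information
        compact = fine.amax(dim=2)                   # compaction module: maximum over the depth dimension
        fused = 0.5 * self.up(bev_feat) + 0.5 * compact   # additive weighted average fusion (assumed weights)
        vrus = self.vru_head(fused)                  # vulnerable road user information
        return vehicles, vrus

model = TwoHeadDetector()
veh, vru = model(torch.randn(1, 4, 16, 128, 128))
print(veh.shape, vru.shape)                          # (1, 1, 64, 64) (1, 2, 128, 128)
```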
  • the present invention may also include the following steps S108 and S109:
  • Step S108 Use the training set to conduct overall training on the main body detection network and the branch detection network, and obtain the network parameters of the main body detection network and the network parameters of the branch detection network respectively, where the training set includes training data containing vehicles and data containing vulnerable road users.
  • the main body detection network and the branch detection network can be trained as a whole using a training set that contains both vehicle data and vulnerable road user data, so as to obtain the network parameters of the main body detection network and the network parameters of the branch detection network.
  • Step S109 Use the data including vulnerable road users in the training set to separately train the branch detection network.
  • the data containing vulnerable road users in the training set can be extracted, and the branch detection network can be trained separately. That is, using a two-stage training method, the main body detection network and the branch detection network are first trained as a whole, and then, in the finetune stage, the branch detection network is trained separately using the data containing vulnerable road users in the training set.
  • step S109 may further include:
  • Step S1091 Initialize the network parameters of the branch detection network obtained in the overall training
  • Step S1092 Keep the network parameters of the subject detection network obtained in the overall training unchanged;
  • Step S1093 Use data containing vulnerable road users to train the branch detection network.
  • the network parameters of the main body detection network can be kept unchanged, the network parameters of the branch detection network can be initialized, and the data containing vulnerable road users can be used to train the branch detection network individually, so as to obtain the trained branch detection network.
  • in this way, the branch detection network can be further effectively trained to detect vulnerable road users, so that the detection network combining the main body detection network and the branch detection network has better detection results for both vehicles and vulnerable road users.
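The two-stage training described in steps S108 to S1093 could be organised roughly as below, reusing the hypothetical TwoHeadDetector sketched earlier; the loss function, learning rates and the one-batch dummy datasets are placeholders, not values taken from the patent.

```python
import torch
import torch.nn as nn

model = TwoHeadDetector()                    # hypothetical model from the previous sketch
criterion = nn.BCEWithLogitsLoss()           # placeholder loss

# Dummy one-batch "datasets" so the sketch runs end to end.
full_training_set = [(torch.randn(1, 4, 16, 128, 128),
                      torch.rand(1, 1, 64, 64), torch.rand(1, 2, 128, 128))]
vru_only_set = [(torch.randn(1, 4, 16, 128, 128), torch.rand(1, 2, 128, 128))]

# Stage 1 (step S108): overall training of the main body and branch detection networks.
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for voxels, veh_target, vru_target in full_training_set:
    veh_pred, vru_pred = model(voxels)
    loss = criterion(veh_pred, veh_target) + criterion(vru_pred, vru_target)
    opt.zero_grad(); loss.backward(); opt.step()

# Stage 2 (steps S1091-S1093): freeze the main-body parameters, re-initialize the branch,
# and fine-tune the branch detection network on data containing vulnerable road users only.
for p in model.parameters():
    p.requires_grad = False                          # keep main-body parameters unchanged
for m in (model.up, model.vru_head):                 # branch modules (the upsampling module has no parameters)
    for p in m.parameters():
        p.requires_grad = True
    if hasattr(m, "reset_parameters"):
        m.reset_parameters()                         # initialize the branch network parameters

opt_branch = torch.optim.Adam(model.vru_head.parameters(), lr=1e-4)
for voxels, vru_target in vru_only_set:
    _, vru_pred = model(voxels)
    loss = criterion(vru_pred, vru_target)
    opt_branch.zero_grad(); loss.backward(); opt_branch.step()
```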
  • all or part of the processes in the methods of the above embodiments of the present invention can also be implemented by instructing relevant hardware through a computer program.
  • the computer program can be stored in a computer-readable storage medium, and when the computer program is executed by the processor, the steps of each of the above method embodiments can be implemented.
  • the computer program includes computer program code, and the computer program code can be in the form of source code, object code, an executable file, some intermediate form, etc.
  • the computer-readable storage medium may include: any entity or device capable of carrying the computer program code, recording media, USB flash drives, mobile hard disks, magnetic disks, optical disks, computer memory, read-only memory, random access memory, electrical carrier signals, telecommunication signals, software distribution media, and the like. It should be noted that the content contained in the computer-readable storage medium can be appropriately added or deleted according to the requirements of legislation and patent practice in the jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable storage media do not include electrical carrier signals and telecommunication signals.
  • the present invention also provides a control device.
  • the control device includes a processor and a storage device.
  • the storage device can be configured to store a program for executing the target detection method for automatic driving in the above method embodiment.
  • the processor can be configured to execute the program in the storage device, where the program includes but is not limited to the program that executes the target detection method for automatic driving in the above method embodiment.
  • the control device may be a control device formed by various electronic devices.
  • the present invention also provides a computer-readable storage medium.
  • the computer-readable storage medium may be configured to store a program for executing the target detection method for autonomous driving in the above method embodiment, and the program may be loaded and run by a processor to implement the above target detection method for autonomous driving.
  • the computer-readable storage medium may be a storage device formed by various electronic devices.
  • the computer-readable storage medium is a non-transitory computer-readable storage medium.
  • the present invention also provides a vehicle.
  • the vehicle may include a vehicle-mounted lidar and the control device in the above control device embodiment.
  • Vehicle-mounted lidar can be used to obtain raw point cloud data.
  • the division into individual modules is only intended to illustrate the functional units of the device of the present invention
  • the physical devices corresponding to these modules may be the processor itself, or a part of the software in the processor, a part of the hardware, or a part of the combination of software and hardware. Therefore, the number of individual modules in the figure is only illustrative.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Traffic Control Systems (AREA)

Abstract

The present invention relates to the technical field of autonomous driving, and in particular discloses a target detection method for autonomous driving, as well as an apparatus, a medium and a vehicle, which aim to solve the problem of how to increase the operating speed of target detection while ensuring the autonomous driving function, so as to reduce the computing pressure on a vehicle-end chip. To achieve this aim, in the present invention, a voxelized point cloud feature in each voxel can be acquired at the voxelization stage of the original point cloud data. By means of the above configuration, the present invention does not need to additionally use a model to perform feature extraction and averaging on the voxelized point cloud data, and the features of the point cloud data within the voxels are integrated at the voxelization stage, such that, while the autonomous driving function is ensured, the preliminary extraction process of the original point cloud data is simplified, the operating speed of target detection is increased, and the computing pressure on the vehicle-end chip is reduced.
PCT/CN2023/094927, priority date 2022-05-19, filed 2023-05-18: Target detection method for autonomous driving, and apparatus, medium and vehicle (WO2023222062A1)

Applications Claiming Priority (4)

Application Number, Priority Date, Filing Date, Title
CN202210556974.2A (CN114943950A, zh), 2022-05-19, 2022-05-19, 自动驾驶的目标检测方法、电子设备、介质及车辆 (Target detection method for autonomous driving, electronic device, medium and vehicle)
CN202210558014.X, 2022-05-19
CN202210556974.2, 2022-05-19
CN202210558014.XA (CN114943951A, zh), 2022-05-19, 2022-05-19, 自动驾驶的目标检测方法、装置、介质及车辆 (Target detection method for autonomous driving, device, medium and vehicle)

Publications (1)

Publication Number Publication Date
WO2023222062A1 true WO2023222062A1 (fr) 2023-11-23

Family

Family ID: 88834706

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/094927, priority date 2022-05-19, filed 2023-05-18: Target detection method for autonomous driving, and apparatus, medium and vehicle (WO2023222062A1, fr)

Country Status (1)

Country Link
WO (1) WO2023222062A1 (fr)


Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210287037A1 (en) * 2019-04-11 2021-09-16 Tencent Technology (Shenzhen) Company Limited Object detection method and apparatus, electronic device, and storage medium
CN111144242A (zh) * 2019-12-13 2020-05-12 中国科学院深圳先进技术研究院 一种三维目标检测方法、装置及终端
CN111199206A (zh) * 2019-12-30 2020-05-26 上海眼控科技股份有限公司 三维目标检测方法、装置、计算机设备及存储介质
CN112347987A (zh) * 2020-11-30 2021-02-09 江南大学 一种多模数据融合的三维目标检测方法
CN112598635A (zh) * 2020-12-18 2021-04-02 武汉大学 一种基于对称点生成的点云3d目标检测方法
CN113705631A (zh) * 2021-08-10 2021-11-26 重庆邮电大学 一种基于图卷积的3d点云目标检测方法
CN114943951A (zh) * 2022-05-19 2022-08-26 安徽蔚来智驾科技有限公司 自动驾驶的目标检测方法、装置、介质及车辆
CN114943950A (zh) * 2022-05-19 2022-08-26 安徽蔚来智驾科技有限公司 自动驾驶的目标检测方法、电子设备、介质及车辆


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 23807013

Country of ref document: EP

Kind code of ref document: A1