WO2021081808A1 - System and method for object detection based on artificial neural networks - Google Patents

System and method for object detection based on artificial neural networks

Info

Publication number
WO2021081808A1
Authority
WO
WIPO (PCT)
Prior art keywords
attention
feature map
coefficient
target object
neural network
Prior art date
Application number
PCT/CN2019/114357
Other languages
English (en)
French (fr)
Inventor
蒋卓键
陈晓智
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 filed Critical 深圳市大疆创新科技有限公司
Priority to CN201980008366.4A priority Critical patent/CN111602138B/zh
Priority to PCT/CN2019/114357 priority patent/WO2021081808A1/zh
Publication of WO2021081808A1 publication Critical patent/WO2021081808A1/zh

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/50 Context or environment of the image
    • G06V20/56 Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G06V20/58 Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/40 Extraction of image or video features
    • G06V10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/60 Type of objects
    • G06V20/64 Three-dimensional objects

Definitions

  • the present invention relates to the technical field of three-dimensional object detection and deep learning, and more specifically, to a system and method for object detection based on artificial neural networks.
  • Safety is one of the most concerned issues in autonomous driving.
  • an unmanned vehicle's accurate perception of the surrounding environment is the basis for ensuring safety, so the accuracy of the detection algorithm is very important.
  • unmanned vehicles need to detect the surrounding three-dimensional objects.
  • in most cases, lidar is used to detect three-dimensional objects.
  • when the three-dimensional object to be detected is partially occluded, part of the point cloud is missing, and the traditional detection method therefore produces poor detection results.
  • the present invention provides a system and method for object detection based on artificial neural network, which can further improve the prediction effect of occluded objects compared with the prior art.
  • an artificial neural network-based object detection method includes: obtaining a three-dimensional point cloud, and using a backbone neural network to obtain a first feature map of the three-dimensional point cloud; using an attention branch neural network to process the first feature map and obtain a second feature map, where the second feature map is also used to obtain a loss function of the target object, and the loss function is used to update the network coefficients of the attention branch neural network; and obtaining a prediction result according to the second feature map, where the prediction result includes the position information of the target object.
  • the predicted attention coefficient of the visible part of the occluded target object is made higher than the predicted attention coefficient of the occluded part, so that the visible part can be used to a greater extent in the prediction process, and information such as the position and size of the target object can be predicted more accurately.
  • the method of object detection based on artificial neural networks can be applied to the field of autonomous driving of unmanned equipment such as drones or unmanned vehicles, and is used to predict obstacles (such as other vehicles, pedestrians, etc.) in the surrounding environment of the unmanned movable device, where the obstacle (i.e., the target object) may be a partially occluded object.
  • through deep learning, an artificial neural network-based object detection model can be trained that can obtain the position information and size information of the target object based on the visible part of the occluded target object.
  • during training, the object detection model can use the attention mechanism to give more weight to the information of the visible part of the target object, that is, to be more sensitive to the information of the visible part, so that in the subsequent prediction process the object detection model can obtain the information of the target object more accurately according to the visible part of the target object.
  • using the attention branch neural network to process the first feature map and obtain the second feature map includes: dividing the first feature map into candidate frames; generating a predicted attention coefficient for the candidate frames of the first feature map through the attention branch neural network, where the value of the predicted attention coefficient of each candidate frame is determined according to the sample feature map matched with the first feature map; and dot-multiplying the predicted attention coefficients and the first feature map to obtain the second feature map.
  • a sample library may be established first, and the sample library may include a sample feature map of the target object.
  • the sample feature map includes the true value attention coefficient.
  • the true attention coefficient corresponding to the visible part of the target object is higher than the true attention coefficient of the occluded part.
  • the sample feature map of the target object may contain the same point cloud feature information at each position as the first feature map acquired by the detection model, differing only in the attention coefficients generated at each position.
  • the method further includes: comparing the predicted attention coefficient of a candidate frame of the second feature map with the true-value attention coefficient of the corresponding true-value frame in the sample feature map; when the confidence between the predicted attention coefficient of the candidate frame and the true-value attention coefficient of the true-value frame is higher than the first threshold, determining the result of the attention loss function according to the predicted attention coefficient and the true-value attention coefficient; and updating the attention branch neural network coefficients according to the result of the attention loss function, so that the confidence of the attention branch neural network coefficients is higher than the second threshold.
  • the second threshold may be higher than the first threshold.
  • after the predicted attention coefficient is updated, it is closer to its corresponding true-value attention coefficient.
  • the values of the first threshold and the second threshold can be flexibly set, which is not limited in the embodiment of the present application.
  • the method further includes: when the predicted attention coefficient of the second feature map is updated, taking the natural constant e exponent of the updated predicted attention coefficient.
  • the predicted attention coefficient corresponding to the visible part of the target object and the predicted attention coefficient corresponding to the occluded part can be distinguished more clearly, highlighting the information of the visible part.
  • updating the predicted attention coefficient according to the result of the attention loss function includes: updating the predicted attention coefficient through a backpropagation algorithm according to the result of the attention loss function.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.
  • obtaining a three-dimensional point cloud and using a backbone neural network to obtain the first feature map of the three-dimensional point cloud includes: obtaining three-dimensional point cloud data of the occluded target object; dividing the three-dimensional point cloud data with a three-dimensional grid to obtain a plurality of three-dimensional space voxels; obtaining the point cloud feature of each voxel according to the point cloud density in the voxel; and using the backbone neural network to extract the point cloud features and generate the first feature map.
  • generating a predicted attention coefficient for the candidate frames of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficient through one or more of a convolution operation, a fully connected operation, and a variant of the convolution operation.
  • the method further includes: performing object detection on the target object through the artificial neural network, and obtaining the three-dimensional positions and confidences of the feature map candidate frames corresponding to the visible part of the target object; sorting the confidences and selecting candidate frames with a confidence higher than a third threshold; and predicting the information of the target object according to the candidate frames whose confidence is higher than the third threshold.
  • the candidate frames are screened according to the confidence, and the prediction result of the target object is determined according to the information of the candidate frame with the higher confidence.
  • the information of the target object includes the position and/or size of the target object.
  • the method further includes: displaying a prediction result obtained according to the second feature map.
  • the method provided in the embodiments of the present application obtains a feature map in which the predicted attention coefficient of the visible part is higher than that of the occluded part, and obtains the prediction result of the target object according to that feature map, where the prediction result can be displayed directly on a display.
  • an object detection system based on an artificial neural network includes at least one processor and a lidar, where the lidar is used to obtain a three-dimensional point cloud, and the three-dimensional point cloud of the target object is input to the processor; the processor is configured to perform three-dimensional grid division on the three-dimensional point cloud to obtain multiple voxels; the processor is further configured to determine the point cloud feature at the corresponding position of each voxel according to the point cloud density in the voxel; the processor is further configured to extract the point cloud features through the backbone network of the object detection model and generate the first feature map of the target object; the processor is further configured to generate predicted attention coefficients in the first feature map through the attention branch neural network of the object detection model; the processor is further configured to calculate the result of the attention loss function through the loss-function branch neural network according to the true-value attention coefficients in the sample feature map and the predicted attention coefficients; and the processor is further configured to update the predicted attention coefficients according to that result.
  • the first feature map determined according to the three-dimensional point cloud data of the occluded target object can be used to generate the second feature map with a predicted attention coefficient at each position; through the attention loss function generated from the predicted attention coefficients and the true-value attention coefficients, the predicted attention coefficient of the visible part of the occluded target object is made higher than that of the occluded part, so that the visible part can be used to a greater extent in the prediction process to predict information such as the position and size of the target object more accurately.
  • the processor is further configured to divide the first feature map into candidate frames; the processor is further configured to generate a predicted attention coefficient for the candidate frames of the first feature map through the attention branch neural network; and the processor is further configured to multiply the predicted attention coefficients and the first feature map to obtain the second feature map.
  • a sample library can be established first, and the sample library can include a sample feature map of the target object.
  • the sample feature map includes the true value attention coefficient.
  • the true attention coefficient corresponding to the visible part of the target object is higher than the true attention coefficient of the occluded part.
  • the processor is further configured to compare the predicted attention coefficient of a candidate frame of the second feature map with the true-value attention coefficient of the corresponding true-value frame in the sample feature map; the processor is further configured to, when the confidence between the predicted attention coefficient of the candidate frame and the true-value attention coefficient of the true-value frame is higher than the first threshold, determine the result of the attention loss function according to the predicted attention coefficient and the true-value attention coefficient; and the processor is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than the second threshold.
  • when the processor updates the predicted attention coefficient of the second feature map, it takes the natural constant e exponent of the updated predicted attention coefficient.
  • the predicted attention coefficient corresponding to the visible part of the target object and the predicted attention coefficient corresponding to the occluded part can be distinguished more clearly, highlighting the information of the visible part.
  • the processor is further configured to update the predicted attention coefficient through a backpropagation algorithm according to the result of the attention loss function.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.
  • the processor is further configured to obtain three-dimensional point cloud data of the occluded target object; the processor is further configured to divide the three-dimensional point cloud data with a three-dimensional grid and obtain a plurality of three-dimensional space voxels; the processor is further configured to obtain the point cloud feature of each voxel according to the point cloud density in the voxel; and the processor is further configured to extract the point cloud features using the backbone neural network and generate the first feature map.
  • the processor is configured to generate a predictive attention coefficient for the candidate frame of the first feature map through the attention branch neural network, including:
  • the attention branch neural network generates the predicted attention coefficient through one or more of a convolution operation, a full connection, and a variant of a convolution operation.
  • the processor is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional positions and confidences of the feature map candidate frames corresponding to the visible part of the target object; the processor is further configured to sort the confidences and select candidate frames with a confidence higher than the third threshold; and the processor is further configured to predict the information of the target object according to the candidate frames whose confidence is higher than the third threshold.
  • the information of the target object includes the position and/or size of the target object.
  • the system further includes a display, and the display is configured to display the prediction result obtained according to the second feature map.
  • the system provided by the embodiment of the present application may be applied to a movable device in the field of unmanned driving, and the movable device may be an unmanned aerial vehicle or an unmanned vehicle.
  • the mobile device can collect the three-dimensional point cloud of the occluded target object through lidar, and predict the position and/or size information of the target object based on the visible part of the occluded object.
  • an artificial neural network-based object detection system includes a processing module and a receiving module, wherein the system is used to execute the method described in any implementation manner of the first aspect.
  • a computer storage medium on which a computer program is stored.
  • the computer program executes the method provided in the first aspect.
  • in a fifth aspect, a chip system is provided, which includes at least one processor; when program instructions are executed in the at least one processor, the method according to any implementation manner of the first aspect can be implemented.
  • a computer program product containing instructions is provided, which when executed by a computer causes the computer to execute the method provided in the first aspect.
  • the method of object detection based on artificial neural networks can be applied to the field of autonomous driving of unmanned equipment such as drones or unmanned vehicles, and is used to predict obstacles (such as other vehicles, pedestrians, etc.) in the surrounding environment of the unmanned movable equipment, where the obstacle (i.e., the target object) may be a partially occluded object.
  • through deep learning, an artificial neural network-based object detection model can be trained that can obtain the position information and size information of the target object based on the visible part of the occluded target object.
  • during training, the object detection model can use the attention mechanism to give more weight to the information of the visible part of the target object, that is, to be more sensitive to the information of the visible part, so that in the subsequent prediction process the object detection model can obtain the information of the target object more accurately according to the visible part of the target object.
  • FIG. 1 shows a schematic diagram of a scene where the method for object detection based on artificial neural network provided by an embodiment of the present application is applied.
  • Fig. 2 shows a schematic flow chart of the method for object detection based on artificial neural network provided by an embodiment of the present application.
  • Fig. 3 shows a schematic flow chart of the method for object detection based on artificial neural network provided by an embodiment of the present application.
  • Fig. 4 shows a schematic diagram of an object detection system based on artificial neural network provided by an embodiment of the present application.
  • Fig. 5 shows a schematic diagram of another object detection system based on artificial neural network provided by an embodiment of the present application.
  • the attention mechanism focuses on important points and ignores other, unimportant factors.
  • the attention mechanism is similar to the human visual attention mechanism.
  • human vision can quickly scan the global image to obtain the target area that needs to be focused on, which is commonly referred to as the focus of attention.
  • more attention resources are devoted to this area to obtain more detailed information about the target that needs attention, while other useless information is suppressed.
  • the judgment of the importance level may depend on the application scenario.
  • the attention mechanism is divided into spatial attention and temporal attention.
  • the former is generally used for image processing, and the latter is generally used for natural language processing.
  • the embodiments of the present application mainly relate to spatial attention.
  • the object detection method provided by the embodiment of the present application can be applied to an automatic driving scene (as shown in Fig. 1).
  • unmanned vehicles can use lidar to acquire a three-dimensional point cloud of the surrounding environment and detect three-dimensional objects in it; when a detected three-dimensional object is partially occluded, the missing point cloud leads to missed detections, and the detection effect is greatly reduced.
  • the embodiment of the present application improves the neural network training strategy in the deep learning algorithm.
  • the attention network branch is added to improve the utilization of the key point cloud in the visible part, thereby improving the detection effect of occluded three-dimensional objects.
  • Fig. 2 shows a schematic flow chart of the method for object detection based on artificial neural networks provided by an embodiment of the present application, including the following steps.
  • S101 Acquire a three-dimensional point cloud, and use a backbone neural network to obtain a first feature map of the three-dimensional point cloud.
  • the three-dimensional point cloud is the three-dimensional point cloud data of the partially occluded target object.
  • the detection model can include a backbone network and network branches.
  • the backbone network can be used to receive three-dimensional point cloud data and generate feature maps based on it; the network branches can be used to compute the loss functions of the network.
  • the loss functions are those related to confidence, position, and the attention coefficient. These loss functions can guide the update of network parameters such as the attention coefficient, so that the neural network detection model can predict the position and size of the target object more accurately based on the unoccluded part of the target object, and thus has better prediction performance.
  • before generating the feature map from the point cloud data, the neural network detection model can voxelize the three-dimensional space of the target object and determine the point cloud feature at each voxel's location according to the point cloud density in the voxel. This process can transform the point cloud data of the target object into dimensions that the neural network can receive.
  • the neural network detection model may first divide the point cloud of the target object with a three-dimensional grid at a certain resolution in the x, y, and z directions to obtain voxels in three-dimensional space, and then determine the point cloud feature of each voxel based on the point cloud density in it. For a voxel containing points, the point cloud density P in the voxel is calculated and the point cloud feature at that position is set to P; for a voxel without points, the point cloud feature is set to 0 (a sketch of this process follows below).
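  • As an illustration of this voxelization step, the following is a minimal Python/NumPy sketch; the region extents, grid resolution, and the exact density definition (point count normalized by voxel volume) are illustrative assumptions, not values taken from this disclosure.

```python
import numpy as np

def voxelize_density(points, grid_min, grid_max, resolution):
    """Map an (N, 3) point cloud to a voxel grid of point cloud densities."""
    grid_min = np.asarray(grid_min, dtype=np.float64)
    grid_max = np.asarray(grid_max, dtype=np.float64)
    resolution = np.asarray(resolution, dtype=np.int64)
    voxel_size = (grid_max - grid_min) / resolution

    # Assign each point to a voxel index; drop points outside the grid.
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    idx = idx[inside]

    # Count points per voxel; density P = count / voxel volume.
    grid = np.zeros(resolution, dtype=np.float64)
    np.add.at(grid, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    grid /= np.prod(voxel_size)   # voxels without points keep feature 0

    return grid

# Example: random points in a 40 m x 40 m x 4 m region, 0.2 m voxels.
pts = np.random.uniform([-20, -20, -2], [20, 20, 2], size=(100_000, 3))
features = voxelize_density(pts, [-20, -20, -2], [20, 20, 2], [200, 200, 20])
```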
  • the neural network detection model can extract point cloud features through a backbone network and generate a first feature map.
  • the backbone network can be any network structure, and the size of the first feature map can be H*W, and the specific values of H and W are not limited in the embodiment of the present application.
  • the data input to the neural network detection model is not limited to the point cloud data of the target object, but may also be the image information of the target object, such as the RGB image information of the target object.
  • the predictive attention coefficient may be generated for the candidate frame of the first feature map through the attention branch neural network.
  • each position of the first feature map may refer to each candidate frame obtained after dividing the first feature map, and the size of the candidate frame can be flexibly set according to needs, which is not limited in this application.
  • the attention branch neural network can generate corresponding prediction attention coefficients at various positions of the first feature map in a variety of ways, such as convolution operations, full connections, and variants of convolution (such as SPAS, STAR, SGRS, SPARSE, etc.).
  • the value of the initially generated predicted attention coefficient on the first feature map may be a preset default value; or, the value of the initially generated predicted attention coefficient may be an empirical value.
  • a sample library may be established first, and the sample library includes a sample feature map of the target object, wherein the sample feature map can be divided into truth value boxes, and The size of the truth box may be the same as the size of the candidate box in the second feature map.
  • each part of the target object in the sample feature map may be labeled with the true value attention coefficient in advance.
  • for the occluded part of the target object, the true-value attention coefficient can be pre-marked as a negative number or a positive number less than 1; for the unoccluded part of the target object (i.e., the visible part), the true-value attention coefficient can be pre-marked as a positive number greater than 1.
  • the natural constant e exponent may be taken of the true-value attention coefficients in the sample feature map, so that the information of the unoccluded part is more prominent (a numeric illustration follows below).
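  • As a numeric illustration of this labeling scheme (the coefficient values below are hypothetical, chosen only to follow the convention that visible parts are marked above 1 and occluded parts below 1):

```python
import numpy as np

# Hypothetical true-value attention coefficients over a 2 x 4 patch of a
# sample feature map: visible positions pre-marked > 1, occluded < 1.
t = np.array([[ 1.5,  1.5,  0.3, -0.5],
              [ 1.2,  1.5, -0.2, -0.5]])

# Taking the natural constant e exponent widens the gap between the
# visible and occluded positions, making the visible information stand out.
print(np.exp(t).round(2))
# [[4.48 4.48 1.35 0.61]
#  [3.32 4.48 0.82 0.61]]
```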
  • the goal of the object detection model training process is to make the attention coefficient of the visible part of the target object higher, or much higher, than the attention coefficient of the occluded part of the target object.
  • a higher attention coefficient means that that part of the point cloud is a key point cloud for predicting the target object.
  • accordingly, that part of the point cloud receives a higher degree of attention and utilization.
  • the second feature map can be obtained according to the first feature map generated with the predicted attention coefficient.
  • the generated predicted attention coefficients and the first feature map may be dot-multiplied (i.e., multiplied element-wise) to obtain the second feature map.
  • the second feature map is obtained after the corresponding predicted attention coefficients are generated in each candidate frame based on the first feature map; the second feature map can also be understood as an attention feature map (a sketch of this step follows below).
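  • The following is a minimal PyTorch sketch of these two steps, assuming a small convolutional attention branch; the channel counts, layer shapes, and the per-position (rather than per-candidate-frame) granularity are illustrative assumptions rather than the patented architecture.

```python
import torch
import torch.nn as nn

class AttentionBranch(nn.Module):
    """Generates one predicted attention coefficient per position of the
    first feature map and multiplies it element-wise into the map."""

    def __init__(self, in_channels=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(in_channels, 32, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(32, 1, kernel_size=1),   # 1-channel attention map
        )

    def forward(self, first_feature_map):            # (B, C, H, W)
        m = self.conv(first_feature_map)             # (B, 1, H, W) coefficients
        second_feature_map = first_feature_map * m   # element-wise (dot) product
        return second_feature_map, m

# Stand-in for a backbone output of size H x W with 64 channels.
first_map = torch.randn(1, 64, 100, 100)
second_map, coeff = AttentionBranch(64)(first_map)
```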
  • when the predicted attention coefficients in the candidate frames of the second feature map are default or empirical values, it cannot be guaranteed that the predicted attention coefficients of the visible part of the target object are higher, or much higher, than those of the occluded part. In this case, the predicted attention coefficients in the second feature map need to be corrected and updated with reference to the true-value attention coefficients in the sample feature map, so that the confidence between the predicted attention coefficients and the true-value attention coefficients reaches the first threshold.
  • the attention branch neural network is trained to generate high attention coefficients in the visible part of the target object and lower attention coefficients in the occluded part, so that the part of the feature map corresponding to the visible part of the target object can be obtained more accurately, which facilitates subsequently using the visible part to predict the information of the target object.
  • the neural network object detection model can correct and update the predicted attention coefficients through the attention coefficient loss function. Specifically, the predicted attention coefficient in a candidate frame of the second feature map is compared with the true-value attention coefficient in the corresponding true-value frame of the sample feature map; when the confidence between the predicted attention coefficient of the candidate frame and the true-value attention coefficient of the true-value frame is higher than the first threshold, the result of the attention loss function is determined according to the predicted attention coefficient and the true-value attention coefficient; the attention branch neural network coefficients are then updated according to the result of the attention loss function, so that the confidence of the attention branch neural network coefficients is higher than the second threshold.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the actual true-value attention coefficient.
  • according to the result of the attention loss function, a backpropagation algorithm can be used to correct and update the network coefficients of the attention branch neural network (a sketch of this step follows below).
  • in this way, the predicted attention coefficient can be made closer to the value of the true-value attention coefficient.
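  • The correction step can be sketched as follows, using PyTorch's built-in smooth L1 loss as L_a; the matching of candidate frames to true-value frames is assumed to have already produced the aligned coefficient vectors m and t, whose values here are placeholders.

```python
import torch
import torch.nn.functional as F

# Placeholder aligned coefficients: m_k predicted, t_k true value, for
# n matched candidate-frame feature points (n = 128 chosen arbitrarily).
m = torch.randn(128, requires_grad=True)
t = torch.randn(128)

# Attention loss: sum over k of smooth-L1(m_k, t_k).
loss = F.smooth_l1_loss(m, t, reduction="sum")

# Backpropagation: gradients of the loss flow back to the predicted
# coefficients (and, in the full model, to the attention branch weights).
loss.backward()

# A subsequent optimizer step, e.g. torch.optim.SGD([m], lr=1e-2).step(),
# would move the predicted coefficients toward the true values.
```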
  • the updated predicted attention coefficients can then be passed through an operation taking the natural constant e exponent, so that the attention coefficients of the visible part and the occluded part differ more obviously, highlighting the information of the visible part.
  • in this way, the attention coefficients corresponding to the visible part of the target object in the second feature map are higher, so that the object detection neural network model is more sensitive to the information of the visible part and uses it to a greater extent to predict the position and size of the entire target object.
  • after training, the attention branch neural network can generate high predicted attention coefficients in the visible part of the target object and lower predicted attention coefficients in the occluded part during subsequent detection; that is, the attention branch neural network is more sensitive to the information of the visible part of the target object, so that in the actual prediction process the object detection model pays more attention to the information of the visible part, improving the detection effect for occluded target objects.
  • the neural network object detection model can obtain a prediction result based on the second feature map after the predicted attention coefficients are corrected or updated; the prediction result can include information such as the position or size of the target object.
  • the point cloud data or image information of the target object can be input into the detection model; the detection model can select the data or information belonging to the unoccluded part of the point cloud data or image information, and predict information such as the position or size of the entire occluded target object based on the data or information of the unoccluded part.
  • the three-dimensional positions and confidences corresponding to the candidate frames of the detected object can be obtained; after the candidate frames are sorted by confidence from high to low, a certain number of candidate frames can be screened out, where the confidences of the screened candidate frames are all higher than the third threshold; the position and size of the target object can then be predicted according to the screened candidate frames with higher confidence (a sketch of this step follows below).
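  • The screening step can be sketched as follows; the box layout and the value of the third threshold are assumptions for illustration.

```python
import torch

def select_candidates(boxes, scores, third_threshold=0.6):
    """Sort candidate frames by confidence (high to low) and keep only
    those whose confidence exceeds the third threshold.

    boxes:  (N, 7) tensor, e.g. (x, y, z, l, w, h, yaw) per candidate frame
    scores: (N,) confidence of each candidate frame
    """
    order = torch.argsort(scores, descending=True)   # sort by confidence
    order = order[scores[order] > third_threshold]   # screen by threshold
    return boxes[order], scores[order]

boxes = torch.randn(100, 7)   # placeholder candidate frames
scores = torch.rand(100)      # placeholder confidences
kept_boxes, kept_scores = select_candidates(boxes, scores)
```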
  • the process of predicting the size or position of the entire object based on the point cloud data or image information of some key parts of the target object through the deep learning algorithm of the neural network can refer to the existing process, and will not be repeated here.
  • the method for object detection based on artificial neural networks can be applied in the field of autonomous driving, to scenarios where unmanned vehicles or drones predict the position and size of obstacles in the surrounding environment.
  • when the feature map is generated based on the information of the obstacle and directly input into the branch network used to calculate the position and confidence loss functions, basically the same attention is given to the occluded and visible parts of the obstacle.
  • because the obstacle is partially occluded, part of the valuable information used to predict the obstacle's position, confidence, and so on is missing, which results in a poor detection effect.
  • in the embodiments of the present application, an attention network branch is added to the artificial neural network and trained to learn to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the position, size, or confidence of obstacles, so that unmanned vehicles or drones can accurately learn the distribution and size of obstacles in the surrounding environment and plan accurate driving trajectories.
  • the method of object detection based on artificial neural networks provided in the embodiments of the present application can still use lidar alone for detection, without fusing other sensors, thereby reducing hardware costs.
  • FIG. 3 shows a schematic flowchart of an object detection method based on an artificial neural network provided by an embodiment of the present application. The process includes the following steps.
  • the object detection method provided in the embodiments of the present application may also take an image of the target object as input, such as an RGB image.
  • the three-dimensional grid division refers to three-dimensional meshing of the target object's point cloud, that is, voxelization of the three-dimensional space.
  • the spatial point cloud can be divided into a grid at a certain resolution along the three spatial coordinate directions x, y, and z to obtain voxels in three-dimensional space.
  • the point cloud feature is then determined according to the point cloud density in each voxel: for a voxel containing points, the point cloud density in the voxel (denoted as P) is calculated and the point cloud feature at that position is set to P; for a voxel without points, the point cloud feature is set to 0. This process can transform the point cloud data of the target object into dimensions that the neural network can receive.
  • S203 Acquire a first feature map through the backbone network.
  • the first feature map is a feature map of the target object point cloud.
  • the backbone network can be any network structure, and the size of the first feature map can be H*W.
  • the embodiment of the present application does not limit the specific values of H and W.
  • the data input to the neural network detection model is not limited to the point cloud data of the target object, but may also be the image information of the target object, such as the RGB image information of the target object.
  • the related operation of the attention coefficient in the process may include generating the predicted attention coefficient corresponding to each position in the first feature map based on the first feature map.
  • Each position of the first feature map may refer to each candidate frame obtained by dividing the first feature map, and the size of the candidate frame can be flexibly set according to needs, which is not limited in this application.
  • the attention branch neural network can generate corresponding prediction attention coefficients at various positions of the first feature map in a variety of ways, such as convolution operations, full connections, and variants of convolution (such as SPAS, STAR, SGRS, SPARSE, etc.).
  • the second feature map is obtained after the corresponding predicted attention coefficients are generated in each candidate frame based on the first feature map; the second feature map can also be understood as an attention feature map.
  • when the predicted attention coefficients in the candidate frames of the second feature map are default or empirical values, it cannot be guaranteed that the predicted attention coefficients of the visible part of the target object are higher, or much higher, than those of the occluded part. In this case, the predicted attention coefficients in the second feature map need to be corrected and updated with reference to the true-value attention coefficients in the sample feature map, so that the confidence between the predicted attention coefficients and the true-value attention coefficients reaches the first threshold.
  • the attention branch neural network is thereby trained to generate high attention coefficients in the visible part of the target object and lower attention coefficients in the occluded part, so that the part of the feature map corresponding to the visible part of the target object can be obtained more accurately, which facilitates subsequently using the visible part to predict the information of the target object.
  • the neural network object detection model can correct and update the predicted attention coefficients through the attention coefficient loss function. Specifically, the predicted attention coefficient in a candidate frame of the second feature map is compared with the true-value attention coefficient in the corresponding true-value frame of the sample feature map; when the confidence between the predicted attention coefficient of the candidate frame and the true-value attention coefficient of the true-value frame is higher than the first threshold, the result of the attention loss function is determined according to the predicted attention coefficient and the true-value attention coefficient; the attention branch neural network coefficients are then updated according to the result of the attention loss function, so that the confidence of the attention branch neural network coefficients is higher than the second threshold.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the actual true-value attention coefficient.
  • according to the result of the attention loss function, a backpropagation algorithm can be used to correct and update the network coefficients of the attention branch neural network.
  • in this way, the predicted attention coefficient can be made closer to the value of the true-value attention coefficient.
  • the updated predicted attention coefficients can then be passed through an operation taking the natural constant e exponent, so that the attention coefficients of the visible part and the occluded part differ more obviously, highlighting the information of the visible part.
  • in this way, the attention coefficients corresponding to the visible part of the target object in the second feature map are higher, so that the object detection neural network model is more sensitive to the information of the visible part and uses it to a greater extent to predict the position and size of the entire target object.
  • after training, the attention branch neural network can generate high predicted attention coefficients in the visible part of the target object and lower predicted attention coefficients in the occluded part during subsequent detection; that is, the attention branch neural network is more sensitive to the information of the visible part of the target object, so that in the actual prediction process the object detection model pays more attention to the information of the visible part, improving the detection effect for occluded target objects.
  • the neural network object detection model can obtain a prediction result based on the second feature map after the predicted attention coefficients are corrected or updated; the prediction result can include information such as the position or size of the target object.
  • the three-dimensional positions and confidences corresponding to the candidate frames of the detected object can be obtained; after the candidate frames are sorted by confidence from high to low, a certain number of candidate frames can be selected, and the position and size of the target object can be predicted according to the selected candidate frames with higher confidence.
  • the process of predicting the size or position of the entire object based on the point cloud data or image information of some key parts of the target object through the deep learning algorithm of the neural network can refer to the existing process, and will not be repeated here.
  • the method of object detection based on artificial neural networks can be applied to scenarios where unmanned vehicles, drones, and other unmanned devices predict the position and size of obstacles in the surrounding environment.
  • when the feature map generated from the obstacle information is directly input into the branch network used to calculate the position and confidence loss functions, the occluded part and the visible part of the obstacle are given basically the same degree of attention; in this case, because the obstacle is partially occluded, part of the valuable information used to predict the obstacle's position, confidence, and so on is missing, and the detection effect is poor.
  • in the embodiments of the present application, an attention network branch is added to the artificial neural network and trained to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the position, size, or confidence of obstacles, so that unmanned vehicles or drones can accurately learn the distribution and size of obstacles in the surrounding environment and plan accurate driving trajectories.
  • Fig. 4 shows a schematic diagram of an object detection system based on artificial neural network provided by an embodiment of the present application.
  • the system 300 includes at least one lidar 310 and a processor 320.
  • the system 300 may be a distributed sensing and processing system installed on an autonomous vehicle.
  • at least one lidar 310 may be installed on the roof as a rotating lidar; the lidar 310 may also be installed at other locations on the autonomous vehicle, or other forms of lidar may be used.
  • the processor 320 may be a supercomputing platform installed on the autonomous vehicle; that is, the processor 320 may include one or more processing units in the form of a CPU, GPU, FPGA, ASIC, or the like, for processing the sensor data acquired by the sensors of the autonomous vehicle.
  • the lidar 310 is used to obtain a three-dimensional point cloud.
  • the processor 320 is configured to use the backbone neural network to obtain the first feature map of the three-dimensional point cloud.
  • the processor 320 is further configured to use the attention branch neural network to process the first feature map and obtain a second feature map.
  • Each position of the second feature map includes the predicted attention coefficient corresponding to that position.
  • the second feature map is also used to obtain the loss function of the target object, and the loss function is used to update the predicted attention coefficient.
  • the processor 320 is further configured to obtain a prediction result according to the second feature map, and the prediction result includes position information of the target object.
  • the processor 320 is further configured to divide the candidate frame for the first feature map.
  • the processor 320 is further configured to generate a predictive attention coefficient for the candidate frame of the first feature map through the attention branch neural network.
  • the processor 320 is further configured to perform dot multiplication on the predicted attention coefficient and the first feature map to obtain the second feature map.
  • the processor 320 is further configured to compare the predicted attention coefficient of the candidate frame of the second feature map with the attention coefficient of the true value frame in the sample feature map corresponding to the candidate frame.
  • the processor 320 is further configured to: when the confidence of the predicted attention coefficient of the candidate frame and the true value attention coefficient of the true value frame is higher than the first threshold, according to the predicted attention coefficient and the true value The attention coefficient determines the result of the attention loss function.
  • the processor 320 is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than the second threshold.
  • after the processor 320 updates the predicted attention coefficient of the second feature map, it performs an operation of taking the natural constant e exponent on the updated predicted attention coefficient.
  • the processor 320 is further configured to update the predicted attention coefficient through a backpropagation algorithm according to the result of the attention loss function.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.
  • the processor 320 is also used to obtain three-dimensional point cloud data of the occluded target object.
  • the processor 320 is further configured to divide the three-dimensional point cloud data with a three-dimensional grid and obtain a plurality of three-dimensional space voxels.
  • the processor 320 is further configured to obtain the point cloud feature of the voxel according to the point cloud density in each voxel.
  • the processor 320 is further configured to use the backbone neural network to extract point cloud features and generate a first feature map.
  • the processor 320 is configured to generate a predicted attention coefficient for the candidate frames of the first feature map through the attention branch neural network, including: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of the convolution operation.
  • the processor 320 is configured to perform object detection on the target object through an artificial neural network, and obtain the three-dimensional position and confidence of the feature map candidate frame corresponding to the visible part of the target object.
  • the processor 320 is further configured to sort the confidence levels and select candidate boxes with confidence levels higher than the third threshold.
  • the processor 320 is further configured to predict the information of the target object according to the candidate frame whose confidence is higher than the third threshold.
  • the information of the target object includes the position and/or size of the target object.
  • system 300 provided in the embodiment of the present application may further include a display, and the display is configured to display the prediction result of the target object predicted according to the second feature map.
  • the object detection system based on artificial neural networks provided by the embodiments of the present application can be applied in the autonomous driving field of unmanned driving equipment such as unmanned aerial vehicles or unmanned vehicles, to predict obstacles (such as other vehicles, pedestrians, etc.) in the surrounding environment of the movable device, where the obstacle (i.e., the target object) may be a partially occluded object.
  • the system provided by the embodiments of the present application can train an artificial neural network-based object detection model that can obtain the position information and size information of the target object according to the visible part of the occluded target object through the deep learning of the neural network.
  • during training, the object detection model can use the attention mechanism to give more weight to the information of the visible part of the target object, that is, to be more sensitive to the information of the visible part, so that in the subsequent prediction process the object detection model can obtain the information of the target object more accurately according to the visible part of the target object.
  • FIG. 5 shows a schematic diagram of an object detection system based on artificial neural network provided by an embodiment of the present application.
  • the system 400 includes at least one receiving module 410 and a processing module 420.
  • the receiving module 410 is used to obtain a three-dimensional point cloud.
  • the processing module 420 is configured to use the backbone neural network to obtain the first feature map of the three-dimensional point cloud.
  • the processing module 420 is further configured to use the attention branch neural network to process the first feature map and obtain a second feature map.
  • Each position of the second feature map includes the predicted attention coefficient corresponding to that position.
  • the second feature map is also used to obtain the loss function of the target object, and the loss function is used to update the predicted attention coefficient.
  • the processing module 420 is further configured to obtain a prediction result according to the second feature map, and the prediction result includes position information of the target object.
  • the processing module 420 is further configured to divide the candidate frame for the first feature map.
  • the processing module 420 is further configured to generate a predictive attention coefficient for the candidate frame of the first feature map through the attention branch neural network.
  • the processing module 420 is further configured to perform dot multiplication on the predicted attention coefficient and the first feature map to obtain the second feature map.
  • the processing module 420 is also used to compare the predicted attention coefficient of the candidate frame of the second feature map with the attention coefficient of the true value frame in the sample feature map corresponding to the candidate frame.
  • the processing module 420 is further configured to, when the confidence of the predicted attention coefficient of the candidate frame and the true value attention coefficient of the true value frame is higher than the first threshold, according to the predicted attention coefficient and the true value The attention coefficient determines the result of the attention loss function.
  • the processing module 420 is further configured to update the predicted attention coefficient according to the result of the attention loss function, so that the confidence of the predicted attention coefficient is higher than the second threshold.
  • when the processing module 420 updates the predicted attention coefficient of the second feature map, it performs an operation of taking the natural constant e exponent on the updated predicted attention coefficient.
  • the processing module 420 is further configured to update the predicted attention coefficient through a backpropagation algorithm according to the result of the attention loss function.
  • the attention loss function is ∑_{k=0}^{n} [L_a(m_k, t_k)], where k is the index of a candidate-frame feature point, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the true-value attention coefficient.
  • the processing module 420 is also used to obtain three-dimensional point cloud data of the occluded target object.
  • the processing module 420 is also used to divide the three-dimensional point cloud data with a three-dimensional grid and obtain multiple three-dimensional space voxels.
  • the processing module 420 is further configured to obtain the point cloud feature of each voxel according to the point cloud density in each voxel.
  • the processing module 420 is further configured to use the backbone neural network to extract point cloud features and generate a first feature map.
  • the processing module 420 is configured to generate a predicted attention coefficient for the candidate frames of the first feature map through the attention branch neural network, including: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected operation, and a variant of the convolution operation.
  • the processing module 420 is used to detect the target object through an artificial neural network, and obtain the three-dimensional position and confidence of the feature map candidate frame corresponding to the visible part of the target object.
  • the processing module 420 is further configured to sort the confidence levels and select candidate boxes with confidence levels higher than the third threshold.
  • the processing module 420 is further configured to predict the information of the target object according to the candidate frame whose confidence is higher than the third threshold.
  • the information of the target object includes the position and/or size of the target object.
  • the object detection model system provided in the embodiment of the present application may further include a display, and the display is configured to display the prediction result obtained according to the second feature map.
  • the embodiment of the present invention also provides a chip system, which includes at least one processor, and when the program instructions are executed in the at least one processor, the method provided in the embodiment of the present application can be implemented.
  • An embodiment of the present invention also provides a computer storage medium on which a computer program is stored, and when the computer program is executed by a computer, the computer executes the method of the foregoing method embodiment.
  • the embodiment of the present invention also provides a computer program product containing instructions, which when executed by a computer causes the computer to execute the method of the foregoing method embodiment.
  • the foregoing embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof.
  • when implemented in software, they can be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (for example, coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (for example, infrared, radio, microwave) means.
  • the computer-readable storage medium may be any available medium that can be accessed by a computer or a data storage device such as a server or a data center integrated with one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, a magnetic tape), an optical medium (for example, a digital video disc (DVD)), or a semiconductor medium (for example, a solid state disk (SSD)), etc.
  • the disclosed system, device, and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual couplings or direct couplings or communication connections may be indirect couplings or communication connections through some interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

A system and method for object detection based on an artificial neural network. The method includes: acquiring a three-dimensional point cloud and using a backbone neural network to obtain a first feature map of the three-dimensional point cloud (S101); processing the first feature map with an attention branch neural network to obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position, and the second feature map is further used to obtain a loss function of the target object, the loss function being used to update the network coefficients of the attention branch neural network (S102); and obtaining a prediction result from the second feature map, the prediction result including position information of the target object (S103).

Description

System and method for object detection based on an artificial neural network
Copyright notice
The disclosure of this patent document contains material that is subject to copyright protection. The copyright is owned by the copyright owner. The copyright owner has no objection to the reproduction by anyone of the patent document or the patent disclosure as it appears in the official records and files of the Patent and Trademark Office.
Technical field
The present invention relates to the technical field of three-dimensional object detection and deep learning, and more specifically, to a system and method for object detection based on an artificial neural network.
Background
Safety is one of the issues of greatest concern in autonomous driving. At the algorithm level, an unmanned vehicle's accurate perception of its surroundings is the basis for ensuring safety, so the accuracy of the algorithm is very important. During unmanned driving, the vehicle needs to detect the surrounding three-dimensional objects. At present, lidar is mostly used to detect three-dimensional objects, and when the three-dimensional object to be detected is partially occluded, traditional detection methods suffer from poor detection results because part of the point cloud is occluded.
Therefore, how to improve the detection of occluded three-dimensional objects has become a problem to be solved urgently.
Summary of the invention
The present invention provides a system and method for object detection based on an artificial neural network which, compared with the prior art, can further improve the prediction of occluded objects.
In a first aspect, a method for object detection based on an artificial neural network is provided. The method includes: acquiring a three-dimensional point cloud, and using a backbone neural network to obtain a first feature map of the three-dimensional point cloud; processing the first feature map with an attention branch neural network to obtain a second feature map, where the second feature map is further used to obtain a loss function of the target object, and the loss function is used to update the network coefficients of the attention branch neural network; and obtaining a prediction result from the second feature map, the prediction result including position information of the target object.
Optionally, a second feature map with a predicted attention coefficient generated at each position can be obtained from the first feature map determined from the three-dimensional point cloud data of the occluded target object. After the predicted attention coefficients are corrected or updated by an attention loss function generated from the predicted attention coefficients and the ground-truth attention coefficients, the predicted attention coefficients of the visible part of the occluded target object are higher than those of the occluded part, so that the visible part can be exploited to a greater extent during prediction, and information such as the position and size of the target object can be predicted more accurately.
It should be understood that the method for object detection based on an artificial neural network provided in the embodiments of the present application can be applied to the autonomous driving field of unmanned devices such as unmanned aerial vehicles and unmanned vehicles, to predict obstacles (such as other vehicles and pedestrians) in the surroundings of an unmanned movable device, where the obstacle (i.e., the target object) may be a partially occluded object. According to the method provided in the embodiments of the present application, deep learning of a neural network can be used to train an artificial-neural-network-based object detection model capable of obtaining the position information, size information, and the like of the target object from the information of its visible part. During training, the object detection model can, based on an attention mechanism, assign more weight to the information of the visible part of the target object, i.e., be more sensitive to the information of the visible part, so that in subsequent prediction the object detection model can obtain the information of the target object more accurately from its visible part.
With reference to the first aspect, in some implementations of the first aspect, processing the first feature map with the attention branch neural network and obtaining the second feature map includes: dividing the first feature map into candidate boxes; generating predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network, where the value of the predicted attention coefficient of each candidate box is determined from a sample feature map matched to the first feature map; and performing element-wise (dot) multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
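As an illustration of this step, the following is a minimal sketch in PyTorch, assuming a 1x1 convolution as the attention branch; the class name AttentionBranch and all tensor shapes are assumptions made for the example, not identifiers from the disclosure.

    import torch
    import torch.nn as nn

    class AttentionBranch(nn.Module):
        """Predicts one attention coefficient per feature-map position."""
        def __init__(self, in_channels: int):
            super().__init__()
            # A 1x1 convolution is one of the generation options named in the
            # text (convolution, fully connected layer, or convolution variants).
            self.conv = nn.Conv2d(in_channels, 1, kernel_size=1)

        def forward(self, first_feature_map: torch.Tensor) -> torch.Tensor:
            # first_feature_map: (N, C, H, W) output of the backbone network
            coefficients = self.conv(first_feature_map)   # (N, 1, H, W)
            # Element-wise ("dot") multiplication with the first feature map
            # yields the second (attention) feature map.
            return first_feature_map * coefficients

    branch = AttentionBranch(in_channels=64)
    second_feature_map = branch(torch.randn(2, 64, 200, 176))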
Optionally, before the detection model is trained, a sample library may be established. The sample library may include sample feature maps of the target object, and a sample feature map includes ground-truth attention coefficients; for example, in the sample feature map, the ground-truth attention coefficients corresponding to the visible part of the target object are higher than those of the occluded part.
Optionally, the sample feature map of the target object may have the same point-cloud feature information in each part as the first feature map obtained by the detection model, differing only in the attention coefficients generated at each position.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: comparing the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes; when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determining the result of the attention loss function from the predicted attention coefficient and the ground-truth attention coefficient; and updating the coefficients of the attention branch neural network according to the result of the attention loss function, so that the confidence of the attention branch neural network coefficients is higher than a second threshold.
Optionally, the second threshold may be higher than the first threshold. In other words, after the predicted attention coefficients are updated, their values are closer to the corresponding ground-truth attention coefficients. The values of the first threshold and the second threshold can be set flexibly, which is not limited in the embodiments of the present application.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: after the predicted attention coefficients of the second feature map are updated, performing a natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
It should be understood that taking the e-exponent of the updated predicted attention coefficients makes the predicted attention coefficients corresponding to the visible part of the target object more clearly distinguishable from those corresponding to the occluded part, highlighting the information of the visible part.
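A small numeric sketch of this e-exponent operation, using hypothetical coefficient values purely for illustration:

    import torch

    # Updated predicted attention coefficients: the higher value stands for
    # the visible part, the lower one for the occluded part (assumed values).
    coefficients = torch.tensor([1.5, 0.2])
    sharpened = torch.exp(coefficients)   # tensor([4.4817, 1.2214])
    # The visible/occluded gap widens from 1.3 to about 3.26, so the visible
    # part's features stand out more in the second feature map.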
With reference to the first aspect, in some implementations of the first aspect, updating the predicted attention coefficients according to the result of the attention loss function includes: updating the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
With reference to the first aspect, in some implementations of the first aspect, the attention loss function is
∑_{n=0}^{k}[L_a(m_k, t_k)]
where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
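A minimal sketch of this loss in PyTorch, assuming the per-point terms are accumulated with a summed smooth L1; the variable names m and t follow the symbols above, and the example values are hypothetical:

    import torch
    import torch.nn.functional as F

    def attention_loss(m: torch.Tensor, t: torch.Tensor) -> torch.Tensor:
        # Sum of smooth-L1 terms over the k feature points of a candidate box,
        # matching the expression sum_{n=0}^{k} [L_a(m_k, t_k)] above.
        return F.smooth_l1_loss(m, t, reduction='sum')

    m = torch.tensor([2.1, 0.4, 1.8])   # predicted attention coefficients m_k
    t = torch.tensor([2.0, 0.1, 2.0])   # ground-truth coefficients t_k
    loss = attention_loss(m, t)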
With reference to the first aspect, in some implementations of the first aspect, acquiring the three-dimensional point cloud and using the backbone neural network to obtain the first feature map of the three-dimensional point cloud includes: acquiring three-dimensional point cloud data of the occluded target object; dividing the three-dimensional point cloud data into a three-dimensional grid to obtain a plurality of three-dimensional voxels; obtaining the point cloud feature of each voxel according to the point cloud density in the voxel; and extracting the point cloud features with the backbone neural network to generate the first feature map.
With reference to the first aspect, in some implementations of the first aspect, generating the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: performing object detection on the target object through the artificial neural network to obtain the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object; sorting the confidences and selecting the candidate boxes whose confidence is higher than a third threshold; and predicting the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
It should be understood that during prediction, the candidate boxes are filtered by confidence, and the prediction result of the target object is determined from the information of the candidate boxes with higher confidence.
With reference to the first aspect, in some implementations of the first aspect, the information of the target object includes the position and/or size of the target object.
With reference to the first aspect, in some implementations of the first aspect, the method further includes: displaying the prediction result obtained from the second feature map.
It should be understood that when predicting the position or size information of the target object, the method provided in the embodiments of the present application obtains a feature map in which the predicted attention coefficients of the visible part are higher than those of the occluded part, and obtains the prediction result of the target object from this feature map, where the prediction result can be displayed directly on a display.
In a second aspect, a system for object detection based on an artificial neural network is provided, including at least one processor and a lidar, where the lidar is configured to acquire a three-dimensional point cloud and input the three-dimensional point cloud of the target object to the processor; the processor is configured to divide the three-dimensional point cloud into a three-dimensional grid to obtain a plurality of voxels; the processor is further configured to determine the point cloud feature at the position corresponding to each voxel according to the point cloud density in the voxel; the processor is further configured to extract the point cloud features through the backbone network of the object detection model and generate a first feature map of the target object; the processor is further configured to generate predicted attention coefficients in the first feature map through the attention branch neural network of the object detection model; the processor is further configured to use a loss-function branch neural network to calculate the result of the attention loss function from the ground-truth attention coefficients in a sample feature map and the predicted attention coefficients; the processor is further configured to update the predicted attention coefficients according to the result of the attention loss function, so that in the second feature map the predicted attention coefficients generated for the feature-map part corresponding to the visible part of the target object are higher than those of the feature-map part of the occluded part of the target object; and the processor is further configured to obtain a prediction result from the information of the visible part of the target object, the prediction result including position information of the target object.
It should be understood that a second feature map with a predicted attention coefficient generated at each position can be obtained from the first feature map determined from the three-dimensional point cloud data of the occluded target object. After the predicted attention coefficients are corrected or updated by the attention loss function generated from the predicted attention coefficients and the ground-truth attention coefficients, the predicted attention coefficients of the visible part of the occluded target object are higher than those of the occluded part, so that the visible part can be exploited to a greater extent during prediction and information such as the position and size of the target object can be predicted more accurately.
With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to divide the first feature map into candidate boxes; the processor is further configured to generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network; and the processor is further configured to perform element-wise multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
Optionally, before the detection system is trained, a sample library may be established. The sample library may include sample feature maps of the target object, and a sample feature map includes ground-truth attention coefficients; for example, in the sample feature map, the ground-truth attention coefficients corresponding to the visible part of the target object are higher than those of the occluded part.
With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to compare the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes; the processor is further configured to, when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than the first threshold, determine the result of the attention loss function from the predicted attention coefficient and the ground-truth attention coefficient; and the processor is further configured to update the predicted attention coefficients according to the result of the attention loss function, so that the confidence of the predicted attention coefficients is higher than the second threshold.
With reference to the second aspect, in some implementations of the second aspect, after updating the predicted attention coefficients of the second feature map, the processor performs the natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
It should be understood that taking the e-exponent of the updated predicted attention coefficients makes the predicted attention coefficients corresponding to the visible part of the target object more clearly distinguishable from those corresponding to the occluded part, highlighting the information of the visible part.
With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to update the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
With reference to the second aspect, in some implementations of the second aspect, the attention loss function is
∑_{n=0}^{k}[L_a(m_k, t_k)]
where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
With reference to the second aspect, in some implementations of the second aspect, the processor is further configured to acquire three-dimensional point cloud data of the occluded target object; the processor is further configured to divide the three-dimensional point cloud data into a three-dimensional grid and obtain a plurality of three-dimensional voxels; the processor is further configured to obtain the point cloud feature of each voxel according to the point cloud density in the voxel; and the processor is further configured to extract the point cloud features with the backbone neural network and generate the first feature map.
With reference to the second aspect, in some implementations of the second aspect, the processor being configured to generate the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
With reference to the second aspect, in some implementations of the second aspect, the processor is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object; the processor is further configured to sort the confidences and select candidate boxes whose confidence is higher than the third threshold; and the processor is further configured to predict the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
With reference to the second aspect, in some implementations of the second aspect, the information of the target object includes the position and/or size of the target object.
With reference to the second aspect, in some implementations of the second aspect, the system further includes a display configured to display the prediction result obtained from the second feature map.
Optionally, the system provided in the embodiments of the present application can be applied to a movable device in the unmanned-driving field; the movable device may be an unmanned aerial vehicle or an unmanned vehicle. The movable device can collect the three-dimensional point cloud of the occluded target object through the lidar and predict the position and/or size information of the target object from the visible part of the occluded object.
In a third aspect, a system for object detection based on an artificial neural network is provided. The system includes a processing module and a receiving module, and the system is configured to perform the method of any implementation of the first aspect.
In a fourth aspect, a computer storage medium is provided, on which a computer program is stored; when the computer program is executed by a computer, the computer performs the method provided in the first aspect.
In a fifth aspect, a chip system is provided. The chip system includes at least one processor, and when program instructions are executed in the at least one processor, the method of any implementation of the first aspect is implemented.
In a sixth aspect, a computer program product containing instructions is provided; when the instructions are executed by a computer, the computer performs the method provided in the first aspect.
The method for object detection based on an artificial neural network provided in the embodiments of the present application can be applied to the autonomous driving field of unmanned devices such as unmanned aerial vehicles and unmanned vehicles, to predict obstacles (such as other vehicles and pedestrians) in the surroundings of an unmanned movable device, where the obstacle (i.e., the target object) may be a partially occluded object. According to the method provided in the embodiments of the present application, deep learning of a neural network can be used to train an artificial-neural-network-based object detection model capable of obtaining the position information, size information, and the like of the target object from the information of its visible part. During training, the object detection model can, based on an attention mechanism, assign more weight to the information of the visible part of the target object, i.e., be more sensitive to the information of the visible part, so that in subsequent prediction the object detection model can obtain the information of the target object more accurately from its visible part.
Brief description of the drawings
Fig. 1 is a schematic diagram of a scenario to which the method for object detection based on an artificial neural network provided in an embodiment of the present application is applied.
Fig. 2 is a schematic flowchart of the method for object detection based on an artificial neural network provided in an embodiment of the present application.
Fig. 3 is a schematic flowchart of the method for object detection based on an artificial neural network provided in an embodiment of the present application.
Fig. 4 is a schematic diagram of a system for object detection based on an artificial neural network provided in an embodiment of the present application.
Fig. 5 is a schematic diagram of another system for object detection based on an artificial neural network provided in an embodiment of the present application.
Detailed description
To facilitate understanding of the technical solutions provided in the embodiments of the present invention, some concepts involved in the embodiments are first described below.
1. The attention mechanism
In plain terms, the attention mechanism focuses attention on the important points and ignores other, unimportant factors. For example, it is similar to the human visual attention mechanism: facing an image, human vision quickly scans the whole image to find the target region deserving focus, commonly called the focus of attention, and then devotes more attention resources to this region to obtain more detail about the target while suppressing other useless information. How importance is judged can depend on the application scenario. Depending on the scenario, attention mechanisms are divided into spatial attention and temporal attention; the former is generally used in image processing and the latter in natural language processing. The embodiments of the present application mainly involve spatial attention.
It should be understood that the object detection method provided in the embodiments of the present application is applicable to autonomous driving scenarios (as shown in Fig. 1). Specifically, during autonomous driving, an unmanned vehicle can use lidar to acquire a three-dimensional point cloud, probe the surrounding environment, and detect three-dimensional objects in it; when a detected three-dimensional object is partially occluded, the missing point cloud causes missed detections and detection performance drops sharply. To obtain better detection results for occluded objects and thus predict the position and size of an occluded three-dimensional object more precisely, the embodiments of the present application improve the training strategy of the neural network in the deep learning algorithm: an attention network branch is added to the neural network detection model for three-dimensional objects, increasing the degree to which key point clouds in the visible part are exploited and thereby improving the detection of occluded three-dimensional objects.
The object detection method provided in the embodiments of the present application is further described below with reference to the drawings.
Fig. 2 is a schematic flowchart of the method for object detection based on an artificial neural network provided in an embodiment of the present application, which includes the following steps.
S101: acquire a three-dimensional point cloud, and use a backbone neural network to obtain a first feature map of the three-dimensional point cloud.
The three-dimensional point cloud is the three-dimensional point cloud data of a partially occluded target object.
It should be understood that before the three-dimensional point cloud data is acquired, a neural network detection model for three-dimensional objects is first generated through a deep learning algorithm. In the training stage, the detection model may include a backbone network and network branches, where the backbone network can receive the three-dimensional point cloud data and generate a feature map from it, and the network branches can calculate the network's loss functions, which relate respectively to confidence, position, and attention coefficients. These loss functions guide the update of network parameters such as the attention coefficients, so that the neural network detection model can predict the position and size of the target object more accurately from its unoccluded part and achieve better prediction performance.
In one implementation, before generating the feature map from the point cloud data, the neural network detection model may voxelize the three-dimensional space of the target object and determine the point cloud feature at each voxel position according to the point cloud density in the voxel. This process converts the point cloud data of the target object into dimensions the neural network can accept.
For example, the neural network detection model may first divide the point cloud of the target object into a three-dimensional grid at a certain resolution along the x, y, and z directions to obtain a plurality of voxels in three-dimensional space, and then determine the point cloud feature of each voxel based on the point cloud density in the voxel. For a voxel containing points, the point cloud density P in the voxel is calculated and the point cloud feature at that position is set to P; for a voxel containing no points, its point cloud feature is set to 0.
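The following is a sketch of this voxelization step, assuming the grid bounds, the resolution, and the density definition (point count per voxel volume) shown below; none of these values are fixed by the disclosure:

    import numpy as np

    def voxelize_density(points, grid_min, grid_max, resolution):
        """Rasterize an (N, 3) xyz point cloud into a voxel grid whose feature
        at each cell is the point cloud density P (0 where no points fall)."""
        grid_min = np.asarray(grid_min, dtype=np.float32)
        grid_max = np.asarray(grid_max, dtype=np.float32)
        dims = np.ceil((grid_max - grid_min) / resolution).astype(int)
        grid = np.zeros(dims, dtype=np.float32)
        idx = ((points - grid_min) / resolution).astype(int)
        inside = np.all((idx >= 0) & (idx < dims), axis=1)  # drop out-of-range points
        for i, j, k in idx[inside]:
            grid[i, j, k] += 1.0                            # count points per voxel
        return grid / (resolution ** 3)                     # density = count / volume

    features = voxelize_density(np.random.rand(1000, 3) * 40.0,
                                grid_min=(0, 0, 0), grid_max=(40, 40, 40),
                                resolution=0.5)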
In one implementation, the neural network detection model can extract the point cloud features through a backbone network and generate the first feature map. The backbone network can be any network structure, and the size of the first feature map can be H*W; the specific values of H and W are not limited in the embodiments of the present application.
In one implementation, the data input to the neural network detection model is not limited to the point cloud data of the target object; it may also be image information of the target object, such as its RGB image information.
S102: process the first feature map with an attention branch neural network to obtain a second feature map, where the second feature map is further used to obtain the loss function of the target object, and the loss function is used to update the network coefficients of the attention branch neural network.
In one implementation, predicted attention coefficients can be generated for the candidate boxes of the first feature map through the attention branch neural network. The positions of the first feature map can refer to the candidate boxes obtained by dividing the first feature map; the size of the candidate boxes can be set flexibly as needed and is not limited in this application.
In one implementation, the attention branch neural network can generate the corresponding predicted attention coefficients at each position of the first feature map in a variety of ways, for example through convolution operations, fully connected layers, and convolution variants (such as SPAS, STAR, SGRS, SPARSE, etc.).
In one implementation, the values of the predicted attention coefficients initially generated on the first feature map can be preset default values, or they can be empirical values.
Optionally, before the training process of the object detection neural network model, a sample library can be established that includes sample feature maps of the target object, where ground-truth boxes can be marked on the sample feature map, and the size of a ground-truth box can be the same as that of a candidate box in the second feature map.
Optionally, the ground-truth attention coefficients of the various parts of the target object in the sample feature map can be annotated in advance. For example, for the occluded part of the target object, the ground-truth attention coefficient can be pre-annotated as a negative number or a positive number smaller than 1; for the unoccluded part (i.e., the visible part), the ground-truth attention coefficient can be pre-annotated as a positive number greater than 1.
In one implementation, the natural-exponential (e-exponent) operation can be applied to the ground-truth attention coefficients in the sample feature map so that the information of the unoccluded part stands out more.
It should be understood that the purpose of the object detection model training process provided in the embodiments of the present application is to make the attention coefficients of the visible part of the target object higher, or much higher, than those of the occluded part. A higher attention coefficient means that that part of the point cloud is a key point cloud for predicting the target object, and it receives more attention and is exploited to a greater degree when the position or size of the target object is subsequently predicted.
In one implementation, the second feature map can be obtained from the first feature map on which the predicted attention coefficients have been generated. For example, the generated predicted attention coefficients can be dot-multiplied with the first feature map to obtain the second feature map. In other words, the second feature map is obtained after the corresponding predicted attention coefficients are generated in each candidate box based on the first feature map, and can also be understood as an attention feature map.
It should be understood that since the predicted attention coefficients in the candidate boxes of the second feature map are default or empirical values, there is no guarantee that the predicted attention coefficients of the visible part of the target object are higher, or much higher, than those of the occluded part. In this case, the predicted attention coefficients in the second feature map need to be corrected and updated with the ground-truth attention coefficients in the sample feature map as reference, so that the confidence between the predicted and ground-truth attention coefficients reaches the first threshold. In this process, the attention branch neural network is trained to generate high attention coefficients on the visible part of the target object and lower attention coefficients on the occluded part, so that the feature-map part corresponding to the visible part of the target object can be obtained more accurately and subsequently used to predict the information of the target object.
In one implementation, the object detection neural network model can correct and update the predicted attention coefficients through the attention coefficient loss function. Specifically, the predicted attention coefficients in the candidate boxes of the second feature map are compared with the ground-truth attention coefficients in the ground-truth boxes of the sample feature map corresponding to the candidate boxes; when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than the first threshold, the result of the attention loss function is determined from the predicted and ground-truth attention coefficients; after the result of the attention loss function is calculated from the predicted and ground-truth attention coefficients, the coefficients of the attention branch neural network are updated according to this result, so that the confidence of the attention branch neural network coefficients is higher than the second threshold. The attention loss function is ∑_{n=0}^{k}[L_a(m_k, t_k)], where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the actual ground-truth attention coefficient.
In one implementation, after the result of the attention loss function is calculated, a backpropagation algorithm can be used to correct and update the network coefficients of the attention branch neural network with this result.
It should be understood that correcting and updating the predicted attention coefficients in the second feature map with the result of the attention loss function brings the predicted attention coefficients closer to the values of the ground-truth attention coefficients.
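One illustrative training step for this correction-and-update loop, assuming PyTorch with a 1x1-conv attention branch and an SGD optimizer (both are assumptions of this sketch, not choices stated in the text):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    attention_branch = nn.Conv2d(64, 1, kernel_size=1)        # assumed branch
    optimizer = torch.optim.SGD(attention_branch.parameters(), lr=1e-3)

    feature_map = torch.randn(1, 64, 100, 88)       # first feature map (assumed size)
    target_coefficients = torch.rand(1, 1, 100, 88)  # ground-truth coefficients

    predicted_coefficients = attention_branch(feature_map)
    loss = F.smooth_l1_loss(predicted_coefficients, target_coefficients,
                            reduction='sum')         # attention loss result

    optimizer.zero_grad()
    loss.backward()    # backpropagate the attention-loss result
    optimizer.step()   # update the attention branch's network coefficients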
In one implementation, after the predicted attention coefficients of the second feature map are updated, the natural-exponential (e-exponent) operation can be applied to the updated predicted attention coefficients so that the attention coefficients of the visible part differ more markedly from those of the occluded part, highlighting the information of the visible part.
It should be understood that after the predicted attention coefficients in the second feature map are corrected and updated, the attention coefficients corresponding to the visible part of the target object in the second feature map are higher, so the object detection neural network model is more sensitive to the information of the visible part and exploits it to a greater degree to predict the position and size of the whole target object.
It should also be understood that after the predicted attention coefficients of the attention branch neural network are corrected and updated, in subsequent detection of the target object the attention branch neural network can generate high predicted attention coefficients on the visible part of the target object and lower ones on the occluded part; that is, after the above training process the attention branch neural network is more sensitive to the information of the visible part of the target object, so that in actual prediction the object detection model pays more attention to the information of the visible part, improving the detection of occluded target objects.
S103: obtain a prediction result from the second feature map, the prediction result including position information of the target object.
In one implementation, the neural network object detection model can obtain the prediction result from the second feature map whose predicted attention coefficients have been corrected or updated; the prediction result can include the position information of the target object as well as information such as its size.
In one implementation, for the neural network object detection model trained through the above training process, in actual prediction the point cloud data or image information of the target object can be input to the detection model, which can filter out the data or information belonging to the unoccluded part and predict information such as the position or size of the whole occluded target object from it.
In one implementation, the above neural network object detection model can yield the three-dimensional positions and confidences corresponding to the candidate boxes of the detected object; after sorting the candidate boxes by confidence, a certain number of candidate boxes can be selected in descending order of confidence, where the confidences of the selected candidate boxes can all be higher than the third threshold; the position and size of the target object are then predicted from the selected candidate boxes with higher confidence. The process of predicting the size or position of the whole object from the point cloud data or image information of certain key parts of the target object through the deep learning algorithm of the neural network can follow existing procedures and is not repeated here.
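A minimal sketch of this confidence sorting and threshold filtering, with a hypothetical box encoding and threshold value:

    import torch

    def select_boxes(boxes, scores, threshold):
        """Sort candidate boxes by confidence and keep those above the threshold."""
        order = torch.argsort(scores, descending=True)
        boxes, scores = boxes[order], scores[order]
        keep = scores > threshold
        return boxes[keep], scores[keep]

    boxes = torch.randn(5, 7)    # e.g. (x, y, z, l, w, h, yaw) per candidate box
    scores = torch.tensor([0.9, 0.3, 0.75, 0.6, 0.1])
    kept_boxes, kept_scores = select_boxes(boxes, scores, threshold=0.5)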
It should be understood that the method for object detection based on an artificial neural network provided in the embodiments of the present application is applicable in the autonomous driving field to scenarios where unmanned vehicles or unmanned aerial vehicles predict the position, size, and other information of obstacles in the surrounding environment. In the traditional obstacle detection process, once a feature map is generated from the obstacle's information, the feature map is fed directly into the branch networks that calculate the loss functions for position and confidence; that is, essentially the same attention is given to the occluded and visible parts of the obstacle, yet because the obstacle is partially occluded, part of the valuable information for predicting its position, confidence, and so on is missing, leading to poor detection results. The method provided in the embodiments of the present application adds an attention network branch to the artificial neural network and trains this branch to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the obstacle's position, size, confidence, and other information, so that the unmanned vehicle or aerial vehicle can accurately learn the distribution, size, and so on of the obstacles in its surroundings and plan an accurate driving trajectory.
In addition, the method for object detection based on an artificial neural network provided in the embodiments of the present application can still rely on lidar alone for detection, without fusing other sensors, which reduces hardware cost.
Fig. 3 is a schematic flowchart of the object detection method based on an artificial neural network provided in an embodiment of the present application. The process includes the following steps.
S201: input the point cloud.
It should be understood that the object detection method provided in the embodiments of the present application can also take an image of the target object, such as an RGB image, as input.
S202: three-dimensional gridding.
Three-dimensional gridding refers to dividing the point cloud of the target object into a three-dimensional grid, i.e., voxelizing the three-dimensional space. Specifically, the spatial point cloud can be rasterized at a certain resolution along the three spatial coordinate directions x, y, and z to obtain voxels in three-dimensional space.
In one implementation, the point cloud features are determined according to the point cloud density in the voxels. For a voxel containing points, the point cloud density in the voxel (denoted P) is calculated and the point cloud feature at that position is set to P; for a voxel containing no points, its point cloud feature is set to 0. This process converts the point cloud data of the target object into dimensions the neural network can accept.
S203: obtain the first feature map through the backbone network.
The first feature map is the feature map of the target object's point cloud.
In one implementation, the backbone network can be any network structure, and the size of the first feature map can be H*W; the specific values of H and W are not limited in the embodiments of the present application.
In one implementation, the data input to the neural network detection model is not limited to the point cloud data of the target object; it may also be image information of the target object, such as its RGB image information.
S204: attention-coefficient operations.
The attention-coefficient operations in this flow can include generating, based on the first feature map, the predicted attention coefficients corresponding to each position of the first feature map. The positions of the first feature map can refer to the candidate boxes obtained by dividing the first feature map; the size of the candidate boxes can be set flexibly as needed and is not limited in this application.
In one implementation, the attention branch neural network can generate the corresponding predicted attention coefficients at each position of the first feature map in a variety of ways, for example through convolution operations, fully connected layers, and convolution variants (such as SPAS, STAR, SGRS, SPARSE, etc.).
S205: obtain the attention coefficients.
S206: obtain the second feature map.
The second feature map is obtained after the corresponding predicted attention coefficients are generated in each candidate box based on the first feature map, and can also be understood as an attention feature map.
It should be understood that since the predicted attention coefficients in the candidate boxes of the second feature map are default or empirical values, there is no guarantee that the predicted attention coefficients of the visible part of the target object are higher, or much higher, than those of the occluded part. In this case, the predicted attention coefficients in the second feature map need to be corrected and updated with the ground-truth attention coefficients in the sample feature map as reference, so that the confidence between the predicted and ground-truth attention coefficients reaches the first threshold. In this process, the attention branch neural network is trained to generate high attention coefficients on the visible part of the target object and lower attention coefficients on the occluded part, so that the feature-map part corresponding to the visible part of the target object can be obtained more accurately and subsequently used to predict the information of the target object.
In one implementation, the object detection neural network model can correct and update the predicted attention coefficients through the attention coefficient loss function. Specifically, the predicted attention coefficients in the candidate boxes of the second feature map are compared with the ground-truth attention coefficients in the ground-truth boxes of the sample feature map corresponding to the candidate boxes; when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than the first threshold, the result of the attention loss function is determined from the predicted and ground-truth attention coefficients; after the result of the attention loss function is calculated from the predicted and ground-truth attention coefficients, the coefficients of the attention branch neural network are updated according to this result, so that the confidence of the attention branch neural network coefficients is higher than the second threshold. The attention loss function is ∑_{n=0}^{k}[L_a(m_k, t_k)], where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the actual ground-truth attention coefficient.
In one implementation, after the result of the attention loss function is calculated, a backpropagation algorithm can be used to correct and update the network coefficients of the attention branch neural network with this result.
It should be understood that correcting and updating the predicted attention coefficients in the second feature map with the result of the attention loss function brings the predicted attention coefficients closer to the values of the ground-truth attention coefficients.
In one implementation, after the predicted attention coefficients of the second feature map are updated, the natural-exponential (e-exponent) operation can be applied to the updated predicted attention coefficients so that the attention coefficients of the visible part differ more markedly from those of the occluded part, highlighting the information of the visible part.
It should be understood that after the predicted attention coefficients in the second feature map are corrected and updated, the attention coefficients corresponding to the visible part of the target object in the second feature map are higher, so the object detection neural network model is more sensitive to the information of the visible part and exploits it to a greater degree to predict the position and size of the whole target object.
It should also be understood that after the predicted attention coefficients of the attention branch neural network are corrected and updated, in subsequent detection of the target object the attention branch neural network can generate high predicted attention coefficients on the visible part of the target object and lower ones on the occluded part; that is, after the above training process the attention branch neural network is more sensitive to the information of the visible part of the target object, so that in actual prediction the object detection model pays more attention to the information of the visible part, improving the detection of occluded target objects.
S207: obtain the prediction result.
In one implementation, the neural network object detection model can obtain the prediction result from the second feature map whose predicted attention coefficients have been corrected or updated; the prediction result can include the position information of the target object as well as information such as its size.
S208: confidence sorting and threshold filtering.
In one implementation, the above neural network object detection model can yield the three-dimensional positions and confidences corresponding to the candidate boxes of the detected object; after sorting the candidate boxes by confidence, a certain number of candidate boxes can be selected in descending order of confidence; the position and size of the target object are then predicted from the selected candidate boxes with higher confidence. The process of predicting the size or position of the whole object from the point cloud data or image information of certain key parts of the target object through the deep learning algorithm of the neural network can follow existing procedures and is not repeated here.
S209: obtain the final prediction result for the target object.
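To tie steps S201-S209 together, the following compact forward pass mirrors the flow; the tiny backbone, the 1x1 attention head, the 8-channel detection head, and the BEV-style input are all assumptions of this sketch, since the disclosure does not fix the network design:

    import torch
    import torch.nn as nn

    class TinyDetector(nn.Module):
        def __init__(self, channels: int = 32):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(1, channels, 3, padding=1), nn.ReLU())      # S203
            self.attention = nn.Conv2d(channels, 1, kernel_size=1)   # S204
            self.head = nn.Conv2d(channels, 8, kernel_size=1)  # 7 box params + confidence

        def forward(self, voxel_features):
            first_map = self.backbone(voxel_features)             # S203
            coefficients = torch.exp(self.attention(first_map))   # S204-S205, e-exponent
            second_map = first_map * coefficients                 # S206
            return self.head(second_map)                          # S207

    bev = torch.randn(1, 1, 100, 88)   # density features collapsed to one BEV plane
    predictions = TinyDetector()(bev)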
The method for object detection based on an artificial neural network provided in the embodiments of the present application is applicable in the autonomous driving field to scenarios where unmanned vehicles or unmanned aerial vehicles predict the position, size, and other information of obstacles in the surrounding environment. In the traditional obstacle detection process, once a feature map is generated from the obstacle's information, the feature map is fed directly into the branch networks that calculate the loss functions for position and confidence, so essentially the same attention is given to the occluded and visible parts of the obstacle; in this case, because the obstacle is partially occluded, part of the valuable information for predicting its position, confidence, and so on is missing, and the detection results are poor. In contrast, the method provided in the embodiments of the present application adds an attention network branch to the artificial neural network and trains this branch to accurately identify the visible part of an obstacle; the object detection model then uses the key information of the visible part to predict the obstacle's position, size, confidence, and other information, so that the unmanned vehicle or aerial vehicle can accurately learn the distribution, size, and so on of the obstacles in its surroundings and plan an accurate driving trajectory.
Fig. 4 is a schematic diagram of a system for object detection based on an artificial neural network provided in an embodiment of the present application. The system 300 includes at least one lidar 310 and a processor 320. The system 300 can be a distributed perception processing system installed on an autonomous vehicle; for example, at least one lidar 310 can be mounted on the roof and be a rotating lidar, or the lidar 310 can be mounted at other positions on the autonomous vehicle or use other forms of lidar. The processor 320 can be a supercomputing platform installed on the autonomous vehicle; that is, the processor 320 can include one or more processing units in the form of CPUs, GPUs, FPGAs, or ASICs for processing the sensor data acquired by the sensors of the autonomous vehicle.
In one implementation, the lidar 310 is configured to acquire a three-dimensional point cloud.
In one implementation, the processor 320 is configured to obtain a first feature map of the three-dimensional point cloud using a backbone neural network.
In one implementation, the processor 320 is further configured to process the first feature map with an attention branch neural network to obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position, and the second feature map is further used to obtain a loss function of the target object; the loss function is used to update the predicted attention coefficients.
In one implementation, the processor 320 is further configured to obtain a prediction result from the second feature map, the prediction result including position information of the target object.
In one implementation, the processor 320 is further configured to divide the first feature map into candidate boxes.
In one implementation, the processor 320 is further configured to generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network.
In one implementation, the processor 320 is further configured to perform element-wise multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
In one implementation, the processor 320 is further configured to compare the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes.
In one implementation, the processor 320 is further configured to, when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than the first threshold, determine the result of the attention loss function from the predicted and ground-truth attention coefficients.
In one implementation, the processor 320 is further configured to update the predicted attention coefficients according to the result of the attention loss function, so that the confidence of the predicted attention coefficients is higher than the second threshold.
In one implementation, after updating the predicted attention coefficients of the second feature map, the processor 320 performs the natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
In one implementation, the processor 320 is further configured to update the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
In one implementation, the attention loss function is
∑_{n=0}^{k}[L_a(m_k, t_k)]
where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
In one implementation, the processor 320 is further configured to acquire three-dimensional point cloud data of the occluded target object.
In one implementation, the processor 320 is further configured to divide the three-dimensional point cloud data into a three-dimensional grid and obtain a plurality of three-dimensional voxels.
In one implementation, the processor 320 is further configured to obtain the point cloud feature of each voxel according to the point cloud density in the voxel.
In one implementation, the processor 320 is further configured to extract the point cloud features with the backbone neural network and generate the first feature map.
In one implementation, the processor 320 being configured to generate the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
In one implementation, the processor 320 is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object.
In one implementation, the processor 320 is further configured to sort the confidences and select candidate boxes whose confidence is higher than the third threshold.
In one implementation, the processor 320 is further configured to predict the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
In one implementation, the information of the target object includes the position and/or size of the target object.
In one implementation, the system 300 provided in the embodiments of the present application can further include a display configured to display the prediction result of the target object predicted from the second feature map.
It should be understood that the artificial-neural-network-based object detection system provided in the embodiments of the present application can be applied to the autonomous driving field of unmanned devices such as unmanned aerial vehicles and unmanned vehicles, to predict obstacles (such as other vehicles and pedestrians) in the surroundings of an unmanned movable device, where the obstacle (i.e., the target object) may be a partially occluded object. The system provided in the embodiments of the present application can, through deep learning of a neural network, train an artificial-neural-network-based object detection model capable of obtaining the position information, size information, and the like of the target object from the information of its visible part. During training, the object detection model can, based on an attention mechanism, assign more weight to the information of the visible part of the target object, i.e., be more sensitive to the information of the visible part, so that in subsequent prediction the object detection model can obtain the information of the target object more accurately from its visible part.
Fig. 5 is a schematic diagram of a system for object detection based on an artificial neural network provided in an embodiment of the present application. The system 400 includes at least one receiving module 410 and a processing module 420.
In one implementation, the receiving module 410 is configured to acquire a three-dimensional point cloud.
In one implementation, the processing module 420 is configured to obtain a first feature map of the three-dimensional point cloud using a backbone neural network.
In one implementation, the processing module 420 is further configured to process the first feature map with an attention branch neural network to obtain a second feature map, where each position of the second feature map includes a predicted attention coefficient corresponding to that position, and the second feature map is further used to obtain a loss function of the target object; the loss function is used to update the predicted attention coefficients.
In one implementation, the processing module 420 is further configured to obtain a prediction result from the second feature map, the prediction result including position information of the target object.
In one implementation, the processing module 420 is further configured to divide the first feature map into candidate boxes.
In one implementation, the processing module 420 is further configured to generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network.
In one implementation, the processing module 420 is further configured to perform element-wise multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
In one implementation, the processing module 420 is further configured to compare the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes.
In one implementation, the processing module 420 is further configured to, when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than the first threshold, determine the result of the attention loss function from the predicted and ground-truth attention coefficients.
In one implementation, the processing module 420 is further configured to update the predicted attention coefficients according to the result of the attention loss function, so that the confidence of the predicted attention coefficients is higher than the second threshold.
In one implementation, after updating the predicted attention coefficients of the second feature map, the processing module 420 performs the natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
In one implementation, the processing module 420 is further configured to update the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
In one implementation, the attention loss function is
∑_{n=0}^{k}[L_a(m_k, t_k)]
where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
In one implementation, the processing module 420 is further configured to acquire three-dimensional point cloud data of the occluded target object.
In one implementation, the processing module 420 is further configured to divide the three-dimensional point cloud data into a three-dimensional grid and obtain a plurality of three-dimensional voxels.
In one implementation, the processing module 420 is further configured to obtain the point cloud feature of each voxel according to the point cloud density in the voxel.
In one implementation, the processing module 420 is further configured to extract the point cloud features with the backbone neural network and generate the first feature map.
In one implementation, the processing module 420 being configured to generate the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network includes: the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
In one implementation, the processing module 420 is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object.
In one implementation, the processing module 420 is further configured to sort the confidences and select candidate boxes whose confidence is higher than the third threshold.
In one implementation, the processing module 420 is further configured to predict the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
In one implementation, the information of the target object includes the position and/or size of the target object.
In one implementation, the object detection system provided in the embodiments of the present application can further include a display configured to display the prediction result obtained from the second feature map.
An embodiment of the present invention further provides a chip system. The chip system includes at least one processor, and when program instructions are executed in the at least one processor, the method provided in the embodiments of the present application is implemented.
An embodiment of the present invention further provides a computer storage medium on which a computer program is stored; when the computer program is executed by a computer, the computer performs the method of the above method embodiments.
An embodiment of the present invention further provides a computer program product containing instructions; when the instructions are executed by a computer, the computer performs the method of the above method embodiments.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (e.g., a floppy disk, hard disk, or magnetic tape), an optical medium (e.g., a digital video disc (DVD)), or a semiconductor medium (e.g., a solid state disk (SSD)), etc.
A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementations should not be considered beyond the scope of the present invention.
In the several embodiments provided in the present invention, it should be understood that the disclosed system, apparatus, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is only a logical function division, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings or direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
The above is only the specific implementation of the present invention, but the protection scope of the present invention is not limited thereto. Any person skilled in the art can readily conceive of changes or substitutions within the technical scope disclosed by the present invention, which should all be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (24)

  1. A system for object detection based on an artificial neural network, characterized by comprising at least one processor and a lidar, wherein
    the lidar is configured to acquire a three-dimensional point cloud of a target object and input the three-dimensional point cloud to the processor;
    the processor is configured to divide the three-dimensional point cloud into a three-dimensional grid to obtain a plurality of voxels; determine the point cloud feature at the position corresponding to each voxel according to the point cloud density in the voxel; extract the point cloud features through the backbone network of the object detection model and generate a first feature map of the target object; generate predicted attention coefficients in the first feature map through the attention branch neural network of the object detection model; calculate the result of an attention loss function with a loss-function branch neural network from the ground-truth attention coefficients in a sample feature map and the predicted attention coefficients; update the predicted attention coefficients according to the result of the attention loss function, so that in the second feature map the predicted attention coefficients generated for the feature-map part corresponding to the visible part of the target object are higher than those of the feature-map part of the occluded part of the target object; and obtain a prediction result from the information of the visible part of the target object, the prediction result including position information of the target object.
  2. The system according to claim 1, characterized in that the processor is further configured to divide the first feature map into candidate boxes; generate predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network; and perform element-wise (dot) multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
  3. The system according to claim 2, characterized in that the processor is further configured to compare the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes; when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determine the result of the attention loss function from the predicted attention coefficient and the ground-truth attention coefficient; and update the predicted attention coefficients according to the result of the attention loss function, so that the confidence between the predicted attention coefficients and the ground-truth attention coefficients is higher than a second threshold.
  4. The system according to any one of claims 1-3, characterized in that after updating the predicted attention coefficients of the second feature map, the processor performs a natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
  5. The system according to claim 3 or 4, characterized in that the processor is further configured to update the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
  6. The system according to any one of claims 3-5, characterized in that the attention loss function is
    ∑_{n=0}^{k}[L_a(m_k, t_k)]
    where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
  7. The system according to any one of claims 1-6, characterized in that the processor is further configured to acquire three-dimensional point cloud data of the occluded target object;
    the processor is further configured to divide the three-dimensional point cloud data into a three-dimensional grid and obtain a plurality of three-dimensional voxels;
    the processor is further configured to obtain the point cloud feature of each voxel according to the point cloud density in the voxel;
    the processor is further configured to extract the point cloud features with the backbone neural network and generate the first feature map.
  8. The system according to any one of claims 2-7, characterized in that the processor being configured to generate the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network comprises:
    the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
  9. The system according to any one of claims 1-8, characterized in that the processor is configured to perform object detection on the target object through the artificial neural network and obtain the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object;
    the processor is further configured to sort the confidences and select candidate boxes whose confidence is higher than a third threshold;
    the processor is further configured to predict the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
  10. The system according to claim 9, characterized in that the information of the target object comprises the position and/or size of the target object.
  11. The system according to any one of claims 1-10, characterized in that the system comprises a display, and the display is configured to display the prediction result obtained from the second feature map.
  12. A method for object detection based on an artificial neural network, characterized in that the method comprises:
    acquiring a three-dimensional point cloud, and using a backbone neural network to obtain a first feature map of the three-dimensional point cloud;
    processing the first feature map with an attention branch neural network to obtain a second feature map, wherein each position of the second feature map includes a predicted attention coefficient corresponding to that position, the second feature map is further used to obtain a loss function of the target object, and the loss function is used to update the predicted attention coefficients;
    obtaining a prediction result from the second feature map, the prediction result including position information of the target object.
  13. The method according to claim 12, characterized in that processing the first feature map with the attention branch neural network and obtaining the second feature map comprises:
    dividing the first feature map into candidate boxes;
    generating predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network;
    performing element-wise (dot) multiplication of the predicted attention coefficients with the first feature map to obtain the second feature map.
  14. The method according to claim 13, characterized in that the method further comprises:
    comparing the predicted attention coefficients of the candidate boxes of the second feature map with the attention coefficients of the ground-truth boxes in the sample feature map corresponding to the candidate boxes;
    when the confidence between the predicted attention coefficient of a candidate box and the ground-truth attention coefficient of the ground-truth box is higher than a first threshold, determining the result of the attention loss function from the predicted attention coefficient and the ground-truth attention coefficient;
    updating the predicted attention coefficients according to the result of the attention loss function, so that the confidence between the predicted attention coefficients and the ground-truth attention coefficients is higher than a second threshold.
  15. The method according to any one of claims 12-14, characterized in that the method further comprises:
    after the predicted attention coefficients of the second feature map are updated, performing a natural-exponential (e-exponent) operation on the updated predicted attention coefficients.
  16. The method according to claim 14 or 15, characterized in that updating the predicted attention coefficients according to the result of the attention loss function comprises:
    updating the predicted attention coefficients through a backpropagation algorithm according to the result of the attention loss function.
  17. The method according to any one of claims 14-16, characterized in that the attention loss function is
    ∑_{n=0}^{k}[L_a(m_k, t_k)]
    where k is the number of feature points in the candidate box, L_a is the smooth L1 loss function, m_k is the predicted attention coefficient, and t_k is the ground-truth attention coefficient.
  18. The method according to any one of claims 12-17, characterized in that acquiring the three-dimensional point cloud and using the backbone neural network to obtain the first feature map of the three-dimensional point cloud comprises:
    acquiring three-dimensional point cloud data of the occluded target object;
    dividing the three-dimensional point cloud data into a three-dimensional grid and obtaining a plurality of three-dimensional voxels;
    obtaining the point cloud feature of each voxel according to the point cloud density in the voxel;
    extracting the point cloud features with the backbone neural network and generating the first feature map.
  19. The method according to any one of claims 13-18, characterized in that generating the predicted attention coefficients for the candidate boxes of the first feature map through the attention branch neural network comprises:
    the attention branch neural network generating the predicted attention coefficients through one or more of a convolution operation, a fully connected layer, and variants of the convolution operation.
  20. The method according to any one of claims 12-19, characterized in that the method further comprises:
    performing object detection on the target object through the artificial neural network, and obtaining the three-dimensional positions and confidences of the feature-map candidate boxes corresponding to the visible part of the target object;
    sorting the confidences, and selecting the candidate boxes whose confidence is higher than a third threshold;
    predicting the information of the target object from the candidate boxes whose confidence is higher than the third threshold.
  21. The method according to claim 20, characterized in that the information of the target object comprises the position and/or size of the target object.
  22. The method according to any one of claims 12-21, characterized in that the method further comprises:
    displaying the prediction result obtained from the second feature map.
  23. The method according to any one of claims 12-22, characterized in that the method is applicable to the detection of a target object whose point cloud is partially occluded in an autonomous driving scenario, comprising:
    establishing an object detection model based on an artificial neural network;
    inputting the three-dimensional point cloud of the target object into the object detection model, and dividing the three-dimensional point cloud into a three-dimensional grid through the object detection model to obtain a plurality of voxels;
    determining the point cloud feature at each voxel position according to the point cloud density in the voxel;
    extracting the point cloud features through the backbone network of the object detection model, and generating a first feature map of the target object;
    generating predicted attention coefficients in the first feature map through the attention branch neural network of the object detection model;
    calculating the result of the attention loss function with a loss-function branch neural network from the ground-truth attention coefficients in a sample feature map and the predicted attention coefficients;
    updating the network coefficients of the attention branch neural network according to the result of the attention loss function, so that the predicted attention coefficients generated by the attention branch neural network for the feature-map part corresponding to the visible part of the target object are higher than those for the feature-map part of the occluded part of the target object;
    obtaining a prediction result from the information of the visible part of the target object, the prediction result including position information of the target object.
  24. A computer storage medium, characterized in that the computer program storage medium has program instructions which, when executed directly or indirectly, cause the method according to any one of claims 12 to 23 to be implemented.
PCT/CN2019/114357 2019-10-30 2019-10-30 System and method for object detection based on artificial neural network WO2021081808A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201980008366.4A CN111602138B (zh) 2019-10-30 2019-10-30 基于人工神经网络的物体检测的系统及方法
PCT/CN2019/114357 WO2021081808A1 (zh) 2019-10-30 2019-10-30 基于人工神经网络的物体检测的系统及方法

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/114357 WO2021081808A1 (zh) 2019-10-30 2019-10-30 基于人工神经网络的物体检测的系统及方法

Publications (1)

Publication Number Publication Date
WO2021081808A1 true WO2021081808A1 (zh) 2021-05-06

Family

ID=72186761

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114357 WO2021081808A1 (zh) 2019-10-30 2019-10-30 基于人工神经网络的物体检测的系统及方法

Country Status (2)

Country Link
CN (1) CN111602138B (zh)
WO (1) WO2021081808A1 (zh)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298822A (zh) * 2021-05-18 2021-08-24 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Point cloud data selection method and selection apparatus, device, and storage medium
CN114005110A (zh) * 2021-12-30 2022-02-01 Zhidao Network Technology (Beijing) Co., Ltd. 3D detection model training method and apparatus, and 3D detection method and apparatus
CN114119838A (zh) * 2022-01-24 2022-03-01 Alibaba (China) Co., Ltd. Voxel model and image generation method, device, and storage medium
CN114663879A (zh) * 2022-02-09 2022-06-24 Institute of Automation, Chinese Academy of Sciences Target detection method and apparatus, electronic device, and storage medium
CN114723939A (zh) * 2022-04-12 2022-07-08 Marketing Service Center of State Grid Sichuan Electric Power Company Non-maximum suppression method, system, device, and medium based on attention mechanism

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111931727A (zh) * 2020-09-23 2020-11-13 Shenzhen SenseTime Technology Co., Ltd. Point cloud data annotation method and apparatus, electronic device, and storage medium
CN112232746B (zh) * 2020-11-03 2023-08-22 Jinling Institute of Technology Cold-chain logistics demand estimation method based on attention weighting

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190026531A1 (en) * 2017-07-21 2019-01-24 Skycatch, Inc. Determining stockpile volume based on digital aerial images and three-dimensional representations of a site
CN109479088A (zh) * 2017-06-02 2019-03-15 SZ DJI Technology Co., Ltd. System and method for multi-target tracking and autofocusing based on deep machine learning and lidar
CN109543601A (zh) * 2018-11-21 2019-03-29 University of Electronic Science and Technology of China Unmanned-vehicle target detection method based on multimodal deep learning
CN110032949A (zh) * 2019-03-22 2019-07-19 Beijing Institute of Technology Target detection and localization method based on a lightweight convolutional neural network

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109597087B (zh) * 2018-11-15 2022-07-01 Tianjin University 3D target detection method based on point cloud data
CN109932730B (zh) * 2019-02-22 2023-06-23 Donghua University Lidar target detection method based on a multi-scale unipolar three-dimensional detection network

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109479088A (zh) * 2017-06-02 2019-03-15 SZ DJI Technology Co., Ltd. System and method for multi-target tracking and autofocusing based on deep machine learning and lidar
US20190026531A1 (en) * 2017-07-21 2019-01-24 Skycatch, Inc. Determining stockpile volume based on digital aerial images and three-dimensional representations of a site
CN109543601A (zh) * 2018-11-21 2019-03-29 University of Electronic Science and Technology of China Unmanned-vehicle target detection method based on multimodal deep learning
CN110032949A (zh) * 2019-03-22 2019-07-19 Beijing Institute of Technology Target detection and localization method based on a lightweight convolutional neural network

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113298822A (zh) * 2021-05-18 2021-08-24 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Point cloud data selection method and selection apparatus, device, and storage medium
CN113298822B (zh) * 2021-05-18 2023-04-18 Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences Point cloud data selection method and selection apparatus, device, and storage medium
CN114005110A (zh) * 2021-12-30 2022-02-01 Zhidao Network Technology (Beijing) Co., Ltd. 3D detection model training method and apparatus, and 3D detection method and apparatus
CN114005110B (zh) * 2021-12-30 2022-05-17 Zhidao Network Technology (Beijing) Co., Ltd. 3D detection model training method and apparatus, and 3D detection method and apparatus
CN114119838A (zh) * 2022-01-24 2022-03-01 Alibaba (China) Co., Ltd. Voxel model and image generation method, device, and storage medium
CN114119838B (zh) * 2022-01-24 2022-07-22 Alibaba (China) Co., Ltd. Voxel model and image generation method, device, and storage medium
CN114663879A (zh) * 2022-02-09 2022-06-24 Institute of Automation, Chinese Academy of Sciences Target detection method and apparatus, electronic device, and storage medium
CN114663879B (zh) * 2022-02-09 2023-02-21 Institute of Automation, Chinese Academy of Sciences Target detection method and apparatus, electronic device, and storage medium
CN114723939A (zh) * 2022-04-12 2022-07-08 Marketing Service Center of State Grid Sichuan Electric Power Company Non-maximum suppression method, system, device, and medium based on attention mechanism
CN114723939B (zh) * 2022-04-12 2023-10-31 Marketing Service Center of State Grid Sichuan Electric Power Company Non-maximum suppression method, system, device, and medium based on attention mechanism

Also Published As

Publication number Publication date
CN111602138B (zh) 2024-04-09
CN111602138A (zh) 2020-08-28

Similar Documents

Publication Publication Date Title
WO2021081808A1 (zh) System and method for object detection based on artificial neural network
US10755112B2 (en) Systems and methods for reducing data storage in machine learning
CN110222787B (zh) Multi-scale object detection method and apparatus, computer device, and storage medium
CN111222395B (zh) Object detection method and apparatus, and electronic device
JP6471448B2 (ja) Noise identification method and noise identification apparatus for parallax depth images
CN110781756A (zh) Urban road extraction method and apparatus based on remote sensing images
CN110632608B (zh) Target detection method and apparatus based on laser point clouds
JP7156515B2 (ja) Point cloud annotation apparatus, method, and program
WO2022126522A1 (zh) Object recognition method and apparatus, movable platform, and storage medium
CN116027324B (zh) Fall detection method and apparatus based on millimeter-wave radar, and millimeter-wave radar device
CN110910445B (zh) Object size detection method and apparatus, detection device, and storage medium
CN115797736B (zh) Training of target detection model and target detection method, apparatus, device, and medium
CN111126278A (zh) Method for optimizing and accelerating a target detection model for few-category scenarios
CN115187941A (zh) Target detection and localization method, system, device, and storage medium
CN112258568B (zh) Extraction method and apparatus for high-precision map elements
CN109816726B (zh) Visual odometry map updating method and system based on a depth filter
CN116310993A (zh) Target detection method, apparatus, device, and storage medium
CN111611836A (zh) Ship detection model training and ship tracking method based on background subtraction
CN116052097A (zh) Map element detection method and apparatus, electronic device, and storage medium
CN115909253A (zh) Target detection and model training method, apparatus, device, and storage medium
CN115439692A (zh) Image processing method and apparatus, electronic device, and medium
KR20210134252A (ko) Image stabilization method, apparatus, roadside device, and cloud control platform
CN114330542A (zh) Sample mining method and apparatus based on target detection, and storage medium
US20200058158A1 (en) System and method for object location detection from imagery
CN112861940A (zh) Binocular disparity estimation method, model training method, and related devices

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19950633

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 19950633

Country of ref document: EP

Kind code of ref document: A1