WO2022126523A1 - Object detection method, device, movable platform and computer-readable storage medium - Google Patents

Object detection method, device, movable platform and computer-readable storage medium

Info

Publication number
WO2022126523A1
WO2022126523A1 (PCT/CN2020/137299)
Authority
WO
WIPO (PCT)
Prior art keywords
information
size
network layer
object detection
target
Prior art date
Application number
PCT/CN2020/137299
Other languages
English (en)
Chinese (zh)
Inventor
蒋卓键
孙扬
陈晓智
Original Assignee
深圳市大疆创新科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市大疆创新科技有限公司 (SZ DJI Technology Co., Ltd.)
Priority to PCT/CN2020/137299
Publication of WO2022126523A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks

Definitions

  • the present application relates to the technical field of object detection, and in particular, to an object detection method, device, movable platform, and computer-readable storage medium.
  • Movable platforms such as unmanned vehicles need to use sensors to sense the surrounding environment, and control the movable platform according to the information obtained by sensing the surrounding objects, so that the movable platform can work safely and reliably. How to accurately detect the object information around the movable platform has become an urgent technical problem to be solved.
  • the present application provides an object detection method, a device, a movable platform and a computer-readable storage medium, to address the problem in the related art that object detection accuracy urgently needs improvement.
  • an object detection method including:
  • a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is used to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • an object detection device comprising: a processor and a memory storing a computer program
  • the processor implements the following steps when executing the computer program:
  • the space is sampled by the sensor of the movable platform to obtain a plurality of sampling points to be identified;
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • a movable platform including: a body;
  • a power system mounted within the body for powering the movable platform; and
  • the object detection device according to the aforementioned second aspect.
  • a computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, implements the object detection method according to the foregoing first aspect.
  • the feature extraction network layer in the object detection model can be used to obtain the feature information of the sampling point; the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, where the size information is used to represent the probability that the sampling point belongs to an object whose size is within the target size range; and the object detection network layer is used to detect the target sampling points of the same object corresponding to the sampling points based on the feature information and size information.
  • the size information can be extracted based on the feature information, which improves the object detection model's attention to objects whose size is within the target size range, and the position detection network layer can accurately detect the position information of the object based on the position information of the target sampling points, so that objects whose size is within the target size range can be better identified.
  • FIG. 1 is a schematic diagram of an object detection method according to an embodiment of the present application.
  • FIG. 2 is a schematic diagram of an object detection model according to an embodiment of the present application.
  • FIG. 3 is a hardware structure diagram of an object detection apparatus according to an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a movable platform according to an embodiment of the present application.
  • the movable platform in the embodiment of the present application may be a car, an unmanned aerial vehicle, an unmanned ship, or a robot, etc., wherein the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another type of aircraft.
  • the movable platform is not limited to the movable platforms listed above, and can also be other movable platforms.
  • unmanned vehicles use on-board sensors to sense the surrounding environment of the vehicle, and control the steering and speed of the vehicle according to the object information obtained by sensing, so that the vehicle can drive on the road safely and reliably.
  • Vehicle sensors mainly include lidar, millimeter-wave radar, and vision sensors.
  • the process of recognizing an object may be to obtain data from a sensor and then input it to a trained object detection model, and the object detection model outputs an object recognition result.
  • the training process of the object detection model can be: first, represent the problem through modeling; then evaluate the model by constructing an evaluation function; and finally optimize the evaluation function according to the sample data and the optimization method, adjusting the model to the optimum.
  • modeling is to convert a practical problem into a problem that a computer can understand, that is, into a form that a computer can represent.
  • Modeling generally refers to the process of estimating the objective function of the model based on a large number of sample data.
  • evaluation is an indicator used to represent the quality of the model, and this step involves the design of evaluation indicators and evaluation functions.
  • machine learning has targeted evaluation indicators; for example, after the modeling is completed, a loss function needs to be designed for the model to evaluate the output error of the model.
  • the goal of optimization is the evaluation function; that is, the optimization method is used to optimize the evaluation function and find the model with the highest evaluation. For example, an optimization method such as gradient descent can be used to find the minimum value (optimal solution) of the output error of the loss function, and adjust the parameters of the model to the optimum.
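  • As a concrete illustration of this step (standard gradient descent, not a formula disclosed in this application), each iteration steps the model parameters θ against the gradient of the loss L with learning rate η:

```latex
\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} L(\theta_t)
```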
  • the existing object detection models have been able to achieve very good results in object detection, and can detect objects with high accuracy, but the inventors found that the accuracy of the object detection models still cannot reach a 100% perfect state. From the perspective of business scenarios, for example, in the field of vehicle driving, in some extreme scenarios, subtle defects in the object detection model may have serious consequences for the safe driving of vehicles. From a technical point of view, it is extremely challenging and difficult to further solve subtle defects on the basis of the existing high accuracy, because in the field of machine learning, as mentioned above, there are many links from modeling to training. For example, the selection and processing of sample data, the design of data features, the design of models, the design of loss functions or the design of optimization methods, etc., subtle differences in any link are factors that lead to subtle defects in detection accuracy.
  • the inventors of the present application focused their research on objects to be detected.
  • object detection often focuses on large-sized objects such as people, vehicles, roads, or trees, and these larger-sized objects are often well represented in the sample data.
  • the model will tend to be globally optimal during the training process.
  • the features of objects with larger sizes are often more obvious and easier to be noticed.
  • the features of objects with small sizes are relatively subtle and struggle to attract the model's attention, which biases the model towards extracting features of large objects; this bias eventually leads to models that recognize large objects well but small objects poorly, and this is one of the reasons for the subtle flaws in object detection models.
  • FIG. 1 is a flowchart of an object detection method provided by the embodiment of the present application, including the following steps:
  • step 102: a plurality of sampling points is obtained by sampling the space through the sensor of the movable platform;
  • step 104: an object detection model is used to detect target sampling points of the same object corresponding to the plurality of sampling points, and the position information of the object is detected according to the position information of the target sampling points.
  • FIG. 2 is a schematic diagram of an object detection model provided by an embodiment of the present application, wherein the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer, and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the object detection model has two information extraction network layers: a feature extraction network layer and a size information extraction network layer.
  • the feature extraction network layer is used to obtain the feature information of the sampling point
  • the size information extraction network layer can further determine the size information of the sampling point on the basis of the feature information; the size information indicates the probability that the sampling point belongs to an object whose size is within the target size range, which enables the object detection model to pay attention to the size information of objects in the target size range in addition to the feature information of the objects.
  • because the size information represents objects of a specific size, it enables the model to give more consideration to objects of that size, thereby enabling further accurate identification of objects of that size.
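  • To make the four-layer structure concrete, the following is a minimal PyTorch-style sketch; the application does not disclose concrete layer definitions, so all module choices, channel counts, and names here are illustrative assumptions:

```python
import torch
import torch.nn as nn

class ObjectDetectionModel(nn.Module):
    """Illustrative sketch of the four network layers; all sizes/names are assumed."""

    def __init__(self, in_channels=1, feat_channels=64):
        super().__init__()
        # Feature extraction network layer (backbone): produces feature map A.
        self.feature_extraction = nn.Sequential(
            nn.Conv2d(in_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
        )
        # Size information extraction network layer: a branch of the backbone
        # performing two convolution operations on the feature information.
        self.size_branch = nn.Sequential(
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1), nn.ReLU(),
            nn.Conv2d(feat_channels, feat_channels, 3, padding=1),
        )
        # Per-position probability that the point belongs to a small object.
        self.size_head = nn.Conv2d(feat_channels, 1, 1)
        # Object detection network layer: per-position object confidence.
        self.object_head = nn.Conv2d(feat_channels, 1, 1)
        # Position detection network layer: per-position (x, y, z, l, h, w, theta).
        self.position_head = nn.Conv2d(feat_channels, 7, 1)

    def forward(self, x):
        feat_a = self.feature_extraction(x)    # feature information (feature map A)
        size_feat = self.size_branch(feat_a)   # size information
        feat_b = feat_a + size_feat            # fuse by addition -> feature map B
        confidence = torch.sigmoid(self.object_head(feat_b))
        position = self.position_head(feat_b)
        size_prob = torch.sigmoid(self.size_head(size_feat))
        return confidence, position, size_prob
```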
  • the method of the embodiment of the present application can be applied to a movable platform, and the movable platform recognizes an object during the movement process, so as to perform movement control based on the recognition result.
  • the movable platform in this embodiment of the present application may include: a car, an unmanned aerial vehicle, an unmanned ship, or a robot, wherein the car may be an unmanned vehicle or a manned vehicle, and the unmanned aerial vehicle may be a drone or another type of aircraft.
  • the movable platform is not limited to the movable platforms listed above, and can also be other movable platforms.
  • an object detection model may be pre-trained, and the object detection model may be set in the movable platform, or may be set in a server connected to the movable platform.
  • the object detection model may be pre-trained by the business party, and the trained object detection model may be stored in the movable platform, so that the movable platform can recognize the object.
  • the movable platform may also send the collected data to the server, and the object detection model configured on the server uses the collected data to identify the object, and then returns the identification result to the movable platform.
  • the business party may prepare sample data for training in advance.
  • the sample data may include: data belonging to objects whose size is within the target size range; in this embodiment, such an object is referred to as the second type of object.
  • the second type of object has a specific size, and the specific size can be flexibly configured based on actual business requirements, which is not limited in this embodiment.
  • the target size range in this embodiment may include: a size range smaller than a preset target size threshold, and the preset target size threshold can be flexibly configured according to business needs ;
  • the sample data also includes data of objects that do not belong to the target size range, which is referred to as the first type of objects in this embodiment.
  • objects of the second type have a specific size relative to objects of the first type, e.g., a smaller size.
  • the data of the second type of object is added to the sample data, which can enhance the recognition of the second type of object by the object detection model.
  • Model training in this embodiment may be supervised training or unsupervised training.
  • a supervised training method can be used to improve the training speed, and the real values can be marked in the sample data.
  • the sample data is marked with the position information of the object.
  • the position information of the object may include one or more kinds of information, and the specific information may be configured according to business needs.
  • the position information of the object may include any of the following: size information, coordinate information or direction information.
  • the sample data may be point cloud data for multiple sampling points or image data for multiple sampling points.
  • point cloud data can be collected by sensors such as lidar or millimeter wave radar.
  • unmanned vehicles use on-board sensors to sense the surrounding environment of the vehicle, and control the steering and speed of the vehicle according to the road, vehicle position, and object information obtained by sensing, so that the vehicle can drive safely and reliably on the road.
  • Vehicle sensors can include lidar, millimeter-wave radar, and vision sensors.
  • the basic principle of lidar is to actively transmit a laser pulse signal to the detected object, obtain the reflected pulse signal, and calculate the depth information of the detected object according to the time difference between the transmitted signal and the received signal; since the emission direction is known, the angle information of the measured object relative to the lidar is also obtained; combining the aforementioned depth and angle information yields point cloud data.
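  • Concretely, with c the speed of light and Δt the measured time difference between transmission and reception, the round trip of the pulse gives the depth:

```latex
d = \frac{c \cdot \Delta t}{2}
```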
  • point cloud data can be converted to image data.
  • the vehicle-mounted sensor may further include a plurality of cameras that can collect image data from multiple viewing angles and a plurality of depth cameras that can collect image data with depth information from multiple viewing angles.
  • Image data can also be collected by image acquisition sensors such as cameras.
  • the above-mentioned sample data may be data obtained after feature engineering of the original data.
  • feature engineering refers to the process of finding physically meaningful features in the original data to participate in model training. This process involves data cleaning, data dimensionality reduction, feature extraction, feature normalization, feature evaluation and screening, feature dimensionality reduction, or feature encoding, etc.
  • the point cloud data is unstructured data and needs to be processed into a format that can be input to the object detection model.
  • the point cloud data is processed to obtain the point cloud density corresponding to each voxel of the point cloud data.
  • the point cloud density corresponding to each voxel of the point cloud data is used as the input of the object detection model.
  • the point cloud data processing method may be point cloud three-dimensional grid processing.
  • the point cloud data is divided into grids to obtain multiple voxels of the point cloud data, and the number of points contained in each voxel is obtained; the ratio of this number to the number of all points in the point cloud data constitutes the point cloud density of the voxel.
  • the point cloud density reflects the number of points contained in the voxel; if the point cloud density is large, it means that the voxel has a greater probability of corresponding to an object, so the point cloud density corresponding to each voxel of the point cloud data can be used as feature information of objects. Processing the irregular point cloud into a regular representation can better represent the contour information of the object.
  • by using grid data that includes the point cloud density as model input, the number of points in each 3D grid cell can be distinguished, and the accuracy of object detection can be improved.
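  • A minimal sketch of this grid division and density computation follows; the grid bounds and resolution are assumptions, as the application does not fix them:

```python
import numpy as np

def voxelize_density(points, grid_min, grid_max, resolution):
    """Compute per-voxel point cloud density: points in voxel / total points.

    points: (N, 3) array of x, y, z coordinates.
    grid_min, grid_max: (3,) bounds of the region of interest (assumed).
    resolution: (3,) number of voxels along x, y, z (assumed).
    """
    points = np.asarray(points, dtype=np.float64)
    grid_min = np.asarray(grid_min, dtype=np.float64)
    grid_max = np.asarray(grid_max, dtype=np.float64)
    resolution = np.asarray(resolution, dtype=np.int64)

    # Map each point to a voxel index; drop points outside the grid.
    voxel_size = (grid_max - grid_min) / resolution
    idx = np.floor((points - grid_min) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < resolution), axis=1)
    idx = idx[inside]

    # Count points per voxel and normalize by the total number of points;
    # empty voxels keep the feature value 0.
    density = np.zeros(resolution, dtype=np.float64)
    np.add.at(density, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return density / max(len(points), 1)
```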
  • sample data refers to image data containing real objects, including image data containing first-type objects and image data containing second-type objects.
  • the object detection model can automatically learn object characteristics from image data containing real objects.
  • the object detection model can be obtained by training a machine learning model using the sample data.
  • the machine learning model may be a neural network model or the like, such as a deep learning-based neural network model.
  • the specific structural design of the object detection model is one of the important aspects of the training process.
  • the structure of the object detection model at least includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer.
  • the feature extraction network layer is used for acquiring feature information of the sampling points based on a deep learning neural network.
  • the object detection model of this embodiment additionally adds a size information extraction network layer, so as to enhance the model's recognition of the second type of objects by extracting the size information of the second type of objects.
  • the size information extraction network layer can be independent of the feature extraction network layer, that is, two independent neural networks are used to extract the two types of information respectively. This implementation requires training two independent neural networks, and the algorithm overhead is relatively large.
  • alternatively, it can be implemented in the form of a backbone network and a branch network: the backbone network is used to receive the input data and extract the feature information of the object, and the branch network is dedicated to extracting the size information, so that the size information is further extracted on the basis of the feature information; the execution overhead is correspondingly reduced, and the model execution efficiency is improved.
  • for image data, the feature may be the pixel values of the image data; for point cloud data, the feature may be the point cloud density corresponding to each voxel of the point cloud data.
  • the size information is used to represent the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the size information is used to represent the probability that each pixel of the image data belongs to an object whose size is within the target size range.
  • the size information is used to: enhance the recognition by the object detection network layer that the sampling point belongs to an object whose size is within the target size range.
  • the feature information may be in the form of a feature map, and the feature map may have a specified size, such as H*W, where H and W represent length and width respectively, and the specific values can be flexibly configured as needed; feature map A is taken as an example below.
  • size information is further extracted, which can be the size information extracted from each position in feature map A; the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, which means that each position in feature map A carries the probability that the point belongs to an object whose size is within the target size range.
  • the extracted size information can be combined with the feature information for object recognition, and the combination of the two can be implemented in various ways.
  • the two are independent, and they are used as two types of data for object recognition.
  • the feature information and the size information can also be added, and the target sampling point of the same object corresponding to the multiple sampling points can be detected according to the addition result.
  • the size information can be used as additional information of the feature information.
  • for example, a feature map A representing the feature information is extracted, and the size information can be added to feature map A to form a new feature map B, and object recognition can be performed based on feature map B.
  • the size information extraction network layer can be implemented with various network structures.
  • since a feature map can be used to represent the feature information, the layer can be implemented by using a convolutional neural network.
  • the size information extraction network layer includes convolution layers, for example, at least two convolution layers; the size information may be obtained by performing at least two convolution operations on the feature information by the at least two convolution layers.
  • the loss function is also called the cost function.
  • the real value is marked in the sample data, and the loss function is used to estimate the error between the detection value of the model and the real value.
  • the loss function is very important to the recognition accuracy of the model, and deciding what kind of loss function to design based on the existing sample data and the requirements of the model is a difficult problem.
  • some existing loss functions such as logarithmic loss functions, squared loss functions, exponential loss functions, 0/1 loss functions, etc. can be used to form loss functions of corresponding scenarios.
  • the object detection network layer is used to: obtain the confidence that the sampling point belongs to an object based on the feature information and size information of the sampling point, and use the confidence that the sampling point belongs to the object to detect A target sampling point of the same object corresponding to the plurality of sampling points.
  • the loss functions used in the training process of the object detection model include: an object position information loss sub-function, an object confidence loss sub-function and a size information coefficient loss sub-function.
  • the loss function additionally adds a size information coefficient loss sub-function, so that the model pays attention to the second type of objects and can distinguish them more clearly.
  • the optimization objective of the object position information loss function includes: reducing the difference between the object position information obtained from the sample data by the object detection model and the object position information calibrated by the sample data.
  • the position information of the object includes size information, position information, or orientation information of the object; based on this, the position difference includes: the difference between the size information, position information, or orientation information of the object obtained by the object detection model from the sample data and, respectively, the size information, position information, or orientation information of the object calibrated in the sample data.
  • the length, width, height, position, and orientation of an object are specifically expressed as (x, y, z, l, h, w, θ), where x, y, and z represent the position (that is, the coordinate information of the point), l, h, and w represent the length, height, and width, and θ represents the orientation; then in the object position information loss sub-function, the sub-function used to describe the position difference can include:
  • floc(xi) represents the position information of the object identified by the object detection model.
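  • The formula itself is not reproduced in this text. As one plausible form (an assumption, not the application's disclosed formula), a smooth-L1 regression over the seven pose parameters could serve:

```latex
L_{loc} = \sum_{i} \mathrm{SmoothL1}\!\left( f_{loc}(x_i) - y_i \right), \qquad y_i = (x, y, z, l, h, w, \theta)_i
```

  where yi denotes the position information calibrated in the sample data for sampling point i.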
  • the loss function of this embodiment further includes an object confidence loss sub-function, and the optimization goal of the object confidence loss sub-function includes: improving the confidence of the object detected by the object detection model from the sample data; as an example:
  • fpred(xi) represents the confidence of the object detected by the object detection model from the sample data, that is, the probability of recognizing the object.
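  • The corresponding formula is likewise not reproduced here; a common choice consistent with this optimization goal (an assumption) is a binary cross-entropy over the object confidence:

```latex
L_{pred} = -\sum_{i} \left[ t_i \log f_{pred}(x_i) + (1 - t_i) \log\!\left(1 - f_{pred}(x_i)\right) \right]
```

  where ti = 1 if sampling point i belongs to an object and ti = 0 otherwise.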
  • a size information coefficient loss sub-function is also added to the loss function in this embodiment, and it is used to enhance the feature values of sampling points belonging to objects whose size is within the target size range; in some examples, its optimization goal includes: improving the confidence with which the object detection model detects objects whose size is within the target size range from the sample data.
  • using the size information, the object detection model detects, for each sampling point, whether the point belongs to an object whose size is within the target size range. The size information coefficient loss sub-function therefore makes the feature value at positions corresponding to small objects large, and the feature value at non-small-object positions small, so that the network can distinguish small objects more clearly:
  • fseg(xk) represents the confidence with which the object detection model detects objects whose size is within the target size range from the sample data.
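  • As a hedged sketch of such a term (an assumption; the disclosed formula is not reproduced in this text), a segmentation-style binary cross-entropy over positions would push feature values up at small-object positions and down elsewhere:

```latex
L_{seg} = -\sum_{k} \left[ s_k \log f_{seg}(x_k) + (1 - s_k) \log\!\left(1 - f_{seg}(x_k)\right) \right]
```

  where sk = 1 if position k belongs to an object whose size is within the target size range, and sk = 0 otherwise.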
  • the loss function used in the training process of the object detection model of this embodiment may include:
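  • The combined formula is not reproduced in this text; one plausible combination (the weights λ are assumptions) is a weighted sum of the three sub-functions:

```latex
L = \lambda_{loc} L_{loc} + \lambda_{pred} L_{pred} + \lambda_{seg} L_{seg}
```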
  • in the training process, it is necessary to use the optimization method to optimize the evaluation function and find the model with the highest evaluation. For example, the minimum value (optimal solution) of the output error of the loss function can be found through optimization methods such as gradient descent, and the parameters of the model can be adjusted to the optimum, that is, the optimal coefficients of each network layer in the model can be solved.
  • the process of solving may be to solve for gradients that adjust model parameters by computing the output of the model and the error value of the loss function.
  • a back-propagation function can be called to calculate the gradient, and the calculation result of the loss function can be back-propagated into the object detection model, so that the object detection model can update model parameters.
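  • A minimal sketch of this optimize-and-backpropagate cycle, reusing the illustrative model and loss terms above (PyTorch-style; the target keys and loss weights are assumptions):

```python
import torch
import torch.nn.functional as F

def train_step(model, optimizer, voxels, targets, lambdas=(1.0, 1.0, 1.0)):
    """One gradient step: forward pass, combined loss, backprop, parameter update."""
    confidence, position, size_prob = model(voxels)
    # Object position information loss sub-function.
    loss_loc = F.smooth_l1_loss(position, targets["position"])
    # Object confidence loss sub-function.
    loss_pred = F.binary_cross_entropy(confidence, targets["objectness"])
    # Size information coefficient loss sub-function (small-object mask).
    loss_seg = F.binary_cross_entropy(size_prob, targets["small_object_mask"])
    loss = lambdas[0] * loss_loc + lambdas[1] * loss_pred + lambdas[2] * loss_seg

    optimizer.zero_grad()
    loss.backward()   # back-propagate the loss result into the model
    optimizer.step()  # update model parameters along the computed gradients
    return loss.item()
```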
  • the solution of the loss function described above can be solved using a stand-alone solver.
  • a network branch may be set on the basis of the backbone network to calculate the loss function of the network.
  • the loss function can be divided into the above three sub-functions: the object position information loss sub-function, the object confidence loss sub-function and the size information coefficient loss sub-function.
  • the corresponding three network branches can be set to solve separately.
  • the object detection model is obtained after the training, and the obtained object detection model can also be tested by using the test sample to check the recognition accuracy of the object detection model.
  • the finally obtained object detection model can be set in the movable platform or the server.
  • the space is sampled by the sensors of the movable platform to obtain multiple sampling points to be identified, and the object detection model detects the position information of the object.
  • the sensors may be lidars, millimeter-wave radars, vision sensors, etc.; correspondingly, point cloud data of multiple sampling points to be identified may be collected, and the point cloud data can also be converted into image data.
  • the sensor may also be an image sensor, and correspondingly, image data of multiple sampling points to be identified may be collected.
  • the point cloud data can be divided into a grid to obtain multiple voxels. After calculating the point cloud density corresponding to each voxel, the point cloud density corresponding to each voxel is input into the object detection model for identification.
  • the data input to the object detection model includes pixel values of image data. The voxels of the point cloud data are obtained by dividing the point cloud data into a grid.
  • the size information of the sampling points is further extracted; since the size information represents the probability that the sampling points belong to objects whose size is within the target size range, extracting the size information of the sampling points in this embodiment makes the object detection model better at recognizing objects of that size.
  • objects whose size is within the target size range are referred to as the second type of objects; in some examples, the target size range includes: a size range smaller than a preset target size threshold, that is, the second type of objects are objects of smaller size.
  • the feature information can be in the form of a feature map, and the feature map can have a specified size, such as the size of H*W, where H and W represent length and width respectively, and the specific values can be flexibly configured as needed;
  • the size information is further extracted on the basis of the feature map A, which can be the size information extracted from each position in the feature map A.
  • the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, that is, it represents the probability that each position in feature map A belongs to an object whose size is within the target size range.
  • the extracted size information can be combined with the feature information for object detection, and the combination of the two can be implemented in various ways.
  • the two are independent, and they are used as two types of data for object recognition.
  • the feature information and the size information can also be added, and the target sampling point of the same object corresponding to the multiple sampling points can be detected according to the addition result.
  • the size information can be used as additional information of the feature information.
  • for example, a feature map A representing the feature information is extracted, and the size information can be added to feature map A to form a new feature map B, and object recognition can be performed based on feature map B.
  • the size information extraction network layer can be implemented with various network structures.
  • since a feature map can be used to represent the feature information, the layer can be implemented by using a convolutional neural network.
  • the size information extraction network layer includes convolution layers, for example, at least two convolution layers.
  • the size information may be obtained by performing at least two convolution operations on the feature information by the at least two convolution layers.
  • the feature extraction network layer is used to obtain the feature information of the sampling points based on the neural network;
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to represent the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is used to detect the target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the object detection network layer is configured to: based on the feature information and size information of the sampling point, obtain a confidence that the sampling point belongs to an object, and use the confidence that the sampling point belongs to an object to detect multiple The target sampling points of the same object corresponding to the sampling points.
  • the model in this embodiment can extract specific size information, and since the size information represents the probability that the sampling point belongs to an object whose size is within the target size range, the object detection model pays more attention to sampling points belonging to objects whose size is within the target size range, and thus can better detect such objects.
  • the object recognition accuracy can still be guaranteed under the condition of sparse point clouds.
  • the point cloud obtained by the movable platform using a single sensor is generally sparse, and under sparse point cloud conditions the object detection effect is usually poor.
  • related technologies use a combination of multiple sensors to obtain point clouds; for example, while using lidar detection to obtain point clouds, other sensors are used for auxiliary fusion to obtain dense, high-quality point clouds. However, the data fusion process between the sensors is complicated, and multiple sensors also increase the hardware cost.
  • with the method of this embodiment, the required object recognition accuracy can be achieved, so that the movable platform does not need to use other sensors for auxiliary fusion.
  • a deep learning neural network is used to detect the position and confidence of three-dimensional objects; by adding a network branch to the neural network and improving the training strategy of the deep learning algorithm, the branch is used to extract the size information of small objects, finally making the model friendlier to small object detection.
  • the point cloud is divided into a three-dimensional grid according to a certain resolution in the x, y, and z directions, that is, the three-dimensional space is voxelized.
  • the point cloud density in a voxel determines the point cloud feature corresponding to that voxel: if a voxel contains points, the point cloud density p of the position is calculated, and the feature of the position is set to p; for empty voxels, the point cloud feature is set to 0; this generates an input of the dimensions that the neural network can receive.
  • the input data will first go through the feature extraction network layer in the object detection model, so that the features of the point cloud can be extracted to generate a feature map, which represents the feature information of the sampling points.
  • another network branch, that is, the size information extraction network layer, is then connected to generate a new feature map, which represents the size information of the sampling points.
  • the network branch can perform two convolution operations, and each position of the feature map generates size information; the size information has better semantics for describing objects belonging to the target size range, characterizing the probability that the location belongs to an object whose size is within the target size range.
  • the extracted size information can be added with the feature map of the feature information to obtain a new feature map B, and the addition operation can be used to fuse stronger semantic information.
  • the target sampling points of the same object corresponding to the plurality of sampling points can be detected based on the feature information and size information, and then the position information of the object is detected according to the position information of the target sampling points.
  • a series of candidate frames can be identified. Each candidate frame may correspond to an object.
  • the sampling points in the candidate frame are the target sampling points corresponding to the same object.
  • since the confidence level represents the probability of belonging to an object, the probability that each candidate frame corresponds to an object can be determined; the probabilities corresponding to the candidate frames are sorted and filtered according to a set threshold to identify objects, and then the position information of the object can be detected according to the position information of the target sampling points in the candidate frame, yielding the final detection result.
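  • A minimal sketch of this sort-and-threshold filtering over candidate frames (a simplified stand-in for the described post-processing; the function and parameter names are assumptions):

```python
def filter_candidates(boxes, scores, score_threshold=0.5):
    """Sort candidate frames by object probability and keep those above a threshold.

    boxes: candidate frames, each e.g. (x, y, z, l, h, w, theta).
    scores: per-frame confidence that the frame corresponds to an object.
    """
    ranked = sorted(zip(scores, boxes), key=lambda sb: sb[0], reverse=True)
    return [(score, box) for score, box in ranked if score >= score_threshold]
```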
  • the position information of the object identified by the object detection model of this embodiment can be used for automatic movement decision of the movable platform, for example, it can be used for automatic driving decision of a car, automatic flight decision of an unmanned aerial vehicle, and the like.
  • the foregoing method embodiments may be implemented by software, and may also be implemented by hardware or a combination of software and hardware.
  • taking software implementation as an example, a device in the logical sense is formed by the processor of the object detection apparatus reading the corresponding computer program instructions from the non-volatile memory into memory for execution.
  • FIG. 3 is a hardware structure diagram of the object detection apparatus 300 of this embodiment. In addition to the processor 301, the memory 302, and the non-volatile memory 303 shown in FIG. 3, the object detection apparatus used for implementing the object detection method in the embodiment may also include other hardware according to its actual function, which will not be repeated here.
  • the processor 301 implements the following steps when executing the computer program:
  • a plurality of sampling points are obtained by sampling the space through the sensor of the movable platform
  • the object detection model includes: a feature extraction network layer, a size information extraction network layer, an object detection network layer and a position detection network layer;
  • the feature extraction network layer is used to obtain feature information of the sampling points based on a neural network
  • the size information extraction network layer is used to determine the size information of the sampling point according to the feature information of the sampling point, and the size information is used to indicate the probability that the sampling point belongs to an object whose size is within the target size range;
  • the object detection network layer is configured to detect target sampling points of the same object corresponding to the plurality of sampling points based on the feature information and size information;
  • the position detection network layer is used to detect the position information of the object according to the position information of the target sampling point.
  • the target size range includes a size range smaller than a preset target size threshold.
  • the data of the multiple sampling points obtained by spatial sampling includes: point cloud data of the multiple sampling points and/or image data of the multiple sampling points.
  • the data input to the object detection model includes: a point cloud density corresponding to each voxel of the point cloud data and/or a pixel value of the image data.
  • the voxels are obtained by rasterizing the point cloud data.
  • the size information is used to characterize the probability that each voxel of the point cloud data belongs to an object whose size is within the target size range, or the size information is used to characterize that each pixel of the image data belongs to Probability of an object whose size is within the target size range.
  • the size information is used to: enhance the recognition by the object detection network layer that the sampling point belongs to an object whose size is within a target size range.
  • the object detection network layer is configured to: add the feature information and the size information, and use the addition result to detect a target sampling point of the same object corresponding to the plurality of sampling points.
  • the feature extraction network layer is used for acquiring feature information of the sampling points based on a deep learning neural network.
  • the size information extraction network layer is a branch of the feature extraction network layer.
  • the dimension information extraction network layer includes a convolutional layer.
  • the size information is obtained by using the convolution layer to perform a convolution operation on the feature information.
  • the size information is obtained by performing at least two convolution operations on the feature information by the at least two convolutional layers.
  • the object detection network layer is configured to: based on the feature information and size information of the sampling point, obtain a confidence that the sampling point belongs to an object, and use the confidence that the sampling point belongs to an object to detect multiple The target sampling points of the same object corresponding to the sampling points.
  • the loss functions used in the training process of the object detection model include: an object position information loss sub-function, an object confidence loss sub-function, and a size information coefficient loss sub-function.
  • the optimization objective of the object position information loss function includes: reducing the difference between the object position information obtained from the sample data by the object detection model and the object position information calibrated by the sample data.
  • the optimization objective of the object confidence loss sub-function includes: improving the confidence of the object detected by the object detection model from the sample data.
  • the size information coefficient loss function is used to enhance feature values of sample points belonging to objects whose size is within the target size range.
  • the optimization objective of the size information coefficient loss function includes: improving the confidence that the object detection model detects objects whose size is within the target size range from the sample data.
  • the training process of the object detection model includes back-propagating the calculation result of the loss function into the object detection model, so that the object detection model updates model parameters.
  • the location information of the object includes any of the following: size information, coordinate information or orientation information.
  • the apparatus is applied to a movable platform.
  • the point cloud data is acquired by using a lidar or a camera device with a depth information acquisition function configured on the movable platform.
  • the image data is acquired using a camera device disposed on the movable platform.
  • the movable platform includes: an unmanned aerial vehicle, a car, an unmanned boat, or a robot.
  • the detected location information of the object is used to: make an autonomous driving decision for the car.
  • the detected location information of the object is used to: make automatic flight decisions for the UAV.
  • an embodiment of the present application further provides a movable platform 400, including: a body 401; a power system 402 installed in the body 401 to provide power for the movable platform; and the object detection device 300 described in the foregoing embodiments.
  • the movable platform 400 is a vehicle, an unmanned aerial vehicle, an unmanned ship or a mobile robot.
  • the embodiments of this specification further provide a computer-readable storage medium, where several computer instructions are stored on the readable storage medium, and when the computer instructions are executed, the steps of the object detection method in any one of the embodiments are implemented.
  • Embodiments of the present specification may take the form of a computer program product embodied on one or more storage media having program code embodied therein, including but not limited to disk storage, CD-ROM, optical storage, and the like.
  • Computer-usable storage media includes permanent and non-permanent, removable and non-removable media, and storage of information can be accomplished by any method or technology.
  • Information may be computer readable instructions, data structures, modules of programs, or other data.
  • Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read only memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), Flash Memory or other memory technology, Compact Disc Read Only Memory (CD-ROM), Digital Versatile Disc (DVD) or other optical storage, Magnetic tape cassettes, magnetic tape magnetic disk storage or other magnetic storage devices or any other non-transmission medium that can be used to store information that can be accessed by a computing device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed are an object detection method, a device, a movable platform, and a computer-readable storage medium. A feature extraction network layer in an object detection model can be used to obtain feature information of sampling points; a size information extraction network layer is used to determine size information of the sampling points according to the feature information of the sampling points, the size information being used to characterize the probability that the sampling points belong to an object whose size is within a target size range; an object detection network layer is used to detect, based on the feature information and the size information, target sampling points corresponding to the same object among a plurality of sampling points. Since the size information can be extracted on the basis of the feature information, the object detection model's attention to objects whose size is within the target size range is improved, and the position detection network layer can thus accurately detect the position information of the object according to the position information of the target sampling points, enabling better identification of objects whose size is within the target size range.
PCT/CN2020/137299 2020-12-17 2020-12-17 Object detection method, device, movable platform and computer-readable storage medium WO2022126523A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137299 WO2022126523A1 (fr) 2020-12-17 2020-12-17 Object detection method, device, movable platform and computer-readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/137299 WO2022126523A1 (fr) 2020-12-17 2020-12-17 Object detection method, device, movable platform and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2022126523A1 (fr) 2022-06-23

Family

Family ID: 82058813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/137299 WO2022126523A1 (fr) 2020-12-17 2020-12-17 Object detection method, device, movable platform and computer-readable storage medium

Country Status (1)

Country Link
WO (1) WO2022126523A1 (fr)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110032962A (zh) * 2019-04-03 2019-07-19 腾讯科技(深圳)有限公司 Object detection method and apparatus, network device, and storage medium
CN110298262A (zh) * 2019-06-06 2019-10-01 华为技术有限公司 Object recognition method and apparatus
CN110942000A (zh) * 2019-11-13 2020-03-31 南京理工大学 Deep-learning-based target detection method for unmanned vehicles
US20200145569A1 (en) * 2017-10-19 2020-05-07 DeepMap Inc. Lidar to camera calibration for generating high definition maps


Similar Documents

Publication Publication Date Title
Chen et al. Lidar-histogram for fast road and obstacle detection
CN110988912B Road target and distance detection method, system, and device for an autonomous vehicle
EP3506158A1 Method, apparatus, and device for determining a lane line on a road
CN112101092A Automatic driving environment perception method and system
CN111222395B Target detection method and apparatus, and electronic device
CN113706480B Point cloud 3D target detection method based on key-point multi-scale feature fusion
CN114820465B Point cloud detection model training method and apparatus, electronic device, and storage medium
WO2022126522A1 Object recognition method, apparatus, movable platform, and storage medium
CN113807350A Target detection method, apparatus, device, and storage medium
CN111428859A Depth estimation network training method and apparatus for autonomous driving scenarios, and autonomous vehicle
CN110674705A Small obstacle detection method and apparatus based on multi-line lidar
CN115436920A Lidar calibration method and related device
CN115393601A Three-dimensional target detection method based on point cloud data
CN113536920B Semi-supervised three-dimensional point cloud target detection method
CN114241448A Method and apparatus for acquiring the heading angle of an obstacle, electronic device, and vehicle
WO2022126523A1 Object detection method, device, movable platform and computer-readable storage medium
CN114638996A Model training method, apparatus, device, and storage medium based on adversarial learning
Wang et al. Research on vehicle detection based on faster R-CNN for UAV images
CN116246119A 3D target detection method, electronic device, and storage medium
CN115453570A Multi-feature-fusion dust filtering method for mining areas
CN112712062A Monocular three-dimensional object detection method and apparatus based on decoupled truncated objects
CN116964472A Method for detecting at least one object in an environment by means of reflection signals of a radar sensor system
Pereira et al. A 3-D Lightweight Convolutional Neural Network for Detecting Docking Structures in Cluttered Environments
US20240135195A1 (en) Efficient search for data augmentation policies
Babolhavaeji et al. Multi-Stage CNN-Based Monocular 3D Vehicle Localization and Orientation Estimation

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20965539

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20965539

Country of ref document: EP

Kind code of ref document: A1