CN115457492A - Target detection method and device, computer equipment and storage medium - Google Patents


Info

Publication number
CN115457492A
CN115457492A
Authority
CN
China
Prior art keywords
point cloud
target
cloud data
sample
sample target
Prior art date
Legal status
Pending
Application number
CN202211206016.9A
Other languages
Chinese (zh)
Inventor
李�瑞
王亚军
王邓江
司宇
马冰
Current Assignee
Suzhou Wanji Iov Technology Co ltd
Original Assignee
Suzhou Wanji Iov Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Wanji Iov Technology Co ltd

Classifications

    • G06V20/54 Surveillance or monitoring of activities of traffic, e.g. cars on the road, trains or boats
    • G06N3/02 Neural networks; G06N3/08 Learning methods
    • G06V10/761 Proximity, similarity or dissimilarity measures in feature spaces
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V10/82 Image or video recognition or understanding using neural networks
    • G06V2201/07 Target detection

Abstract

The application relates to a target detection method and device, computer equipment and a storage medium. The method comprises the following steps: obtaining current frame point cloud data of a target scene, and inputting the current frame point cloud data into a point cloud target detection model to detect the targets in the current frame point cloud data and obtain a target detection result; the point cloud target detection model is trained on the prediction detection frame of each sample target in a point cloud data training set and the standard detection frame of each sample target. The method avoids training the point cloud target detection model on information that does not fit the feature information of the target; instead, the model is trained on feature information that does fit the feature information of the target, namely the feature information of the detection frames of the sample targets, so that the detection results of the point cloud target detection model are more accurate and precise.

Description

Target detection method, target detection device, computer equipment and storage medium
Technical Field
The present application relates to the field of object detection technologies, and in particular, to an object detection method and apparatus, a computer device, and a storage medium.
Background
With the development of intelligent traffic technology, point cloud target detection has become increasingly valuable. Collecting point cloud data with a lidar and detecting point cloud targets with a deep neural network model is the mainstream approach to target detection in the intelligent traffic field. In this approach, the quality of the deep neural network model directly affects the accuracy of the detection result.
In the deep neural network model training process, the related art mainly calculates the loss through a Smooth L1 loss function and supervises and optimizes the point cloud target detection network through the loss value. However, this makes the regression target in point cloud detection insufficiently accurate, so the accuracy of detecting targets through the point cloud target detection network is poor.
Disclosure of Invention
In view of the foregoing, it is desirable to provide an object detection method, an object detection apparatus, a computer device, and a storage medium.
In a first aspect, the present application provides a target detection method, including:
acquiring current frame point cloud data of a target scene;
inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training a prediction detection frame of each sample target and a standard detection frame of each sample target in a point cloud data training set.
In one embodiment, the process of constructing the point cloud target detection model includes:
determining a point cloud data training set according to original point cloud data under various target scenes; the point cloud data training set comprises a plurality of sample targets;
acquiring sample target loss information of an initial point cloud target detection model according to the point cloud data training set;
and training the initial point cloud target detection model through sample target loss information until the initial point cloud target detection model is trained, so as to obtain the point cloud target detection model.
In one embodiment, determining a point cloud data training set according to original point cloud data under a plurality of target scenes comprises:
preprocessing original point cloud data to obtain a sample point cloud data set;
and dividing the sample point cloud data set to obtain a point cloud data training set.
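The division policy is not specified above; as a hedged sketch (the 80/20 ratio, the shuffling policy and all names are illustrative assumptions, not values from the patent), the sample point cloud data set could be split into a training subset and a verification subset as follows:

```python
import random

def split_dataset(frames, train_ratio=0.8, seed=0):
    # shuffle the sample point cloud frames, then split them into a
    # training subset and a verification subset (ratio is illustrative)
    idx = list(range(len(frames)))
    random.Random(seed).shuffle(idx)
    cut = int(len(frames) * train_ratio)
    train = [frames[i] for i in idx[:cut]]
    val = [frames[i] for i in idx[cut:]]
    return train, val

train_set, val_set = split_dataset(list(range(10)))
```

Shuffling before the cut keeps frames from any single target scene from landing entirely in one subset.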
In one embodiment, format conversion processing is carried out on original point cloud data to obtain a standard format point cloud data set; the standard format point cloud data set is a point cloud data set with a format matched with a format required by a training point cloud target detection model;
labeling each sample target in the standard format point cloud data set to obtain a labeled standard format point cloud data set; the marked point cloud data set in the standard format comprises standard marking data of each sample target; the standard marking data comprises a standard detection frame;
and determining the marked standard format point cloud data set as a sample point cloud data set.
In one embodiment, before performing the format conversion process on the raw point cloud data, the method further comprises:
removing invalid point cloud data from the original point cloud data; the invalid point cloud data is original point cloud data in which no sample target is present.
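A minimal sketch of this filtering step, under the assumption that "invalid" frames are simply those carrying no labelled sample target (the function name and data layout are illustrative, not from the patent):

```python
import numpy as np

def remove_invalid_frames(frames, labels_per_frame):
    # keep only the frames that contain at least one labelled sample
    # target; frames with no targets are the "invalid" point cloud data
    return [(f, l) for f, l in zip(frames, labels_per_frame) if len(l) > 0]

frames = [np.zeros((100, 3)), np.zeros((80, 3)), np.zeros((120, 3))]
labels = [["car"], [], ["pedestrian", "truck"]]
valid = remove_invalid_frames(frames, labels)
```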
In one embodiment, the method further includes:
inputting the point cloud data verification set into a point cloud target detection model to obtain a test target detection result;
and if the similarity between the detection result of the test target and the standard detection result of the sample target is greater than the preset value, determining that the point cloud target detection model passes verification.
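The patent leaves the similarity measure unspecified; a common stand-in is box overlap (IoU). A hedged sketch that declares the model verified when every test detection overlaps its standard detection by more than a preset value (the threshold and box format are assumptions):

```python
def iou_2d(a, b):
    # boxes as (x1, y1, x2, y2); axis-aligned IoU as a stand-in for the
    # patent's unspecified "similarity" between detection results
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def passes_validation(pred_boxes, std_boxes, threshold=0.7):
    # verified only if every test detection exceeds the preset value
    return all(iou_2d(p, g) > threshold for p, g in zip(pred_boxes, std_boxes))

ok = passes_validation([(0.0, 0.0, 2.0, 4.0)], [(0.0, 0.0, 2.0, 4.0)])
shifted = passes_validation([(1.0, 0.0, 3.0, 4.0)], [(0.0, 0.0, 2.0, 4.0)])
```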
In one embodiment, obtaining sample target loss information of an initial point cloud target detection model according to a point cloud data training set includes:
performing voxel grid division processing on the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data; each voxel grid point cloud data comprises coordinate information of each point cloud in the voxel grid;
and obtaining sample target loss information of the initial point cloud target detection model according to the point cloud data of each voxel grid.
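A minimal voxel-grid division sketch under assumed parameters (the voxel size, the per-voxel point cap and the dictionary representation are all illustrative choices, not taken from the patent):

```python
import numpy as np

def voxelize(points, voxel_size, max_points=32):
    # assign each point to a grid cell of the preset voxel specification;
    # each cell keeps the coordinates of the points that fall inside it
    coords = np.floor(points / voxel_size).astype(np.int64)
    voxels = {}
    for p, c in zip(points, coords):
        bucket = voxels.setdefault(tuple(c), [])
        if len(bucket) < max_points:   # cap points per voxel
            bucket.append(p)
    return voxels

pts = np.array([[0.1, 0.1, 0.1], [0.2, 0.2, 0.2], [1.5, 0.3, 0.3]])
grid = voxelize(pts, voxel_size=1.0)
```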
In one embodiment, the initial point cloud target detection model comprises an initial sparse network model and an initial detection network model;
obtaining sample target loss information of an initial point cloud target detection model according to point cloud data of each voxel grid, wherein the sample target loss information comprises:
inputting the point cloud data of each voxel grid into an initial sparse network model to obtain a two-dimensional characteristic map corresponding to the point cloud data of each voxel grid;
inputting the two-dimensional characteristic diagram into an initial detection network model to obtain a sample target tensor of an initial point cloud target detection model;
and obtaining sample target loss information of the initial point cloud target detection model according to the sample target tensor.
In one embodiment, the sample target tensor comprises a sample target class tensor, a sample target regression tensor, and a sample target direction tensor; the sample target loss information comprises sample target category loss information, sample target direction loss information and sample target regression loss information;
according to the sample target tensor, obtaining sample target loss information of the initial point cloud target detection model, comprising the following steps:
generating sample target category loss information according to the sample target category tensor; generating sample target direction loss information according to the sample target direction tensor; and decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information, and generating sample target regression loss information according to the sample target horizontal loss information and the sample target vertical loss information.
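How the three loss terms might be combined is not stated above; a hedged sketch in which the class, direction and regression (horizontal plus vertical) losses are summed with illustrative weights, alongside the Smooth L1 loss mentioned in the background section:

```python
import numpy as np

def smooth_l1(x, beta=1.0):
    # the Smooth L1 loss referenced in the background section
    ax = np.abs(x)
    return np.where(ax < beta, 0.5 * ax * ax / beta, ax - 0.5 * beta)

def total_sample_loss(cls_loss, dir_loss, horiz_loss, vert_loss,
                      weights=(1.0, 0.2, 2.0)):
    # weighted sum of the category, direction and regression terms;
    # the weights are illustrative assumptions, not patent values
    reg_loss = horiz_loss + vert_loss
    return weights[0] * cls_loss + weights[1] * dir_loss + weights[2] * reg_loss

loss = total_sample_loss(0.5, 0.1, 0.2, 0.3)
```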
In one embodiment, decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information includes:
decoding the sample target regression tensor to obtain a sample target horizontal tensor and a sample target vertical tensor;
obtaining sample target vertical loss information through a preset loss function and a sample target vertical tensor; and acquiring the loss information of the sample target level according to the sample target level tensor and the sample target standard level tensor.
In one embodiment, the sample target level tensor comprises attribute information of the sample target prediction detection box, and the sample target standard level tensor comprises the attribute information of the sample target standard detection box;
obtaining sample target level loss information according to the sample target level tensor and the sample target standard level tensor, including:
fitting the sample target level tensor to obtain distribution data of a prediction detection frame, and fitting the sample target standard level tensor to obtain distribution data of a standard detection frame;
and determining the target level loss information of the sample according to the similarity between the distribution data of the predicted detection frame and the distribution data of the standard detection frame.
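The distribution family and the similarity measure are not specified above. One common choice in detection-box regression is to fit each box with a 2-D Gaussian and compare the two Gaussians with a Wasserstein distance; the axis-aligned sketch below is built on that assumption and is not the patent's own formulation:

```python
import numpy as np

def box_to_gaussian(cx, cy, w, l):
    # fit a 2-D Gaussian to an axis-aligned box: mean at the centre,
    # diagonal covariance from the half extents (a rotated box would
    # additionally rotate the covariance by the heading angle)
    return np.array([cx, cy]), np.array([(w / 2) ** 2, (l / 2) ** 2])

def gaussian_w2(m1, v1, m2, v2):
    # squared 2-Wasserstein distance between two diagonal Gaussians
    return np.sum((m1 - m2) ** 2) + np.sum((np.sqrt(v1) - np.sqrt(v2)) ** 2)

def horizontal_box_loss(pred_box, std_box):
    mp, vp = box_to_gaussian(*pred_box)
    ms, vs = box_to_gaussian(*std_box)
    d = gaussian_w2(mp, vp, ms, vs)
    return 1.0 - 1.0 / (1.0 + d)  # map distance to a bounded [0, 1) loss

zero = horizontal_box_loss((0.0, 0.0, 2.0, 4.0), (0.0, 0.0, 2.0, 4.0))
off = horizontal_box_loss((1.0, 0.0, 2.0, 4.0), (0.0, 0.0, 2.0, 4.0))
```

Identical predicted and standard boxes give zero loss, and the loss grows smoothly as the distributions diverge.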
In a second aspect, the present application provides an object detection apparatus, comprising:
the point cloud data acquisition module is used for acquiring current frame point cloud data of a target scene;
the target detection module is used for inputting the current frame point cloud data into the point cloud target detection model, detecting a target in the current frame point cloud data and obtaining a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training a prediction detection frame of each sample target and a standard detection frame of each sample target in a point cloud data training set.
In a third aspect, the present application provides a computer device, comprising a memory and a processor, wherein the memory stores a computer program, and the processor implements the steps of the method in any of the embodiments of the first aspect when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method in any of the embodiments of the first aspect described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of the method of any one of the above first aspects.
According to the target detection method and device, the computer equipment and the storage medium, the computer device can obtain current frame point cloud data of a target scene, input the current frame point cloud data into a point cloud target detection model, and detect the targets in the current frame point cloud data to obtain a target detection result; the point cloud target detection model is trained on the prediction detection frame of each sample target in a point cloud data training set and the standard detection frame of each sample target. The method avoids training the point cloud target detection model on information that does not fit the feature information of the target (offset information of the detection frame of the sample target, i.e., the offset of the detection frame relative to the anchor frame), and instead trains the model directly on feature information that fits the feature information of the target, namely the feature information of the detection frames of the sample targets, so that the results of the point cloud target detection model are more accurate and precise.
Drawings
FIG. 1 is a diagram illustrating an internal structure of a computer device according to an embodiment;
FIG. 2 is a schematic flow chart diagram of a method for object detection in one embodiment;
FIG. 3 is a schematic representation of a heading angle of a target in a three-dimensional coordinate system with respect to a lidar in one embodiment;
FIG. 4 is a schematic flow chart of a method for constructing a point cloud object detection model in another embodiment;
FIG. 5 is a schematic flow chart illustrating a method for determining a training set of point cloud data from raw point cloud data in a plurality of target scenes in accordance with another embodiment;
FIG. 6 is a schematic flow chart of a method for pre-processing raw point cloud data according to another embodiment;
FIG. 7 is a schematic flow chart illustrating a method for obtaining sample target loss information of an initial point cloud target detection model according to a point cloud data training set in another embodiment;
FIG. 8 is a schematic flow chart illustrating a method for obtaining sample target loss information of an initial point cloud target detection model according to point cloud data of each voxel grid in another embodiment;
FIG. 9 is a schematic flowchart illustrating a method for obtaining sample target loss information of an initial point cloud target detection model according to a sample target tensor in another embodiment;
FIG. 10 is a flowchart illustrating a method for obtaining loss information of the sample target level based on the sample target level tensor and the sample standard target level tensor in another embodiment;
FIG. 11 is a block diagram showing the structure of an object detection apparatus according to an embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative and are not intended to limit the application.
The target detection method provided by the application can be applied to the computer device shown in FIG. 1. Optionally, the computer device may be, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers and portable wearable devices, and may also be implemented by an independent server or a server cluster formed by a plurality of servers; the embodiment does not limit the specific form of the computer device. The following embodiments describe the specific processes of the target detection method.
In order to improve the accuracy of the target detection result, sample data with depth information can be collected and target detection can be performed on the three-dimensional sample data through a target detection algorithm; improving the accuracy of the target detection algorithm itself further improves the accuracy of the detection result. Based on this, an embodiment of the present application provides a target detection method; fig. 2 is a schematic flowchart of the method, described here as applied to the computer device in fig. 1. The method includes the following steps:
s100, obtaining current frame point cloud data of a target scene.
Specifically, the laser radar can acquire a frame of point cloud data of a target scene in a space range to be detected, namely current frame point cloud data. Alternatively, the lidar may be a single line lidar, a multi-line lidar or the like. Wherein, one or more targets to be detected are included in the space range to be detected. Alternatively, the target may be a pedestrian, a vehicle, an obstacle, a mechanical device, or the like. Optionally, the current frame point cloud data may be three-dimensional point cloud data.
Further, the laser radar can send the collected current frame point cloud data of the target scene in the space range to be detected to the computer device, which thereby obtains the current frame point cloud data of the target scene in the space range to be detected.
S200, inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data. The point cloud target detection model is obtained by training a prediction detection frame of each sample target and a standard detection frame of each sample target in a point cloud data training set.
Specifically, the computer device may input the acquired current frame point cloud data into the point cloud target detection model, so as to detect a target in the current frame point cloud data through the point cloud target detection model, and obtain a target detection result in the current frame point cloud data.
It should be noted that the target detection result may include the size information and position information of the target in the current frame point cloud data, the total amount of point cloud data corresponding to the target, the distance between the target and the lidar, the heading angle between the target and the lidar, and the like. Optionally, the heading angle between the target and the lidar may represent the included angle between the movement direction of the target and the y-axis direction in the three-dimensional coordinate system; the angle increases clockwise and ranges from 0 to 360 degrees. Optionally, the three-dimensional coordinate system may be the spatial coordinate system in which the lidar is located. Fig. 3 is a schematic diagram showing a heading angle r between a target and a lidar in a three-dimensional coordinate system xyz.
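The heading-angle convention just described (clockwise from the +y axis, 0 to 360 degrees) can be sketched as follows; the function name and the motion-vector input are illustrative assumptions:

```python
import math

def heading_angle_deg(vx, vy):
    # clockwise angle between the target's motion direction (vx, vy)
    # and the +y axis of the lidar coordinate system, in [0, 360)
    return math.degrees(math.atan2(vx, vy)) % 360.0

north = heading_angle_deg(0.0, 1.0)  # moving along +y -> 0 degrees
east = heading_angle_deg(1.0, 0.0)   # 90 degrees clockwise from +y
```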
It is understood that the point cloud target detection model may be a pre-trained detection model. Optionally, the computer device may be obtained by training the prediction detection frame of each sample target and the standard detection frame of each sample target in the point cloud data training set. Optionally, the point cloud data training set may include multiple frames of three-dimensional point cloud data acquired by a laser radar.
In the embodiment of the application, in the point cloud target detection model training process, the feature information of the predicted detection frame of the sample target and the feature information of the standard detection frame of the sample target may be used. Optionally, the feature information of the detection frame may include size information, position information, total amount of point cloud data contained in the detection frame, distance between the detection frame and the lidar, and heading angle between the detection frame and the lidar, and the like. Alternatively, the standard detection box may be a sample target standard box labeled by multiple frames of point cloud data in the point cloud data training set.
According to the target detection method provided by the embodiment of the application, a point cloud target detection model can be trained directly on the feature information of the prediction detection frame of each sample target and the standard detection frame of each sample target in a point cloud data training set; current frame point cloud data of a target scene is then obtained and input into the point cloud target detection model to detect the targets in the current frame point cloud data and obtain a target detection result. The method realizes target detection through three-dimensional point cloud data and can improve the accuracy of target detection. Meanwhile, the method avoids training the point cloud target detection model on information that does not fit the feature information of the target (offset information of the detection frame of the sample target, i.e., the offset of the detection frame relative to the anchor frame); instead, the model is trained directly on feature information that fits the feature information of the target, namely the feature information of the detection frames of the sample targets, so that the results of the point cloud target detection model are more accurate and precise.
The construction process of the point cloud target detection model will be described below. In an embodiment, as shown in fig. 4, the process of constructing the point cloud target detection model includes:
s300, determining a point cloud data training set according to original point cloud data under various target scenes; the point cloud data training set includes a plurality of sample targets.
Specifically, in order to improve the generalization capability of the point cloud target detection model so that the trained model is applicable to target point cloud detection in any scene, multiple frames of original point cloud data in multiple different target scenes within different preset spatial ranges can be acquired through a lidar. The lidar can acquire original point cloud data in real time, or can store the acquired multi-frame original point cloud data locally, on a hard disk or in the cloud; when the point cloud target detection model is constructed, the computer device obtains the multi-frame original point cloud data in the various target scenes from those locations.
When the laser radar collects original point cloud data, multi-frame original point cloud data can be collected in the same target scene. It should be noted that the computer device may acquire multiple frames of original point cloud data acquired by the laser radar in multiple different target scenes, and combine the acquired multiple frames of original point cloud data in the multiple different target scenes to form a point cloud data training set.
Optionally, each frame of raw point cloud data may include no sample target, or may include one or more sample targets; taken together, however, the original point cloud data in the point cloud data training set includes multiple sample targets. Optionally, a sample target may be a pedestrian, a vehicle, an obstacle, a mechanical device, or the like; the more types of sample targets there are, the better the generalization ability of the trained point cloud target detection model.
S400, obtaining sample target loss information of the initial point cloud target detection model according to the point cloud data training set. The sample target loss information comprises target regression loss information constructed based on the prediction detection frame of each sample target and the standard detection frame of each sample target.
Specifically, the computer device may perform arithmetic operation, data conversion processing, comparison processing, and/or analysis processing on multiple frames of original point cloud data in the point cloud data training set to obtain a processing result, and then input the processing result to the initial point cloud target detection model to obtain sample target loss information of the initial point cloud target detection model.
The arithmetic operation may be an addition, a subtraction, a multiplication, a division, a derivation, a logarithm, an exponent, or the like. In the embodiment of the present application, the arithmetic operations may be the same or different. The data conversion process may be a data format conversion process and/or a data shift process, and the like. Alternatively, the data format conversion process may be understood as a process of converting raw point cloud data from one format to another. Alternatively, the data shift process may be understood as a process of moving each data in the original point cloud data from one location to another within a preset spatial range.
In addition, the computer device can also directly input the multi-frame original point cloud data in the point cloud data training set into the initial point cloud target detection model to obtain its sample target loss information. Optionally, the initial point cloud target detection model may be implemented by at least one of a long short-term memory neural network model, a recurrent neural network model, an adversarial neural network model, and the like. Optionally, the initial point cloud target detection model may perform convolution, down-sampling, up-sampling and/or arithmetic operations on the original point cloud data.
It should be noted that the sample target loss information may be understood as a sample target loss value calculated through a loss function adopted in the training process of the initial point cloud target detection model, the feature information of a prediction detection frame output by the initial point cloud target detection model, and the feature information of a corresponding standard detection frame pre-labeled by the point cloud data training set. The sample target loss information may be spatial loss information, i.e., X-direction loss information, Y-direction loss information, and Z-direction loss information.
In the embodiment of the present application, an absolute value loss function, a logarithmic loss function, an exponential loss function, a perceptual loss function, or the like may be used in the calculation of the loss information in the Z direction.
S500, training the initial point cloud target detection model through sample target loss information until the initial point cloud target detection model is trained, and obtaining the point cloud target detection model.
Specifically, the computer device can back-propagate the sample target loss information to update the network parameters in the initial point cloud target detection model, obtaining an updated initial point cloud target detection model. It then iterates these steps with a gradient optimization algorithm, obtaining the sample target loss information of the updated model from the point cloud data training set, until the sample target loss information meets the preset error information or the number of iterations reaches a preset threshold. At that point the initial point cloud target detection model has been trained, and the currently updated model is determined as the pre-trained point cloud target detection model.
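The iterate-until-converged procedure above can be sketched with a toy stand-in for the detection model (the `ToyModel`, the learning rate and the stopping thresholds are all illustrative assumptions, not the patent's network):

```python
def train(model, train_set, max_iters=1000, tol=1e-6):
    # forward pass -> sample target loss -> back-propagation update,
    # until the loss meets the preset error or the iteration cap is hit
    for _ in range(max_iters):
        loss = model.compute_loss(train_set)
        if loss <= tol:
            break
        model.step()
    return model

class ToyModel:
    # illustrative stand-in for the initial point cloud target detection
    # model: a single parameter fitted by gradient descent on (w - 3)^2
    def __init__(self):
        self.w = 0.0
    def compute_loss(self, _):
        return (self.w - 3.0) ** 2
    def step(self):
        self.w -= 0.1 * 2.0 * (self.w - 3.0)  # gradient descent update

trained = train(ToyModel(), None)
```

The loop structure (loss, convergence test, parameter update) is the same whatever network sits behind `compute_loss` and `step`.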
According to the target detection method provided by the embodiment of the application, a point cloud data training set can be determined from original point cloud data in various target scenes; sample target loss information constructed from the prediction detection frame and the standard detection frame of each sample target is obtained from the training set; and an initial point cloud target detection model is trained with the sample target loss information until training is complete, yielding the point cloud target detection model. The method trains the initial model directly on feature information that better fits the feature information of the target, namely the feature information of the detection frames of the sample targets, so that in application the detection frames output by the trained model can fit the standard frames of the targets to be detected, guaranteeing the accuracy of the detection result.
In some scenarios, in the network model training process, not only the training set is required to train the network model, but also the optimal network model is determined by combining the verification set. In an embodiment, as shown in fig. 5, the step of determining a point cloud data training set according to the original point cloud data in a plurality of target scenes in S300 may include the following steps:
s310, preprocessing the original point cloud data to obtain a sample point cloud data set.
Specifically, the computer device may pre-process multi-frame original point cloud data of different times in different target scenes in different preset spatial ranges collected by the laser radar to obtain a sample point cloud data set. The preprocessing may be denoising processing, format conversion processing, random screening processing, segmentation processing, and/or the like.
In the embodiment of the application, the sample point cloud data set is different from the multi-frame original point cloud data under a plurality of different target scenes in different preset spatial ranges.
The following describes how to pre-process the original point cloud data. In another embodiment, as shown in fig. 6, the step in S310 may include:
s311, carrying out format conversion processing on the original point cloud data to obtain a standard format point cloud data set; the standard format point cloud data set is a point cloud data set with a format matched with a format required by the training point cloud target detection model.
Specifically, the original point cloud data acquired by the laser radar is point cloud data in a pcd format, and the point cloud data in the pcd format cannot be directly used as a training set. Before the actual network model is trained, the point cloud data in the pcd format needs to be converted into a format matched with the format required by the training point cloud target detection model, namely a standard format.
It should be noted that, the computer device may perform format conversion processing on the multi-frame original point cloud data acquired by the laser radar in different preset spatial ranges under different target scenes, and convert the multi-frame pcd-format original point cloud data into multi-frame standard-format point cloud data, that is, a standard-format point cloud data set.
Alternatively, the standard format may be txt, xyz, or the like. However, in the embodiment of the present application, the standard format may be a bin format.
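As an illustration of such a format conversion, the sketch below converts an ASCII pcd file with fields x y z intensity into a flat float32 bin file. Real pcd files may be stored in binary and may carry additional fields, so this is a simplified assumption, not the embodiment's actual converter:

```python
import numpy as np

def pcd_ascii_to_bin(pcd_path, bin_path):
    """Convert an ASCII .pcd file (fields x y z intensity) into the flat
    float32 .bin layout commonly used to train point cloud detectors."""
    points = []
    with open(pcd_path) as f:
        in_data = False
        for line in f:
            if in_data:
                vals = line.split()
                if len(vals) >= 4:
                    points.append([float(v) for v in vals[:4]])
            elif line.startswith("DATA"):
                in_data = True  # pcd header ends here; point records follow
    np.asarray(points, dtype=np.float32).tofile(bin_path)
```

Each frame of pcd data would be converted this way to build the standard format point cloud data set.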
In addition, in some scenes, the laser radar acquires multi-frame original point cloud data of different moments in a plurality of different target scenes in different preset spatial ranges, so that the situation that a sample target does not appear in a part of original point cloud data is inevitable. Based on this, in order to avoid increasing the data processing amount by processing the invalid point cloud data in the network model training process, in an embodiment, before the step in S311 is executed, the target detection method may further include: removing invalid point cloud data in the original point cloud data; the invalid point cloud data represents the original point cloud data where the sample object is not present.
It can be understood that the original point cloud data acquired by the laser radar carries the reflection intensity of each position point cloud data, and the reflection intensity of the same position point cloud data is different between the case that a sample target exists and the case that no sample target exists, so based on the principle, the computer equipment can eliminate invalid point cloud data in all frames of original point cloud data to remove all frames of original point cloud data without the sample target, and further perform format conversion processing and other subsequent processing on all frames of original point cloud data with the sample target.
Meanwhile, in order to improve the generalization capability of the point cloud target detection model obtained by training, after invalid point cloud data in all frames of original point cloud data are eliminated, data enhancement processing can be performed on the eliminated point cloud data so as to increase the data quantity in the point cloud data training set. Alternatively, the data enhancement processing described above may include a sample target size conversion processing and a sample target rotation processing, and the like. Alternatively, the sample target size conversion process may be a process of converting size information of the sample target in each frame of point cloud data. Alternatively, the sample target rotation process may be a process of rotating the sample target in each frame of point cloud data.
In practical applications, in order to make the point cloud data after data enhancement fit the actual scene, the size scaling of the sample target cannot be too large or too small, and the same holds for the rotation angle of the sample target. Therefore, in the embodiment of the present application, the size transformation range of the sample target may be set to [0.9, 1.1] (this range depends on the size information of the sample target itself and is not fixed), and the rotation angle range of the sample target may be set to [-π/2, π/2].
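The size transformation and rotation enhancement with the ranges above can be sketched as follows; rotating about the vertical axis and sampling both factors uniformly are assumptions on top of the embodiment:

```python
import numpy as np

def augment_sample_target(points, scale_range=(0.9, 1.1),
                          angle_range=(-np.pi / 2, np.pi / 2), rng=None):
    """Apply the size transformation and rotation enhancement to an (N, 3)
    array of sample-target points. Ranges follow the embodiment: scale in
    [0.9, 1.1], rotation angle in [-pi/2, pi/2] about the z (vertical) axis."""
    rng = rng if rng is not None else np.random.default_rng()
    scale = rng.uniform(*scale_range)
    angle = rng.uniform(*angle_range)
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s, 0.0],
                    [s,  c, 0.0],
                    [0.0, 0.0, 1.0]])       # rotation about the vertical axis
    return (points * scale) @ rot.T
```

Each eligible frame's sample targets would be enhanced this way to enlarge the point cloud data training set.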
S312, labeling each sample target in the standard format point cloud data set to obtain a labeled standard format point cloud data set; the marked point cloud data set in the standard format comprises standard marking data of each sample target; the standard annotation data comprises a standard detection box.
Specifically, the computer device may label each sample target in each frame of point cloud data in the standard format point cloud data set to obtain standard label data of each sample target in each frame of point cloud data.
Each frame of point cloud data can be a set of vectors in a three-dimensional coordinate system, so a three-dimensional rectangular frame can be marked on the point cloud data of each sample target in each frame of point cloud data. Standard marking data of each sample target are then calculated from the marked three-dimensional rectangular frame, and the standard marking data of all sample targets corresponding to each frame of point cloud data are combined to obtain the marked standard format point cloud data set. Optionally, the three-dimensional rectangular frame may serve as the standard detection frame for labeling. Optionally, the standard labeling data of each sample target may include the position information of the center point of the standard detection frame where the sample target is located, the size information and type of the corresponding sample target, and the heading angle between the sample target and the laser radar. The position information is determined in the three-dimensional coordinate system where the laser radar is located.
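For illustration, the standard labeling data enumerated above might be represented as the following record; the field names are hypothetical and not mandated by the embodiment:

```python
from dataclasses import dataclass

@dataclass
class StandardAnnotation:
    """Standard labeling data of one sample target: center position of the
    standard detection frame (in the lidar's 3-D coordinate system), box
    size, target type, and heading angle to the lidar."""
    x: float                 # center point of the standard detection frame
    y: float
    z: float
    length: float            # size information of the sample target
    width: float
    height: float
    category: str            # type of the sample target, e.g. "car"
    heading: float           # heading angle between target and lidar (rad)
```

One such record per sample target, grouped per frame, constitutes the marked standard format point cloud data set.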
And S313, determining the marked standard format point cloud data set as a sample point cloud data set.
Specifically, the computer device may perform data conversion processing on the labeled standard format point cloud data set again to obtain a sample point cloud data set. However, in the embodiment of the present application, the obtained marked point cloud data set in the standard format is directly determined as a sample point cloud data set.
And S320, dividing the sample point cloud data set to obtain a point cloud data training set.
It can be understood that the computer device may divide the obtained sample point cloud data set according to a preset proportion to obtain a point cloud data verification set and a point cloud data training set. In the network model training process, the data amount of the verification set is usually smaller than that of the training set, so if the preset proportion of verification set to training set is A:B, then A is smaller than B. Optionally, the preset proportion may be a user-defined ratio of the data amounts of the verification set and the training set when the network model is trained.
Optionally, the point cloud data verification set and the point cloud data training set acquired by the computer device may be used directly to train and verify the initial point cloud target detection model. Alternatively, the two sets may be stored locally, on a hard disk, in the cloud, or the like, and retrieved from there when the initial point cloud target detection model needs to be trained and verified. Optionally, a training set storage file and a verification set storage file may be created locally, on a hard disk, in the cloud, or the like. Optionally, the format of the training set storage file and the verification set storage file may be the pkl format: the point cloud data training set can be stored through a training set storage file (train.pkl), and the point cloud data verification set through a verification set storage file (val.pkl).
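A minimal sketch of the division and pkl storage described above, assuming an illustrative verification-to-training proportion of 1:4:

```python
import pickle
import random

def split_and_store(samples, ratio=(1, 4), train_file="train.pkl",
                    val_file="val.pkl", seed=0):
    """Divide the sample point cloud data set by a preset proportion A:B
    (verification : training, A < B) and store each part as a .pkl file,
    mirroring the train.pkl / val.pkl files described above."""
    a, b = ratio
    random.Random(seed).shuffle(samples)   # shuffle in place before dividing
    n_val = len(samples) * a // (a + b)
    val_set, train_set = samples[:n_val], samples[n_val:]
    with open(train_file, "wb") as f:
        pickle.dump(train_set, f)
    with open(val_file, "wb") as f:
        pickle.dump(val_set, f)
    return train_set, val_set
```

The stored files can later be loaded with `pickle.load` when the initial point cloud target detection model is trained and verified.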
Further, in the network model training process, in order to obtain an optimal point cloud target detection model, the point cloud target detection model needs to be verified through the point cloud data verification set, and a process of verifying the point cloud target detection model through the point cloud data verification set will be described below. In an embodiment, after the step of S500, the target detecting method may further include: inputting the point cloud data verification set into a point cloud target detection model to obtain a test target detection result; and if the similarity between the detection result of the test target and the standard detection result of the sample target is greater than a preset value, determining that the point cloud target detection model passes verification.
It should be noted that the computer device may use similarity calculation methods such as Euclidean distance, Pearson correlation coefficient, cosine similarity, and/or generalized Jaccard similarity to calculate the similarity between the test target detection result and the standard detection result of the sample target. When the similarity is determined to be greater than the preset value, the point cloud target detection model is determined to pass verification, and the current point cloud target detection model is determined as the optimal point cloud target detection model obtained in the training process.
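As one concrete instance of this verification step, the sketch below uses cosine similarity with an assumed preset value of 0.9 (the embodiment leaves both the similarity method and the threshold open):

```python
import numpy as np

def passes_verification(pred, standard, threshold=0.9):
    """Verify the trained model by cosine similarity between the flattened
    test target detection result and the standard detection result; the
    threshold stands in for the preset value of the embodiment."""
    p = np.ravel(pred).astype(float)
    s = np.ravel(standard).astype(float)
    sim = float(p @ s / (np.linalg.norm(p) * np.linalg.norm(s)))
    return sim > threshold, sim
```

Euclidean distance, Pearson correlation, or generalized Jaccard similarity could be substituted for the cosine similarity without changing the surrounding logic.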
The target detection method provided by the embodiment of the application performs a series of processing on the original point cloud data acquired by the laser radar to obtain a point cloud data set in a format matched with that required by the point cloud target detection model, which ensures that the point cloud target detection model can be trained smoothly and avoids failures in training the point cloud target detection model.
In order to improve the accuracy of detecting all sample targets in each frame of point cloud data, each frame of point cloud data can be divided into multiple groups of small data, and then the network model training process is completed through each group of small data. In an embodiment, as shown in fig. 7, the step of obtaining the sample target loss information of the initial point cloud target detection model according to the point cloud data training set in S400 may be implemented by the following steps:
s410, performing voxel grid division processing on the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data; each voxel grid point cloud data comprises coordinate information of each point cloud in the voxel grid.
Specifically, the preset voxel grid specification may be a size of a voxel grid set by a user, that is, a length, a width, and a height of the voxel grid. The computer equipment can perform voxel grid division processing on each frame of point cloud data in the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data.
Optionally, the size of each voxel grid corresponding to each frame of point cloud data may be equal or unequal. If the size of each voxel grid corresponding to each frame of point cloud data is equal, the preset voxel grid specification can be one; if the sizes of the voxel grids corresponding to the frames of point cloud data are not equal, the preset voxel grid specifications can be multiple.
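The voxel grid division can be sketched as follows, assuming a single fixed preset voxel grid specification (length, width, height); the concrete sizes are illustrative:

```python
import numpy as np

def voxelize(points, voxel_size=(0.2, 0.2, 0.4)):
    """Divide an (N, 3) point cloud into voxel grids of the preset
    specification; returns a dict mapping each voxel grid index to the
    coordinate information of the point clouds it contains."""
    idx = np.floor(points / np.asarray(voxel_size)).astype(int)
    grids = {}
    for key, pt in zip(map(tuple, idx), points):
        grids.setdefault(key, []).append(pt)
    return grids
```

Each frame of the point cloud data training set would be divided this way, yielding the plurality of voxel grid point cloud data fed to the model.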
And S420, obtaining sample target loss information of the initial point cloud target detection model according to the point cloud data of each voxel grid.
In this embodiment, the computer device may input each voxel grid point cloud data corresponding to each frame of point cloud data to the initial point cloud target detection model to perform convolution operation, down-sampling, up-sampling, arithmetic operation, and/or the like, so as to obtain sample target loss information of the initial point cloud target detection model.
According to the target detection method provided by the embodiment of the application, the point cloud data training set of the large sample can be subjected to voxel grid division processing to obtain the multiple voxel grid point cloud data of the small sample, and then the grouped point cloud data is processed to obtain the sample target loss information of the initial point cloud target detection model, so that the accuracy of all sample target detection in each frame of point cloud data can be improved.
The following describes how to obtain sample target loss information of the initial point cloud target detection model from each voxel grid point cloud data. In an embodiment, as shown in fig. 8, the initial point cloud target detection model includes an initial sparse network model and an initial detection network model; the step in S420 may be implemented by the following steps:
And S421, inputting the point cloud data of each voxel grid into the initial sparse network model to obtain a two-dimensional feature map corresponding to the point cloud data of each voxel grid.
In the embodiment of the application, the initial point cloud target detection model is composed of an initial sparse network model and an initial detection network model. Optionally, the initial sparse network model may be a sparse convolutional neural network model; the initial detection network model may be a convolutional neural network model. Alternatively, the sparse convolutional neural network model and the convolutional neural network model may be both composed of convolutional layers, pooling layers and/or fully-connected layers, etc., but the sparse convolutional neural network model and the convolutional neural network model have different composition structures.
It should be noted that, the computer device may input each voxel grid point cloud data corresponding to each frame of point cloud data into the initial sparse network model to obtain the two-dimensional feature map corresponding to each voxel grid point cloud data.
However, in the embodiment of the present application, the computer device may first perform feature extraction on each voxel grid point cloud data corresponding to each frame of point cloud data to obtain a voxel feature corresponding to each voxel grid point cloud data, then input the voxel feature corresponding to each voxel grid point cloud data into the initial sparse network model, perform sparse convolution operation on the voxel feature and a preset convolution kernel to obtain a sparse convolution result, and then sequentially perform downsampling processing and feature compression processing on the sparse convolution result to obtain a two-dimensional feature map corresponding to each voxel grid point cloud data.
Optionally, the voxel characteristics may include coordinates of a center point of the voxel grid and a square value of a distance between the center point of the voxel grid and the lidar. Wherein, the characteristic compression processing realizes the function of reducing dimension. Optionally, the size of the convolution kernel may be set according to actual requirements.
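A sketch of the voxel feature named above; taking the mean of the points in the grid as the "center point" is an assumption, since the embodiment does not fix how the center is computed:

```python
import numpy as np

def voxel_feature(voxel_points, lidar_origin=(0.0, 0.0, 0.0)):
    """Compute the voxel feature described above: the coordinates of the
    voxel grid's center point and the squared distance between that center
    and the laser radar."""
    center = np.mean(np.asarray(voxel_points, dtype=float), axis=0)
    sq_dist = float(np.sum((center - np.asarray(lidar_origin, dtype=float)) ** 2))
    return center, sq_dist
```

These per-voxel features are what the initial sparse network model convolves with the preset convolution kernel.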
S422, inputting the two-dimensional feature map into the initial detection network model to obtain a sample target tensor of the initial point cloud target detection model.
Further, the computer device can input the two-dimensional feature map corresponding to the acquired point cloud data of each voxel grid point into the initial detection network model to obtain a sample target tensor of the initial point cloud target detection model. Optionally, the initial detection network model may perform convolution, down-sampling, up-sampling, and/or pooling on the two-dimensional feature map.
It should be noted that the sample target tensor may include spatial position information, size information, and heading angle between the sample target and the laser radar, etc. of the sample target.
And S423, obtaining sample target loss information of the initial point cloud target detection model according to the sample target tensor.
Specifically, the computer device may perform arithmetic operation, analysis processing, and/or comparison processing and the like on the sample target tensor to obtain the sample target loss information of the initial point cloud target detection model.
The target detection method provided by the embodiment of the application can acquire the sample target loss information in the network model training process, so that the network parameters of the network model can be accurately adjusted through the sample target loss information, the network model can be rapidly converged, and the training time of the network model is shortened.
In the network model training process, in order to train the point cloud target detection model directly through the feature information of the sample target's detection frame, which fits the feature information of the target better, the network parameters of the network model can be updated through the horizontal direction loss information and the vertical direction loss information so as to improve the accuracy of network parameter adjustment. The process of obtaining the sample target loss information of the initial point cloud target detection model according to the sample target tensor is described below. In one embodiment, the sample target tensor includes a sample target category tensor, a sample target regression tensor, and a sample target direction tensor; the sample target loss information comprises sample target category loss information, sample target direction loss information and sample target regression loss information; as shown in fig. 9, the step in S423 may be implemented by the following process:
s4231, generating sample target category loss information according to the sample target category tensor; and generating sample target direction loss information according to the sample target direction tensor.
Specifically, the computer device may perform prediction loss value calculation on the type information in the attribute information of the standard detection frame of each sample target in the sample target class tensor and the point cloud data training set by using the first loss function, so as to obtain sample target class loss information. Meanwhile, the computer equipment can also adopt a second loss function to calculate the predicted loss value of the heading angle information in the attribute information of the standard detection frame of each sample target in the sample target direction tensor and the point cloud data training set so as to obtain the sample target direction loss information.
In the embodiment of the present application, the first loss function and the second loss function are both cross entropy loss functions.
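Since both the first and the second loss function are cross entropy, a minimal sketch covers both uses; the class counts and probability values are illustrative:

```python
import numpy as np

def cross_entropy(probs, target_idx, eps=1e-9):
    """Cross entropy loss, used per the embodiment both as the first loss
    function (sample target class) and the second (heading direction)."""
    return float(-np.log(probs[target_idx] + eps))

# Category loss for one anchor: predicted class distribution vs. labeled class
cls_loss = cross_entropy(np.array([0.7, 0.2, 0.1]), target_idx=0)
# Direction loss: e.g. a two-way heading classification
dir_loss = cross_entropy(np.array([0.9, 0.1]), target_idx=0)
```

In practice the loss would be averaged over all anchors in a frame, and the probabilities would come from softmax outputs of the detection head.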
S4232, decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information, and generating sample target regression loss information according to the sample target horizontal loss information and the sample target vertical loss information.
Specifically, the sample target regression tensor may include the central point prediction coordinates corresponding to the prediction detection frame of the sample target and the prediction size information corresponding to the prediction detection frame.
The computer equipment can decode the sample target regression tensor to obtain a decoding result, then performs first arithmetic operation on the decoding result to obtain sample target horizontal loss information and sample target vertical loss information, and then performs second arithmetic operation on the sample target horizontal loss information and the sample target vertical loss information to obtain sample target regression loss information.
Alternatively, the decoding process may be understood as a process of performing a third arithmetic operation on the predicted size information and the predicted position information of the predicted detection frame of the sample object, and the standard size information and the standard position information of the standard detection frame of the sample object, to obtain the actual size information and the actual position information of the detection frame of the sample object.
In this embodiment, the computer device may perform weighted summation on the sample target horizontal loss information and the sample target vertical loss information to obtain the sample target regression loss information. The weight coefficients corresponding to the sample target horizontal loss information and the sample target vertical loss information may be determined according to actual requirements and may be the same or different.
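The weighted summation can be sketched as follows; equal weights of 1.0 are an assumed default, since the embodiment leaves the coefficients to actual requirements:

```python
def sample_target_regression_loss(horizontal_loss, vertical_loss,
                                  w_h=1.0, w_v=1.0):
    """Weighted summation of the sample target horizontal loss information
    and the sample target vertical loss information; the weight
    coefficients may be equal or different."""
    return w_h * horizontal_loss + w_v * vertical_loss
```

This scalar is the sample target regression loss information combined with the category and direction losses during back-propagation.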
In the target detection method provided by the embodiment of the application, in the network model training process, the sample target information can be decomposed into the horizontal information and the vertical information, different loss functions are respectively adopted to accurately calculate the sample target horizontal loss information corresponding to the horizontal information and the sample target vertical loss information corresponding to the vertical information according to actual requirements, and the sample target horizontal loss information and the sample target vertical loss information are further subjected to fusion processing to obtain the overall loss information of the network model, so that the accuracy of the obtained predicted loss value of the network model is higher, the network parameters of the network model are updated through the loss information, the convergence speed of the network model is higher, and the required iteration times are fewer.
The following describes how to decode the sample target regression tensor to obtain the sample target horizontal loss information and the sample target vertical loss information. In an embodiment, the step in S4232 may be implemented by: decoding the sample target regression tensor to obtain a sample target horizontal tensor and a sample target vertical tensor; acquiring vertical loss information of the sample target through a preset loss function and a vertical tensor of the sample target; and acquiring the loss information of the sample target level according to the sample target level tensor and the sample target standard level tensor.
In this embodiment of the present application, the decoding process is a process of performing a fourth arithmetic operation on the predicted central coordinate corresponding to the predicted detection frame of the sample target and the central coordinate information of the standard detection frame of the sample target to obtain a real central coordinate (x, y, z) of the predicted detection frame, performing a fifth arithmetic operation on the predicted size information corresponding to the predicted detection frame of the sample target and the size information of the standard detection frame of the sample target to obtain real size information (i.e., a real length w, a real width l, and a real height h) of the predicted detection frame, and then splitting the real central coordinate of the predicted detection frame and the real size information of the predicted detection frame according to horizontal information and vertical information, respectively, to obtain a horizontal tensor of the sample target and a vertical tensor of the sample target. Wherein the fourth arithmetic operation and the fifth arithmetic operation are different.
Alternatively, the sample target level tensor may include two-dimensional plane information of a prediction detection box of the sample target. Optionally, the sample target vertical tensor may include depth information of a prediction detection box of the sample target. Optionally, the two-dimensional plane information may include information related to an x-axis and information related to a y-axis in a three-dimensional coordinate system xyz; the depth information may include information about a z-axis in a three-dimensional coordinate system xyz.
Further, the computer device can calculate the predicted loss value between the sample target vertical tensor and the sample target standard vertical tensor through the preset loss function to obtain the sample target vertical loss information. In the embodiment of the present application, the preset loss function is the Smooth L1 loss function.
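A sketch of the Smooth L1 loss used as the preset loss function; the transition point beta = 1.0 is an assumed setting:

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth L1 loss between the sample target vertical tensor and its
    standard counterpart: quadratic for small errors, linear for large."""
    diff = np.abs(np.asarray(pred, dtype=float) - np.asarray(target, dtype=float))
    loss = np.where(diff < beta, 0.5 * diff ** 2 / beta, diff - 0.5 * beta)
    return float(loss.mean())
```

The quadratic region keeps gradients small near the optimum while the linear region limits the influence of outlier depth errors.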
Meanwhile, the computer equipment can also perform arithmetic operation on the sample target level tensor and the sample target standard level tensor to obtain the sample target level loss information. Alternatively, the sample target standard level tensor may include two-dimensional plane information of a standard detection box of the sample target, and a length and a width of the standard detection box.
In the network model training process, the target detection method provided by the embodiment of the application can decompose the sample target information into horizontal information and vertical information, respectively calculate the sample target horizontal loss information corresponding to the horizontal information and the sample target vertical loss information corresponding to the vertical information according to actual requirements, and further fuse the sample target horizontal loss information and the sample target vertical loss information to obtain the overall loss information of the network model, so that the accuracy of the obtained predicted loss value of the network model is higher, and the network parameters of the network model are updated through the loss information, so that the convergence rate of the network model is higher, and the required number of iterations is less.
In an embodiment, as shown in fig. 10, the step of obtaining the sample target level loss information according to the sample target level tensor and the sample target standard level tensor can be implemented by the following processes:
and S4232a, fitting the sample target level tensor to obtain the distribution data of the prediction detection frame, and fitting the sample target standard level tensor to obtain the distribution data of the standard detection frame.
In this embodiment, the sample target level tensor includes two-dimensional plane real information (x, y) of a prediction detection frame of the sample target, a real length w and a real width l of the prediction detection frame, that is, x, y, w and l; the sample target vertical tensor includes vertical direction real information z of a prediction detection frame of the sample target and real height h of the prediction detection frame.
It should be noted that the computer device may use a fitting algorithm to fit the two-dimensional plane real information (x, y) of the prediction detection frame of the sample target together with the real length w and real width l of the prediction detection frame, so as to obtain the prediction detection frame distribution data. Meanwhile, the computer device may also use the fitting algorithm to fit the sample target standard level tensor, that is, the two-dimensional plane information of the standard detection frame together with its length and width, so as to obtain the standard detection frame distribution data.
Alternatively, the fitting algorithm may be a least squares fitting method, a random sampling fitting method, and/or an iterative closest point fitting method, among others. In the embodiment of the present application, however, the fitting algorithm is a Gaussian distribution fitting method.
And S4232b, determining the target level loss information of the sample according to the similarity between the distribution data of the prediction detection frame and the distribution data of the standard detection frame.
Specifically, the computer device may calculate the similarity between the prediction detection frame distribution data and the standard detection frame distribution data by using a similarity calculation method such as Euclidean distance, Pearson correlation coefficient, cosine similarity, and/or generalized Jaccard similarity, to obtain the sample target level loss information. In the embodiment of the present application, the similarity calculation method is the Wasserstein distance.
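The Gaussian fitting and Wasserstein-distance comparison can be sketched for the horizontal (2-D) case as follows. Mapping a box to an axis-aligned Gaussian with standard deviations of half the box extents is one common convention (used, e.g., in Gaussian-Wasserstein-style box losses) and is an assumption here rather than a detail fixed by the embodiment:

```python
import numpy as np

def box_to_gaussian(cx, cy, length, width):
    """Fit an axis-aligned 2-D Gaussian to a detection frame's horizontal
    footprint: mean at the box center, standard deviations of half the
    box extents."""
    mean = np.array([cx, cy], dtype=float)
    std = np.array([length / 2.0, width / 2.0], dtype=float)
    return mean, std

def gaussian_w2_squared(mu1, sig1, mu2, sig2):
    """Squared 2-Wasserstein distance between two axis-aligned Gaussians,
    serving as the similarity measure between the prediction detection
    frame distribution data and the standard detection frame distribution
    data."""
    return float(np.sum((mu1 - mu2) ** 2) + np.sum((sig1 - sig2) ** 2))
```

A perfectly fitted prediction frame yields a distance of zero, so minimizing this quantity drives the prediction detection frame toward the standard detection frame in both position and extent.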
In the network model training process, network parameters in the initial point cloud target detection model can be updated through sample target horizontal loss information and sample target vertical loss information.
The target detection method provided by the embodiment of the application can be used for fitting the sample target level tensor to obtain the distribution data of the prediction detection frame, fitting the sample target standard level tensor to obtain the distribution data of the standard detection frame, then determining the horizontal loss information of the sample target according to the similarity between the distribution data of the prediction detection frame and the distribution data of the standard detection frame, and further comprehensively considering the loss information of the sample target in the horizontal direction and the loss information of the sample target in the vertical direction to update the network parameters of the network model, so that the detection result of the network model can be more accurate.
In order to facilitate understanding of those skilled in the art, the object detection method provided in the present application is described by taking an execution subject as a computer device as an example, and specifically, the method includes:
(1) Eliminating invalid point cloud data in the original point cloud data; the invalid point cloud data represents the original point cloud data where the sample object is not present.
(2) Carrying out format conversion processing on the original point cloud data to obtain a standard format point cloud data set; the standard format point cloud data set is a point cloud data set with a format matched with a format required by the training point cloud target detection model.
(3) Labeling each sample target in the standard format point cloud data set to obtain a labeled standard format point cloud data set; the marked point cloud data set in the standard format comprises standard marking data of each sample target; the standard annotation data comprises a standard detection box.
(4) And determining the marked standard format point cloud data set as a sample point cloud data set.
(5) Dividing the sample point cloud data set to obtain a point cloud data training set; the point cloud data training set includes a plurality of sample targets.
(6) Performing voxel grid division processing on the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data; the point cloud data of each voxel grid comprises coordinate information of each point cloud in the voxel grid.
(7) Inputting each voxel grid point cloud data into the initial sparse network model in the initial point cloud target detection model to obtain a two-dimensional feature map corresponding to each voxel grid point cloud data.
(8) Inputting the two-dimensional feature map into the initial detection network model in the initial point cloud target detection model to obtain a sample target tensor of the initial point cloud target detection model.
(9) Generating sample target category loss information in the sample target loss information according to a sample target category tensor in the sample target tensor; and generating sample target direction loss information in the sample target loss information according to the sample target direction tensor in the sample target tensor.
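The patent does not name the loss used in step (9) for the category and direction heads; a plain per-anchor softmax cross-entropy (focal loss being the other common choice in point cloud detectors) can stand in for both in a sketch:

```python
import numpy as np

def softmax_cross_entropy(logits, labels):
    """Numerically stable per-anchor cross-entropy, averaged over anchors.

    An illustrative stand-in for the category and direction losses; the
    patent does not fix the exact loss function.
    """
    z = logits - logits.max(axis=-1, keepdims=True)
    log_p = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    return -log_p[np.arange(len(labels)), labels].mean()
```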
(10) Decoding the sample target regression tensor in the sample target tensor to obtain a sample target horizontal tensor and a sample target vertical tensor.
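The decoding of step (10) amounts to separating a box regression tensor into its horizontal (bird's-eye view) and vertical components. The 7-channel (x, y, z, w, l, h, theta) layout below is an assumption; the patent states only that decoding yields the two tensors.

```python
import numpy as np

def split_regression_tensor(reg):
    """Split a 7-channel box regression tensor (x, y, z, w, l, h, theta)
    into a horizontal part (x, y, w, l, theta) and a vertical part (z, h).

    The channel ordering is an illustrative assumption.
    """
    horizontal = reg[..., [0, 1, 3, 4, 6]]  # x, y, w, l, theta
    vertical = reg[..., [2, 5]]             # z, h
    return horizontal, vertical
```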
(11) Obtaining the sample target vertical loss information in the sample target loss information through a preset loss function and the sample target vertical tensor.
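Smooth-L1 is a typical choice for the "preset loss function" applied to the vertical (z, h) residuals in step (11); the patent does not name the function, so this is an assumption.

```python
import numpy as np

def smooth_l1(pred, target, beta=1.0):
    """Smooth-L1 loss averaged over elements: quadratic below beta,
    linear above, an assumed stand-in for the preset loss function."""
    d = np.abs(pred - target)
    return np.where(d < beta, 0.5 * d * d / beta, d - 0.5 * beta).mean()
```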
(12) Fitting the sample target horizontal tensor to obtain predicted detection frame distribution data, and fitting the sample target standard horizontal tensor to obtain standard detection frame distribution data.
(13) Determining the sample target horizontal loss information in the sample target loss information according to the similarity between the predicted detection frame distribution data and the standard detection frame distribution data.
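Steps (12) and (13) can be made concrete by fitting each bird's-eye-view detection frame (x, y, w, l, theta) to a 2-D Gaussian and scoring the similarity of the predicted and standard frames with the squared 2-Wasserstein distance between the two Gaussians, in the style of Gaussian-Wasserstein-distance box losses. Both the fitting construction and the choice of distance are assumptions; the patent states only that the two tensors are fitted to distribution data and compared by similarity.

```python
import numpy as np

def box_to_gaussian(x, y, w, l, theta):
    """Fit a bird's-eye-view frame to a 2-D Gaussian: mean at the frame
    centre, covariance from the half-extents rotated by the heading angle
    (an assumed construction)."""
    R = np.array([[np.cos(theta), -np.sin(theta)],
                  [np.sin(theta),  np.cos(theta)]])
    S = np.diag([w / 2.0, l / 2.0])
    return np.array([x, y]), R @ S @ S @ R.T

def _sqrtm_psd(A):
    """Matrix square root of a symmetric positive semi-definite matrix."""
    vals, vecs = np.linalg.eigh(A)
    return (vecs * np.sqrt(np.clip(vals, 0, None))) @ vecs.T

def gaussian_wasserstein2(m1, S1, m2, S2):
    """Squared 2-Wasserstein distance between two Gaussians; one plausible
    similarity measure between the predicted and standard distributions."""
    r1 = _sqrtm_psd(S1)
    cross = _sqrtm_psd(r1 @ S2 @ r1)
    return float(np.sum((m1 - m2) ** 2) + np.trace(S1 + S2 - 2 * cross))
```

A horizontal loss can then be any monotone function of this distance (e.g. its logarithm), so that more similar distributions yield a smaller loss.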
(14) Generating the sample target regression loss information in the sample target loss information according to the sample target horizontal loss information and the sample target vertical loss information.
(15) Training the initial point cloud target detection model with the sample target loss information until training is complete, so as to obtain the point cloud target detection model.
(16) Inputting the point cloud data verification set into the point cloud target detection model to obtain a test target detection result.
(17) If the similarity between the test target detection result and the standard detection result of the sample target is greater than a preset value, determining that the point cloud target detection model passes verification.
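The patent does not specify the similarity measure used in steps (16)-(17); a common choice is intersection-over-union between detected and standard frames, with the model passing when every standard frame is matched above a preset threshold. Both the measure and the 0.5 threshold below are assumptions.

```python
def iou_axis_aligned(a, b):
    """Axis-aligned 2-D IoU between frames given as (x1, y1, x2, y2);
    an assumed similarity measure for verification."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def passes_verification(pred_frames, gt_frames, threshold=0.5):
    """The model passes when every standard frame is matched by some
    prediction whose similarity exceeds the preset value (assumed 0.5)."""
    return all(any(iou_axis_aligned(p, g) > threshold for p in pred_frames)
               for g in gt_frames)
```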
(18) Acquiring the current frame point cloud data of the target scene.
(19) Inputting the current frame point cloud data into the point cloud target detection model, and detecting targets in the current frame point cloud data to obtain a target detection result for the current frame point cloud data.
For the implementation processes in (1) to (19), reference may be specifically made to the description of the above embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
It should be understood that although the steps in the flowcharts of FIGS. 2 and 4-10 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the order of execution of these steps is not strictly limited, and they may be performed in other orders. Moreover, at least some of the steps in FIGS. 2 and 4-10 may include multiple sub-steps or stages, which are not necessarily performed at the same time or in sequence, but may be performed in turn or alternately with other steps or with sub-steps of other steps.
In one embodiment, as shown in fig. 11, there is provided an object detection apparatus including: a point cloud data acquisition module 11 and a target detection module 12, wherein:
the point cloud data acquisition module 11 is used for acquiring current frame point cloud data of a target scene;
the target detection module 12 is configured to input the current frame point cloud data into the point cloud target detection model, and detect a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
The target detection apparatus provided in this embodiment may implement the method embodiments described above, and the implementation principle and the technical effect are similar, which are not described herein again.
In one embodiment, the object detection apparatus further includes: training set determining module, loss information obtaining module and model training module, wherein:
the training set determining module is used for determining a point cloud data training set according to the original point cloud data under various target scenes; the point cloud data training set comprises a plurality of sample targets;
the loss information acquisition module is used for acquiring sample target loss information of the initial point cloud target detection model according to the point cloud data training set;
and the model training module is used for training the initial point cloud target detection model through sample target loss information until the initial point cloud target detection model is trained, so as to obtain the point cloud target detection model.
In one embodiment, the training set determination module comprises: a preprocessing unit and a data set partitioning unit, wherein:
the preprocessing unit is used for preprocessing the original point cloud data to obtain a sample point cloud data set;
and the data set dividing unit is used for dividing the sample point cloud data set to obtain a point cloud data training set.
In one embodiment, the pre-processing unit comprises: a format conversion processing subunit, a labeling subunit, and a data set determination subunit, wherein:
the format conversion processing subunit is used for carrying out format conversion processing on the original point cloud data to obtain a standard format point cloud data set; the standard format point cloud data set is a point cloud data set with a format matched with a format required by the training point cloud target detection model;
the labeling subunit is used for labeling each sample target in the standard format point cloud data set to obtain a labeled standard format point cloud data set; the marked point cloud data set with the standard format comprises standard marking data of each sample target; the standard marking data comprises a standard detection frame;
and the data set determining subunit is used for determining the marked standard format point cloud data set as a sample point cloud data set.
In one embodiment, the object detection apparatus further includes: a data culling module, wherein:
the data removing module is used for removing invalid point cloud data in the original point cloud data; the invalid point cloud data represents the original point cloud data without the sample object.
In one embodiment, the object detection apparatus further includes: a verification module and a verification result determination module, wherein:
the verification module is used for inputting the point cloud data verification set into the point cloud target detection model to obtain a test target detection result;
and the verification result determining module is used for determining that the point cloud target detection model passes verification when the similarity between the detection result of the test target and the standard detection result of the sample target is greater than a preset value.
In one embodiment, the loss information acquiring module includes: a voxel division unit and a loss information acquisition unit, wherein:
the voxel division unit is used for carrying out voxel grid division processing on the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data; each voxel grid point cloud data comprises coordinate information of each point cloud in the voxel grid;
and the loss information acquisition unit is used for acquiring sample target loss information of the initial point cloud target detection model according to the point cloud data of each voxel grid.
In one embodiment, the initial point cloud target detection model comprises an initial sparse network model and an initial detection network model; the loss information acquisition unit includes: sparse network model processing subunit, detection network model processing subunit and loss information acquisition subunit, wherein:
the sparse network model processing subunit is used for inputting the point cloud data of each voxel grid into the initial sparse network model to obtain a two-dimensional characteristic map corresponding to the point cloud data of each voxel grid;
the detection network model processing subunit is used for inputting the two-dimensional characteristic map into the initial detection network model to obtain a sample target tensor of the initial point cloud target detection model;
and the loss information acquisition subunit is used for acquiring sample target loss information of the initial point cloud target detection model according to the sample target tensor.
In one embodiment, the sample target tensor comprises a sample target class tensor, a sample target regression tensor, and a sample target direction tensor; the sample target loss information comprises sample target category loss information, sample target direction loss information and sample target regression loss information; the loss information acquisition subunit includes: a first loss information generating subunit and a second loss information generating subunit, wherein:
the first loss information generation subunit is used for generating the sample target category loss information according to the sample target category tensor, and generating the sample target direction loss information according to the sample target direction tensor; and
the second loss information generation subunit is used for decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information, and generating the sample target regression loss information according to the sample target horizontal loss information and the sample target vertical loss information.
In one embodiment, the second loss information generation subunit is specifically configured to fit the sample target horizontal tensor to obtain predicted detection frame distribution data, fit the sample target standard horizontal tensor to obtain standard detection frame distribution data, and determine the sample target horizontal loss information according to the similarity between the predicted detection frame distribution data and the standard detection frame distribution data.
For specific limitations of the target detection device, reference may be made to the above limitations of the target detection method, which are not described herein again. The modules in the target detection device may be wholly or partially implemented by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent of a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
In one embodiment, a computer device is provided, the internal structure of which may be as shown in FIG. 1. The computer device includes a processor, a memory, and a network interface connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device is used for storing the acquired original point cloud data of each frame and the current frame point cloud data. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements a target detection method.
Those skilled in the art will appreciate that the structure shown in FIG. 1 is merely a block diagram of part of the structure related to the solution of the present application and does not limit the computer devices to which the solution applies; a particular computer device may include more or fewer components than those shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, comprising a memory and a processor, the memory having a computer program stored therein, the processor implementing the following steps when executing the computer program:
acquiring current frame point cloud data of a target scene;
inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
In one embodiment, a computer-readable storage medium is provided, having a computer program stored thereon, which when executed by a processor, performs the steps of:
acquiring current frame point cloud data of a target scene;
inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
In one embodiment, a computer program product is provided, comprising a computer program which when executed by a processor performs the steps of:
acquiring current frame point cloud data of a target scene;
inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
It will be understood by those skilled in the art that all or part of the processes of the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include at least one of non-volatile and volatile memory. Non-volatile memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash memory, optical storage, or the like. Volatile memory may include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM can take many forms, such as Static Random Access Memory (SRAM) or Dynamic Random Access Memory (DRAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope described in this specification.
The above embodiments express only several implementations of the present application, and their description is relatively specific and detailed, but they should not be construed as limiting the scope of the invention. It should be noted that several variations and improvements can be made by those skilled in the art without departing from the concept of the present application, and these all fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (14)

1. A method of object detection, the method comprising:
acquiring current frame point cloud data of a target scene;
inputting the current frame point cloud data into a point cloud target detection model, and detecting a target in the current frame point cloud data to obtain a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
2. The method of claim 1, wherein the point cloud target detection model is constructed by:
determining a point cloud data training set according to original point cloud data under various target scenes; the point cloud data training set comprises a plurality of sample targets;
acquiring sample target loss information of an initial point cloud target detection model according to the point cloud data training set;
and training the initial point cloud target detection model through the sample target loss information until the initial point cloud target detection model is trained, so as to obtain the point cloud target detection model.
3. The method of claim 2, wherein determining the training set of point cloud data from the raw point cloud data in the plurality of target scenes comprises:
preprocessing the original point cloud data to obtain a sample point cloud data set;
and dividing the sample point cloud data set to obtain the point cloud data training set.
4. The method of claim 3, wherein the pre-processing the raw point cloud data to obtain a sample point cloud data set comprises:
carrying out format conversion processing on the original point cloud data to obtain a standard format point cloud data set; the standard format point cloud data set is a point cloud data set with a format matched with a format required by training the point cloud target detection model;
labeling each sample target in the standard format point cloud data set to obtain a labeled standard format point cloud data set; the marked point cloud data set in the standard format comprises standard marking data of each sample target; the standard marking data comprises a standard detection frame;
and determining the marked standard format point cloud data set as the sample point cloud data set.
5. The method of claim 3, wherein prior to said format converting the raw point cloud data, the method further comprises:
removing invalid point cloud data from the original point cloud data; the invalid point cloud data represents original point cloud data in which no sample target is present.
6. The method of claim 3, further comprising:
inputting the point cloud data verification set into the point cloud target detection model to obtain a detection result of a test target;
and if the similarity between the test target detection result and the standard detection result of the sample target is greater than a preset value, determining that the point cloud target detection model passes verification.
7. The method of any one of claims 2-6, wherein obtaining sample target loss information for an initial point cloud target detection model from the training set of point cloud data comprises:
performing voxel grid division processing on the point cloud data training set according to a preset voxel grid specification to obtain a plurality of voxel grid point cloud data; each voxel grid point cloud data comprises coordinate information of each point cloud in the voxel grid;
and obtaining sample target loss information of an initial point cloud target detection model according to the point cloud data of each voxel grid.
8. The method of claim 7, wherein the initial point cloud target detection model comprises an initial sparse network model and an initial detection network model;
the method for obtaining the sample target loss information of the initial point cloud target detection model according to the point cloud data of each voxel grid comprises the following steps:
inputting the point cloud data of each voxel grid into the initial sparse network model to obtain a two-dimensional characteristic map corresponding to the point cloud data of each voxel grid;
inputting the two-dimensional characteristic diagram into the initial detection network model to obtain a sample target tensor of the initial point cloud target detection model;
and obtaining sample target loss information of the initial point cloud target detection model according to the sample target tensor.
9. The method of claim 8, wherein the sample target tensor comprises a sample target class tensor, a sample target regression tensor, and a sample target direction tensor; the sample target loss information comprises sample target category loss information, sample target direction loss information and sample target regression loss information;
the obtaining of the sample target loss information of the initial point cloud target detection model according to the sample target tensor comprises:
generating the sample target category loss information according to the sample target category tensor; generating the sample target direction loss information according to the sample target direction tensor; and
and decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information, and generating the sample target regression loss information according to the sample target horizontal loss information and the sample target vertical loss information.
10. The method of claim 9, wherein the decoding the sample target regression tensor to obtain sample target horizontal loss information and sample target vertical loss information comprises:
decoding the sample target regression tensor to obtain a sample target horizontal tensor and a sample target vertical tensor;
obtaining the sample target vertical loss information through a preset loss function and the sample target vertical tensor; and acquiring the sample target horizontal loss information according to the sample target horizontal tensor and the sample target standard horizontal tensor.
11. The method of claim 10, wherein the acquiring the sample target horizontal loss information according to the sample target horizontal tensor and the sample target standard horizontal tensor comprises:
fitting the sample target horizontal tensor to obtain predicted detection frame distribution data, and fitting the sample target standard horizontal tensor to obtain standard detection frame distribution data;
and determining the sample target horizontal loss information according to the similarity between the predicted detection frame distribution data and the standard detection frame distribution data.
12. An object detection apparatus, characterized in that the apparatus comprises:
the point cloud data acquisition module is used for acquiring current frame point cloud data of a target scene;
the target detection module is used for inputting the current frame point cloud data into a point cloud target detection model, detecting a target in the current frame point cloud data and obtaining a target detection result in the current frame point cloud data;
the point cloud target detection model is obtained by training based on the prediction detection frame and the standard detection frame of each sample target in a point cloud data training set.
13. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the object detection method of any of claims 1-11.
14. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the object detection method of any one of claims 1 to 11.
CN202211206016.9A 2022-09-30 2022-09-30 Target detection method and device, computer equipment and storage medium Pending CN115457492A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211206016.9A CN115457492A (en) 2022-09-30 2022-09-30 Target detection method and device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN115457492A true CN115457492A (en) 2022-12-09

Family

ID=84308124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211206016.9A Pending CN115457492A (en) 2022-09-30 2022-09-30 Target detection method and device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN115457492A (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115984583A (en) * 2022-12-30 2023-04-18 广州沃芽科技有限公司 Data processing method, apparatus, computer device, storage medium and program product
CN115984583B (en) * 2022-12-30 2024-02-02 广州沃芽科技有限公司 Data processing method, apparatus, computer device, storage medium, and program product
CN116413740A (en) * 2023-06-09 2023-07-11 广汽埃安新能源汽车股份有限公司 Laser radar point cloud ground detection method and device
CN116413740B (en) * 2023-06-09 2023-09-05 广汽埃安新能源汽车股份有限公司 Laser radar point cloud ground detection method and device
CN116778458A (en) * 2023-08-23 2023-09-19 苏州魔视智能科技有限公司 Parking space detection model construction method, parking space detection method, equipment and storage medium
CN116778458B (en) * 2023-08-23 2023-12-08 苏州魔视智能科技有限公司 Parking space detection model construction method, parking space detection method, equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination