CN115761723A - 3D target detection method and device based on multi-sensor fusion - Google Patents

3D target detection method and device based on multi-sensor fusion

Info

Publication number
CN115761723A
CN115761723A (application number CN202211427433.6A)
Authority
CN
China
Prior art keywords
point cloud
target
target detection
training
fusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211427433.6A
Other languages
Chinese (zh)
Inventor
陆强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Network Technology Shanghai Co Ltd
Original Assignee
International Network Technology Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Network Technology Shanghai Co Ltd filed Critical International Network Technology Shanghai Co Ltd
Priority to CN202211427433.6A
Publication of CN115761723A
Legal status: Pending

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a 3D target detection method and device based on multi-sensor fusion. The method comprises the following steps: acquiring target images of at least two continuous frames and point cloud data corresponding to the target images in time sequence; and inputting the acquired point cloud data and the target images into a 3D target detection model to obtain a target detection result output by the 3D target detection model. The 3D target detection model performs time sequence fusion and modal fusion on the features respectively extracted from the point cloud data and the target images, and then performs 3D target detection to obtain the target detection result. The 3D target detection model is obtained by training based on training images, image feature true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data, and target true values. By detecting the target to be detected through the 3D target detection model and combining the time sequence fusion of each modality, the invention improves the accuracy of 3D detection, in particular the target detection rate and the estimation of motion information such as velocity.

Description

3D target detection method and device based on multi-sensor fusion
Technical Field
The invention relates to the technical field of image processing, in particular to a 3D target detection method and device based on multi-sensor fusion.
Background
In an automatic driving system, 3D object detection is a very important task in the perception module. Compared with planar (2D) object detection, it can directly provide the real position, shape, size, orientation and category of objects in the surrounding environment, and thus directly provides a basis for downstream decision-making such as prediction, planning and motion control.
For camera images, relatively mature network designs exist that capture the high-level features describing object categories. However, detection results obtained from planar images alone cannot directly provide useful decision bases for the automatic driving task, since information such as the precise position, shape and size of the target is missing. In an automatic driving system, the point cloud data of another sensor, the lidar, provides real spatial information of the surrounding 3D space, such as distance depth and surface shape. However, the sensing characteristics of the lidar determine the disorder and sparsity of the point cloud data, and such unstructured data are difficult to process directly with a conventional convolutional neural network.
In recent years, the task of image-based target detection has developed rapidly. Lidar-based 3D target detection methods can generally provide precise localization by means of the distance information contained in the point cloud, but the sparsity of the point cloud means that such methods perform well only on large objects, and processing methods based on point cloud voxels or projections lose detail and struggle with small objects. Camera-based methods can better exploit the dense image data to detect small objects, but depth ambiguity and perspective deformation make them uncompetitive in localization and shape estimation, so their overall performance lags considerably. Even the fusion-based algorithms mostly fuse only at the level of a single-frame view or of region proposals, which both damages the point cloud information and ignores the detail of the object surface structure; some fusion-based detection algorithms first generate region proposals on the image and only then acquire 3D information from the corresponding part of the point cloud, so their performance is directly limited by factors such as the perspective deformation and mutual occlusion of objects in the image.
Disclosure of Invention
The invention provides a 3D target detection method and device based on multi-sensor fusion, which are used to overcome the defect of inaccurate 3D target detection results in the prior art and to improve 3D detection precision through time sequence fusion of multiple modalities.
The invention provides a 3D target detection method based on multi-sensor fusion, which comprises the following steps: acquiring target images of at least two continuous frames and point cloud data corresponding to the target images in time sequence; inputting the acquired point cloud data and the target images into a 3D target detection model to obtain a target detection result output by the 3D target detection model; wherein the 3D target detection model performs time sequence fusion and modal fusion on the features respectively extracted from the point cloud data and the target images, and then performs 3D target detection to obtain the target detection result; and the 3D target detection model is obtained by training based on a training image, an image feature true value corresponding to the training image, point cloud training data, a point cloud true value corresponding to the point cloud training data, and a target true value.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, the 3D target detection model comprises: a point cloud feature extraction layer, used for respectively extracting features of each frame of input point cloud data to obtain point cloud features corresponding to each frame of point cloud data; an image feature extraction layer, used for respectively extracting features of each frame of input target image to obtain image features corresponding to each frame of target image; a time sequence fusion layer, which carries out time sequence fusion based on the extracted image features of each frame and the input point cloud data of each frame to obtain time sequence fusion features;
the modal fusion layer is used for performing modal fusion on the time sequence fusion feature and the time sequence point cloud feature corresponding to the time sequence fusion feature to obtain a modal fusion feature; and the target detection layer is used for carrying out 3D target detection on the modal fusion characteristics to obtain a target detection result.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, the time sequence fusion is carried out based on the extracted image features of each frame and the input point cloud data of each frame, and the method comprises the following steps: according to the input point cloud data of each frame, carrying out view transformation on the corresponding frame image characteristics to obtain view transformation characteristics; and selecting the view transformation characteristics of two adjacent frames, and mapping the view transformation characteristics of the previous frame to the view transformation characteristics of the current frame to obtain the time sequence fusion characteristics.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, the view transformation is carried out on the corresponding frame image characteristics according to the input point cloud data of each frame, and the method comprises the following steps: respectively projecting each frame of point cloud data to corresponding frame image features, and endowing each pixel position of the corresponding image features with depth dimension coordinates based on the depth information of the point cloud data to obtain projection fusion features with three-dimensional pixel coordinates; and converting the pixel coordinates of the projection fusion characteristics to a radar coordinate system based on a preset coordinate conversion matrix to obtain view transformation characteristics.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, the characteristic extraction is respectively carried out on the input point cloud data of each frame, and the method comprises the following steps: respectively uniformly dividing single-frame point cloud data into a plurality of grid columns with preset sizes; and respectively extracting the features of each grid column based on a preset sampling threshold value to obtain the corresponding point cloud features.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, based on the preset sampling threshold, the characteristic extraction is respectively carried out on each grid column, and the method comprises the following steps: based on the fact that the number of point clouds in a single grid column is not smaller than the preset sampling threshold value, sampling point clouds in corresponding grid columns based on the preset sampling threshold value to obtain sampling features; based on the fact that the number of point clouds in a single grid column is smaller than the sampling threshold value, filling the corresponding grid column according to preset numerical values to obtain filling characteristics; and obtaining corresponding frame point cloud characteristics based on all the sampling characteristics and all the filling characteristics.
According to the 3D target detection method based on multi-sensor fusion provided by the invention, the training of the 3D target detection model comprises the following steps: acquiring a multi-frame training image, multi-frame point cloud training data and a target true value of the same target; the point cloud training data is used as input data for training, the target truth value is used as a label, and a pre-constructed 3D target detection model is trained to obtain a first training model; taking the training image as input data used for training, taking the target truth value as a label, and training the first training model to obtain a second training model; and taking the point cloud training data and the training image as input data for training, taking the target truth value as a label, and training the second training model to obtain a 3D target detection model.
The invention also provides a 3D target detection device based on multi-sensor fusion, which comprises: the data acquisition module is used for acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence; the 3D target detection module is used for inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training of a training image, an image characteristic true value corresponding to the training image, point cloud training data, a point cloud true value corresponding to the point cloud training data and a target true value.
The invention also provides an electronic device, which comprises a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor executes the program to realize the steps of the multi-sensor fusion-based 3D target detection method.
The present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the multi-sensor fusion based 3D object detection method as described in any of the above.
The present invention also provides a computer program product comprising a computer program which, when executed by a processor, performs the steps of any one of the above-described multi-sensor fusion based 3D object detection methods.
According to the 3D target detection method and device based on multi-sensor fusion provided by the invention, the target to be detected is detected through the 3D target detection model, and by combining the time sequence fusion of each modality the accuracy of 3D detection is improved, in particular the target detection rate and the estimation of motion information such as velocity; in addition, single-modality input is supported, so that the failure of one sensor does not affect the normal operation of the model.
Drawings
In order to more clearly illustrate the technical solutions of the present invention or the prior art, the drawings needed for the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
FIG. 1 is a schematic flow chart of a 3D object detection method based on multi-sensor fusion provided by the invention;
FIG. 2 is a second schematic flowchart of a 3D object detection method based on multi-sensor fusion according to the present invention;
FIG. 3 is a schematic structural diagram of a 3D object detection device based on multi-sensor fusion provided by the invention;
fig. 4 is a schematic structural diagram of an electronic device provided in the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is obvious that the described embodiments are some, but not all embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a flow chart of a 3D target detection method based on multi-sensor fusion, which includes:
s11, acquiring target images of at least two continuous frames and point cloud data corresponding to the target images in time sequence;
s12, inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
It should be noted that the step numbers S11, S12, etc. in this specification do not represent the execution order of the multi-sensor fusion-based 3D target detection method; the multi-sensor fusion-based 3D target detection method of the present invention is described in detail below with reference to fig. 2.
And S11, acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence.
In this embodiment, acquiring target images of at least two continuous frames includes: acquiring a video stream over a target time period for a target area to be detected; and extracting video frames from the video stream to obtain at least two continuous frames of target images. It should be added that, after the video frames are extracted, the method further includes: sampling the extracted video frames based on a preset sampling frequency to obtain the target images.
In an alternative embodiment, acquiring at least two consecutive frames of the target image comprises: acquiring images to be detected which are continuously shot in a target time period of a target area to be detected; and sampling from the continuously shot detection images based on a preset sampling frequency to obtain a target image.
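By way of illustration only, the following Python sketch shows one way to sample consecutive target images from a video stream at a preset sampling frequency; the use of OpenCV, the file path and the frequency value are assumptions of this example rather than part of the disclosed method.

```python
# Illustrative sketch only: sample consecutive target images from a video
# stream at a preset sampling frequency (assumed helper, not the patented code).
import cv2

def sample_target_images(video_path: str, sample_hz: float, num_frames: int = 2):
    """Return `num_frames` consecutive frames sampled at roughly `sample_hz`."""
    cap = cv2.VideoCapture(video_path)
    video_fps = cap.get(cv2.CAP_PROP_FPS) or 30.0
    step = max(1, round(video_fps / sample_hz))   # frames to skip between samples
    frames, index = [], 0
    while len(frames) < num_frames:
        ok, frame = cap.read()
        if not ok:
            break
        if index % step == 0:
            frames.append(frame)
        index += 1
    cap.release()
    return frames
```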
The target image is captured by a camera on the vehicle, for example a camera mounted on the vehicle body. In addition, the vehicle may be a car, a ship, an aircraft or another vehicle for carrying people or goods; the vehicle may be a private car or an operating vehicle, such as a shared car, an online ride-hailing car, a taxi, a bus, a school bus, a truck, a coach, a train, a subway or a tram.
In addition, the point cloud data corresponding to the target image time sequence is obtained, and the method comprises the following steps: acquiring a radar point cloud data set aiming at a target area to be detected; and acquiring point cloud data of corresponding time sequence from the radar point cloud data set based on the time sequence corresponding to each frame of target image.
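A minimal sketch of this time-sequence matching between image frames and radar sweeps is given below; the timestamp lists and the tolerance value are illustrative assumptions only.

```python
# Pair each image frame with the lidar sweep closest to it in time
# (assumed nearest-timestamp strategy; tolerance max_dt is a placeholder).
def match_pointclouds_to_images(image_stamps, lidar_stamps, max_dt=0.05):
    """For each image timestamp, return the index of the nearest lidar sweep, or None."""
    matches = []
    for t_img in image_stamps:
        j = min(range(len(lidar_stamps)), key=lambda k: abs(lidar_stamps[k] - t_img))
        matches.append(j if abs(lidar_stamps[j] - t_img) <= max_dt else None)
    return matches
```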
S12, inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
In this embodiment, referring to fig. 2, the 3D target detection model includes: the point cloud feature extraction layer, which is used for respectively extracting features of the input point cloud data of each frame to obtain point cloud features corresponding to each frame of point cloud data; the image feature extraction layer, which is used for respectively extracting features of each input frame of target image to obtain image features corresponding to each frame of target image; the time sequence fusion layer, which carries out time sequence fusion based on the extracted image features of each frame and the input point cloud data of each frame to obtain time sequence fusion features; the modal fusion layer, which is used for performing modal fusion on the time sequence fusion features and the corresponding time sequence point cloud features to obtain modal fusion features; and the target detection layer, which is used for carrying out 3D target detection on the modal fusion features to obtain a target detection result.
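The patent does not disclose concrete network structures for these layers. The following PyTorch sketch is therefore only a structural stand-in showing how the five described layers could be composed; every sub-module passed to the constructor is a placeholder assumption.

```python
# Structural sketch only: composition of the five described layers.
import torch.nn as nn

class MultiSensorFusion3DDetector(nn.Module):
    def __init__(self, pc_backbone, img_backbone, temporal_fusion, modal_fusion, det_head):
        super().__init__()
        self.pc_backbone = pc_backbone          # point cloud feature extraction layer
        self.img_backbone = img_backbone        # image feature extraction layer
        self.temporal_fusion = temporal_fusion  # time sequence fusion layer
        self.modal_fusion = modal_fusion        # modal fusion layer
        self.det_head = det_head                # target detection layer

    def forward(self, point_clouds, images):
        # per-frame features for each modality
        pc_feats = [self.pc_backbone(pc) for pc in point_clouds]
        img_feats = [self.img_backbone(im) for im in images]
        # time sequence fusion uses the image features together with the raw point clouds
        seq_feat = self.temporal_fusion(img_feats, point_clouds)
        # modal fusion combines the fused feature with the corresponding point cloud
        # feature (here assumed to be the current frame's feature)
        fused = self.modal_fusion(seq_feat, pc_feats[-1])
        # the head outputs center heatmap hm, 3D size prediction and offset error reg
        return self.det_head(fused)
```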
It should be noted that the target detection result includes a center point prediction result hm, a 3D size prediction result, and a prediction error reg. It should be added that hm is defined on a down-sampled grid of the original resolution: the true center point coordinates are divided by a preset value and rounded down to integers, so the quantization error between hm and the actual center point position is carried by reg, which makes it convenient to recover the actual center point position from hm and reg together.
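The following worked example (with an assumed down-sampling factor of 4) illustrates why reg is needed to recover the actual center position from the down-sampled hm grid.

```python
# Worked example: hm lives on a grid down-sampled by `stride`, so the
# quantization remainder must be stored in reg (stride and coordinates assumed).
import numpy as np

stride = 4                              # assumed down-sampling factor
true_center = np.array([123.7, 58.2])   # center in input-resolution coordinates
hm_cell = np.floor(true_center / stride).astype(int)   # integer heatmap cell: [30, 14]
reg = true_center / stride - hm_cell                   # fractional error stored in reg
# at inference the actual center is recovered from the peak cell plus reg:
recovered = (hm_cell + reg) * stride                   # == [123.7, 58.2]
```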
Specifically, the method for extracting features of each frame of point cloud data comprises the following steps: respectively uniformly dividing single-frame point cloud data into a plurality of grid columns with preset sizes; and respectively extracting the features of each grid column based on a preset sampling threshold value to obtain the corresponding point cloud features.
Furthermore, the method for uniformly dividing the single-frame point cloud data into a plurality of grid columns with preset sizes includes: dividing single-frame point cloud data into a plurality of grids with preset sizes; and extending each grid in the Z-axis direction to obtain a corresponding grid column.
In addition, based on the preset sampling threshold, the feature extraction is respectively carried out on each grid column, and the method comprises the following steps: based on the fact that the number of point clouds in a single grid column is not smaller than a preset sampling threshold, sampling the point clouds in the corresponding grid columns based on the preset sampling threshold to obtain sampling characteristics; based on the fact that the number of point clouds in a single grid column is smaller than a sampling threshold value, filling the corresponding grid column according to preset numerical values to obtain filling characteristics; and obtaining corresponding frame point cloud characteristics based on all the sampling characteristics and all the filling characteristics.
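A hedged sketch of this grid-column step is given below; the cell size, detection range and sampling threshold are assumed values, not ones specified by the patent.

```python
# Sketch of dividing a single sweep into XY grid columns (extended along Z) and
# sampling or zero-padding each occupied column to a fixed number of points.
import numpy as np

def pillarize(points, cell=0.16, max_points=32, x_range=(0.0, 70.4), y_range=(-40.0, 40.0)):
    """points: (N, C) array with x, y, z, ... -> dict {(ix, iy): (max_points, C)}."""
    in_range = (points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) & \
               (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1])
    points = points[in_range]
    ix = ((points[:, 0] - x_range[0]) // cell).astype(int)
    iy = ((points[:, 1] - y_range[0]) // cell).astype(int)
    pillars = {}
    for key in set(zip(ix.tolist(), iy.tolist())):
        mask = (ix == key[0]) & (iy == key[1])
        pts = points[mask]
        if len(pts) >= max_points:            # enough points: randomly sample down
            sel = np.random.choice(len(pts), max_points, replace=False)
            pillars[key] = pts[sel]
        else:                                 # too few points: pad with a preset value (0)
            pad = np.zeros((max_points - len(pts), points.shape[1]), dtype=points.dtype)
            pillars[key] = np.concatenate([pts, pad], axis=0)
    return pillars
```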
It should be noted that, in an alternative embodiment, feature extraction may be performed on each frame of input point cloud data by a pillar feature network (PFN, Pillar Feature Net), so as to convert the input point cloud data into a sparse, regularly structured representation.
In addition, in this embodiment, performing time-series fusion based on the extracted image features of each frame and the input point cloud data of each frame includes: according to the input point cloud data of each frame, carrying out view transformation on the corresponding frame image characteristics to obtain view transformation characteristics; and selecting the view transformation characteristics of two adjacent frames, and mapping the view transformation characteristics of the previous frame to the view transformation characteristics of the current frame to obtain the time sequence fusion characteristics.
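The patent does not state how the previous frame's view transformation features are mapped onto the current frame. The sketch below assumes this mapping is a bird's-eye-view warp by the ego-motion between the two frames followed by channel concatenation; both choices are illustrative assumptions.

```python
# Assumed temporal fusion: warp the previous frame's view-transformed (BEV)
# feature into the current frame's coordinates, then concatenate channels.
import torch
import torch.nn.functional as F

def fuse_adjacent_bev(prev_bev, curr_bev, theta):
    """prev_bev, curr_bev: (1, C, H, W) BEV features; theta: (1, 2, 3) affine ego-motion."""
    grid = F.affine_grid(theta, size=list(prev_bev.shape), align_corners=False)
    prev_warped = F.grid_sample(prev_bev, grid, align_corners=False)  # previous frame in current coords
    return torch.cat([prev_warped, curr_bev], dim=1)                  # time sequence fusion feature
```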
Furthermore, according to the input point cloud data of each frame, the view transformation is performed on the corresponding image features of the frame, which includes: respectively projecting each frame of point cloud data to the corresponding frame of image features, and endowing each pixel position of the corresponding image features with a depth dimension coordinate based on the depth information of the point cloud data to obtain a projection fusion feature with a three-dimensional pixel coordinate; and converting the pixel coordinates of the projection fusion characteristics to a radar coordinate system based on a preset coordinate conversion matrix to obtain view transformation characteristics.
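The following sketch illustrates such a view transformation under assumed pinhole intrinsics K and a lidar-to-camera extrinsic matrix T_lidar2cam (both placeholders): the lidar points give each covered pixel a depth coordinate, and the resulting three-dimensional pixel coordinates are then transformed into the radar coordinate system.

```python
# Assumed view transformation: project lidar points into the image to assign
# per-pixel depth, lift those pixels to 3D, and express them in the lidar frame.
import numpy as np

def view_transform(points_lidar, K, T_lidar2cam, img_hw):
    """points_lidar: (N, >=3); K: 3x3 intrinsics; T_lidar2cam: 4x4 extrinsics."""
    H, W = img_hw
    # 1) project lidar points into the image to give pixel positions a depth dimension
    homog = np.c_[points_lidar[:, :3], np.ones(len(points_lidar))]
    pts_cam = (T_lidar2cam @ homog.T).T[:, :3]
    uvz = (K @ pts_cam.T).T
    uvz = uvz[uvz[:, 2] > 1e-6]                       # keep points in front of the camera
    u, v, z = uvz[:, 0] / uvz[:, 2], uvz[:, 1] / uvz[:, 2], uvz[:, 2]
    depth = np.zeros((H, W))
    valid = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    depth[v[valid].astype(int), u[valid].astype(int)] = z[valid]
    # 2) lift every pixel that received a depth back to 3D camera coordinates
    vv, uu = np.nonzero(depth)
    zz = depth[vv, uu]
    xyz_cam = (np.linalg.inv(K) @ np.c_[uu * zz, vv * zz, zz].T).T
    # 3) express those 3D pixel positions in the radar (lidar) coordinate system
    T_cam2lidar = np.linalg.inv(T_lidar2cam)
    xyz_lidar = (T_cam2lidar @ np.c_[xyz_cam, np.ones(len(xyz_cam))].T).T[:, :3]
    return xyz_lidar
```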
In an alternative embodiment, training a 3D object detection model comprises: acquiring a multi-frame training image, multi-frame point cloud training data and a target true value of the same target; the method comprises the steps that point cloud training data are used as input data used for training, a target truth value is used as a label, a pre-constructed 3D target detection model is trained, and a first training model is obtained; training the first training model by taking the training image as input data used for training and taking the target truth value as a label to obtain a second training model; and taking the point cloud training data and the training image as input data used for training, taking the target truth value as a label, and training the second training model to obtain the 3D target detection model.
It should be noted that, when the pre-constructed 3D target detection model is trained, only point cloud training data are input and no training images are input, so that the model is trained for the case where only point clouds are available; when the first training model is trained, only training images are input and no point cloud training data are input, so that the model is trained for the case where only images are available; and when the second training model is trained, the networks other than the target detection layer are frozen, namely the point cloud feature extraction layer, the image feature extraction layer, the time sequence fusion layer and the modal fusion layer in the present application, so that only the target detection layer is trained. This further improves the detection precision of the whole 3D target detection model, in particular the target detection rate and the estimation of motion information such as velocity; in addition, single-modality input is supported, so that the model can still run normally when one of the sensors fails.
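The three-stage schedule described above can be outlined as follows; the optimizer, loss function, epoch count and data loaders are placeholders, and the freezing step simply follows the statement that all layers except the target detection layer are frozen in the third stage.

```python
# Outline of the assumed three-stage training procedure (placeholders throughout).
import torch

def freeze(module):
    for p in module.parameters():
        p.requires_grad = False

def train_stage(model, loader, loss_fn, epochs=10, lr=1e-3):
    optim = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(epochs):
        for batch in loader:
            pred = model(batch["point_clouds"], batch["images"])
            loss = loss_fn(pred, batch["target_truth"])
            optim.zero_grad(); loss.backward(); optim.step()

# stage 1: only point cloud training data are input            -> first training model
# stage 2: only training images are input                      -> second training model
# stage 3: both modalities, with every layer except the target detection layer frozen:
# freeze(model.pc_backbone); freeze(model.img_backbone)
# freeze(model.temporal_fusion); freeze(model.modal_fusion)
# train_stage(model, fused_loader, loss_fn)                    -> final 3D target detection model
```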
In summary, the embodiment of the present invention detects the target to be detected through the 3D target detection model and, by combining the time sequence fusion of each modality, improves the precision of 3D detection, in particular the target detection rate and the estimation of motion information such as velocity; in addition, single-modality input is supported, so that the failure of one sensor does not affect the normal operation of the model.
The following describes the multi-sensor fusion-based 3D object detection apparatus provided in the present invention, and the multi-sensor fusion-based 3D object detection apparatus described below and the multi-sensor fusion-based 3D object detection method described above may be referred to in correspondence with each other.
Fig. 3 shows a schematic structural diagram of a 3D object detection device based on multi-sensor fusion, which includes:
the data acquisition module 31 is used for acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence;
the 3D target detection module 32 inputs the acquired point cloud data and the target image into the 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
In this embodiment, the data acquiring module 31 includes: the video acquisition unit is used for acquiring a video stream in a target time period for a target area to be detected; and the image extraction unit is used for extracting the video frames based on the video stream to obtain at least two continuous frames of target images. It should be added that the data obtaining module 31 further includes: and the sampling unit is used for sampling the extracted video frame based on a preset sampling frequency to obtain a target image.
In an optional embodiment, the data obtaining module 31 includes: acquiring images to be detected which are continuously shot in a target time period of a target area to be detected; and sampling from the continuously shot detection images based on a preset sampling frequency to obtain a target image.
In addition, the data acquisition module 31 further includes: the point cloud data acquisition unit is used for acquiring a radar point cloud data set aiming at a target area to be detected; and the point cloud data screening unit is used for acquiring point cloud data of corresponding time sequences from the radar point cloud data set based on the time sequences corresponding to the target images of the frames.
The 3D target detection module 32 comprises a data input unit, a 3D target detection model unit and a data output unit, wherein the data input unit is used for inputting the acquired point cloud data and the target image into the 3D target detection model unit; the 3D target detection model unit is used for carrying out target detection according to the input point cloud data and the target image to obtain a target detection result; and the data output unit is used for outputting the target detection result obtained by the 3D target detection model unit.
Still further, a 3D object detection model unit, comprising: the point cloud feature extraction unit is used for respectively extracting features of the input point cloud data of each frame to obtain point cloud features corresponding to the point cloud data of each frame; the image feature extraction unit is used for respectively extracting features of each input frame of target image to obtain image features corresponding to each frame of target image; the time sequence fusion unit is used for carrying out time sequence fusion on the basis of the extracted image features of each frame and the input point cloud data of each frame to obtain time sequence fusion features; the modal fusion unit is used for performing modal fusion on the time sequence fusion feature and the point cloud feature of the time sequence corresponding to the time sequence fusion feature to obtain a modal fusion feature; and the target detection unit is used for carrying out 3D target detection on the mode fusion characteristics to obtain a target detection result.
Wherein, the point cloud characteristic extraction unit includes: the grid dividing subunit uniformly divides the single-frame point cloud data into a plurality of grid columns with preset sizes; and the point cloud feature extraction subunit is used for respectively extracting features of each grid column based on a preset sampling threshold value to obtain corresponding point cloud features.
Further, the grid dividing subunit includes: a mesh division unit which divides the single-frame point cloud data into a plurality of meshes with preset sizes; and the grid column acquiring unit is used for extending each grid in the Z-axis direction to obtain a corresponding grid column.
The point cloud feature extraction subunit comprises: a sampling subunit, which, when the number of point clouds in a single grid column is not less than the preset sampling threshold, samples the point clouds in the corresponding grid column based on the preset sampling threshold to obtain sampling features; a filling subunit, which, when the number of point clouds in a single grid column is less than the sampling threshold, fills the corresponding grid column with preset values to obtain filling features; and a point cloud feature obtaining subunit, which obtains the point cloud features of the corresponding frame based on all the sampling features and all the filling features.
In addition, in this embodiment, the timing fusion unit includes: the view transformation subunit performs view transformation on the corresponding frame image characteristics according to the input point cloud data of each frame to obtain view transformation characteristics; and the time sequence feature fusion subunit selects the view transformation features of two adjacent frames, and maps the view transformation features of the previous frame to the view transformation features of the current frame to obtain the time sequence fusion features.
Further, the view transformation subunit includes: the depth information fusion unit is used for respectively projecting each frame of point cloud data to the corresponding frame of image characteristics, endowing each pixel position of the corresponding image characteristics with a depth dimension coordinate based on the depth information of the point cloud data, and obtaining a projection fusion characteristic with a three-dimensional pixel coordinate; and the coordinate conversion unit is used for converting the pixel coordinates of the projection fusion characteristics to a radar coordinate system based on a preset coordinate conversion matrix to obtain the view transformation characteristics.
In an optional embodiment, the apparatus further comprises a training module for training the 3D object detection model. Specifically, a training module comprising: the training data acquisition unit is used for acquiring multi-frame training images, multi-frame point cloud training data and a target true value of the same target; the first training unit is used for training a pre-constructed 3D target detection model by taking point cloud training data as input data used for training and target truth values as labels to obtain a first training model; the second training unit is used for training the first training model by taking the training image as input data used for training and taking the target truth value as a label to obtain a second training model; and the third training unit is used for training the second training model by taking the point cloud training data and the training image as input data used for training and taking the target truth value as a label to obtain the 3D target detection model.
In summary, the embodiment of the present invention detects the target to be detected through the 3D target detection model and, by combining the time sequence fusion of each modality, improves the precision of 3D detection, in particular the target detection rate and the estimation of motion information such as velocity; in addition, single-modality input is supported, so that the failure of one sensor does not affect the normal operation of the model.
Fig. 4 illustrates a physical structure diagram of an electronic device, which may include, as shown in fig. 4: a processor (processor) 41, a communication Interface (communication Interface) 42, a memory (memory) 43 and a communication bus 44, wherein the processor 41, the communication Interface 42 and the memory 43 complete communication with each other through the communication bus 44. Processor 41 may invoke logic instructions in memory 43 to perform a multi-sensor fusion based 3D object detection method comprising: acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence; inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
Furthermore, the logic instructions in the memory 43 may be implemented in the form of software functional units and stored in a computer readable storage medium when the software functional units are sold or used as independent products. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In another aspect, the present invention also provides a computer program product, the computer program product comprising a computer program, the computer program being stored on a non-transitory computer-readable storage medium, wherein when the computer program is executed by a processor, a computer is capable of executing the multi-sensor fusion-based 3D object detection method provided by the above methods, the method comprising: acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence; inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
In yet another aspect, the present invention also provides a non-transitory computer-readable storage medium, on which a computer program is stored, the computer program being implemented by a processor to perform the multi-sensor fusion-based 3D object detection method provided by the above methods, the method including: acquiring target images of at least two continuous frames and point cloud data corresponding to the time sequence of the target images; inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model; the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result; the 3D target detection model is obtained based on training images, image characteristic true values corresponding to the training images, point cloud training data, point cloud true values corresponding to the point cloud training data and target true values.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A 3D target detection method based on multi-sensor fusion, characterized by comprising the following steps:
acquiring target images of at least two continuous frames and point cloud data corresponding to the target images in time sequence;
inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model;
wherein the 3D target detection model performs time sequence fusion and modal fusion on the features respectively extracted from the point cloud data and the target image, and then performs 3D target detection to obtain the target detection result;
the 3D target detection model is obtained based on training of a training image, an image characteristic true value corresponding to the training image, point cloud training data, a point cloud true value corresponding to the point cloud training data and a target true value.
2. The multi-sensor fusion-based 3D object detection method according to claim 1, wherein the 3D object detection model comprises:
the point cloud feature extraction layer is used for respectively extracting features of the input point cloud data of each frame to obtain point cloud features corresponding to the point cloud data of each frame;
the image feature extraction layer is used for respectively extracting features of input target images of each frame to obtain image features corresponding to the target images of each frame;
the time sequence fusion layer carries out time sequence fusion based on the extracted image characteristics of each frame and the input point cloud data of each frame to obtain time sequence fusion characteristics;
the modal fusion layer is used for performing modal fusion on the time sequence fusion feature and the time sequence point cloud feature corresponding to the time sequence fusion feature to obtain a modal fusion feature;
and the target detection layer is used for carrying out 3D target detection on the modal fusion characteristics to obtain a target detection result.
3. The multi-sensor fusion-based 3D object detection method according to claim 2, wherein the performing time-series fusion based on the extracted image features of each frame and the input point cloud data of each frame comprises:
according to the input point cloud data of each frame, carrying out view transformation on the corresponding frame image characteristics to obtain view transformation characteristics;
and selecting the view transformation characteristics of two adjacent frames, and mapping the view transformation characteristics of the previous frame to the view transformation characteristics of the current frame to obtain the time sequence fusion characteristics.
4. The multi-sensor fusion-based 3D target detection method according to claim 3, wherein the view transformation of the corresponding frame image features according to the input frame point cloud data comprises:
respectively projecting each frame of point cloud data to corresponding frame image features, and endowing each pixel position of the corresponding image features with depth dimension coordinates based on the depth information of the point cloud data to obtain projection fusion features with three-dimensional pixel coordinates;
and converting the pixel coordinates of the projection fusion characteristics to a radar coordinate system based on a preset coordinate conversion matrix to obtain view transformation characteristics.
5. The multi-sensor fusion-based 3D target detection method according to claim 2, wherein the performing feature extraction on each frame of point cloud data respectively comprises:
respectively uniformly dividing single-frame point cloud data into a plurality of grid columns with preset sizes;
and respectively extracting the features of each grid column based on a preset sampling threshold value to obtain the corresponding point cloud features.
6. The multi-sensor fusion-based 3D target detection method according to claim 5, wherein the performing feature extraction on each grid pillar based on a preset sampling threshold comprises:
based on the fact that the number of point clouds in a single grid column is not smaller than the preset sampling threshold, sampling the point clouds in the corresponding grid columns based on the preset sampling threshold to obtain sampling characteristics;
based on the fact that the number of point clouds in a single grid column is smaller than the sampling threshold value, filling the corresponding grid column according to preset numerical values to obtain filling characteristics;
and obtaining corresponding frame point cloud characteristics based on all the sampling characteristics and all the filling characteristics.
7. The multi-sensor fusion-based 3D object detection method of claim 2, wherein training the 3D object detection model comprises:
acquiring a multi-frame training image, multi-frame point cloud training data and a target true value of the same target;
the point cloud training data is used as input data for training, the target truth value is used as a label, and a pre-constructed 3D target detection model is trained to obtain a first training model;
training the first training model by taking the training image as input data used for training and the target truth value as a label to obtain a second training model;
and taking the point cloud training data and the training image as input data for training, taking the target truth value as a label, and training the second training model to obtain a 3D target detection model.
8. A 3D target detection device based on multi-sensor fusion, characterized by comprising:
the data acquisition module is used for acquiring target images of at least two continuous frames and point cloud data corresponding to the target image time sequence;
the 3D target detection module is used for inputting the acquired point cloud data and the target image into a 3D target detection model to obtain a target detection result output by the 3D target detection model;
the 3D target detection model carries out time sequence fusion and modal fusion based on the characteristics respectively extracted from the point cloud data and the target image, and then carries out 3D target detection to obtain a target detection result;
the 3D target detection model is obtained based on training of a training image, an image characteristic true value corresponding to the training image, point cloud training data, a point cloud true value corresponding to the point cloud training data and a target true value.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps of the multi-sensor fusion based 3D object detection method according to any of claims 1 to 7.
10. A non-transitory computer readable storage medium, having stored thereon a computer program, wherein the computer program, when being executed by a processor, is adapted to carry out the steps of the multi-sensor fusion based 3D object detection method according to any one of claims 1 to 7.
CN202211427433.6A 2022-11-14 2022-11-14 3D target detection method and device based on multi-sensor fusion Pending CN115761723A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211427433.6A CN115761723A (en) 2022-11-14 2022-11-14 3D target detection method and device based on multi-sensor fusion

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211427433.6A CN115761723A (en) 2022-11-14 2022-11-14 3D target detection method and device based on multi-sensor fusion

Publications (1)

Publication Number Publication Date
CN115761723A true CN115761723A (en) 2023-03-07

Family

ID=85371220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211427433.6A Pending CN115761723A (en) 2022-11-14 2022-11-14 3D target detection method and device based on multi-sensor fusion

Country Status (1)

Country Link
CN (1) CN115761723A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116740669A (en) * 2023-08-16 2023-09-12 之江实验室 Multi-view image detection method, device, computer equipment and storage medium
CN116740669B (en) * 2023-08-16 2023-11-14 之江实验室 Multi-view image detection method, device, computer equipment and storage medium
CN117312828A (en) * 2023-09-28 2023-12-29 光谷技术有限公司 Public facility monitoring method and system
CN117173692A (en) * 2023-11-02 2023-12-05 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device
CN117173692B (en) * 2023-11-02 2024-02-02 安徽蔚来智驾科技有限公司 3D target detection method, electronic device, medium and driving device

Similar Documents

Publication Publication Date Title
CN115761723A (en) 3D target detection method and device based on multi-sensor fusion
CN108509820B (en) Obstacle segmentation method and device, computer equipment and readable medium
US11747444B2 (en) LiDAR-based object detection and classification
CN110163930A (en) Lane line generation method, device, equipment, system and readable storage medium storing program for executing
CN108470174B (en) Obstacle segmentation method and device, computer equipment and readable medium
CN111209825B (en) Method and device for dynamic target 3D detection
CN112001226B (en) Unmanned 3D target detection method, device and storage medium
CN112912890A (en) Method and system for generating synthetic point cloud data using generative models
KR102095842B1 (en) Apparatus for Building Grid Map and Method there of
CN110879994A (en) Three-dimensional visual inspection detection method, system and device based on shape attention mechanism
CN104077808A (en) Real-time three-dimensional face modeling method used for computer graph and image processing and based on depth information
CN112446227A (en) Object detection method, device and equipment
CN111257882B (en) Data fusion method and device, unmanned equipment and readable storage medium
US11544898B2 (en) Method, computer device and storage medium for real-time urban scene reconstruction
CN112883790A (en) 3D object detection method based on monocular camera
CN112154448A (en) Target detection method and device and movable platform
CN116740668B (en) Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
CN116168384A (en) Point cloud target detection method and device, electronic equipment and storage medium
CN116310673A (en) Three-dimensional target detection method based on fusion of point cloud and image features
CN117197388A (en) Live-action three-dimensional virtual reality scene construction method and system based on generation of antagonistic neural network and oblique photography
CN115147798A (en) Method, model and device for predicting travelable area and vehicle
CN113255779B (en) Multi-source perception data fusion identification method, system and computer readable storage medium
CN114155414A (en) Novel unmanned-driving-oriented feature layer data fusion method and system and target detection method
WO2021098666A1 (en) Hand gesture detection method and device, and computer storage medium
CN117422884A (en) Three-dimensional target detection method, system, electronic equipment and storage medium

Legal Events

Code Title
PB01 Publication
SE01 Entry into force of request for substantive examination