CN113780257A - Multi-modal fusion weakly supervised vehicle target detection method and system

Info

Publication number
CN113780257A
Authority
CN
China
Prior art keywords
point cloud
prediction box
box
target detection
features
Prior art date
2021-11-12
Legal status
Granted
Application number
CN202111338590.5A
Other languages
Chinese (zh)
Other versions
CN113780257B (en)
Inventor
唐作进
戴捷
孙波
马铜伟
李道胜
Current Assignee
Zidong Information Technology Suzhou Co., Ltd.
Original Assignee
Zidong Information Technology Suzhou Co., Ltd.
Priority date
2021-11-12
Filing date
2021-11-12
Publication date
2021-12-10
Application filed by Zidong Information Technology Suzhou Co., Ltd.
Priority to CN202111338590.5A
Publication of CN113780257A
Application granted
Publication of CN113780257B
Legal status: Active

Classifications

    • G06F18/214 — G (Physics); G06 (Computing; Calculating or Counting); G06F (Electric Digital Data Processing); Pattern recognition; Analysing; Design or setup of recognition systems or techniques; Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/253 — G (Physics); G06 (Computing; Calculating or Counting); G06F (Electric Digital Data Processing); Pattern recognition; Analysing; Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Traffic Control Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a multi-modal fusion weakly supervised vehicle target detection method and system. The method comprises: acquiring 3D point cloud data and image data; acquiring 3D prediction box parameters and their features from the 3D point cloud data, and acquiring a 2D point cloud map and its features; fusing the features of the 3D prediction boxes with the features of the image to obtain first-stage fusion features and generating a 2D target detection box from them; fusing the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features and generating 3D candidate prediction boxes from them; and filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and a preset supervision confidence threshold between the image and the point cloud, then outputting a 3D target detection box for detecting the target object in the scene. Because the method obtains the point cloud features and image features without depending on labels, it greatly reduces the dependence of 3D target detection on semantic labels and significantly improves detection precision.

Description

Multi-modal fusion weakly supervised vehicle target detection method and system
Technical Field
The invention relates to the technical field of target detection, and in particular to a multi-modal fusion weakly supervised vehicle target detection method and system.
Background
A key task in scene understanding is the detection of three-dimensional objects, which has become a hot research problem in application fields such as autonomous driving. The purpose of 3D target detection is to detect and localize the 3D bounding box of each object from the input sensor data. Most existing 3D object detectors are based on fully supervised learning: a large number of 3D bounding boxes must be annotated manually in irregular point cloud data, and the time cost of this labeling process greatly limits the application of 3D target detection in scenes lacking 3D labels.
Weakly supervised detection can effectively reduce the dependence of target detection on training labels, but existing weakly supervised object detectors mainly address two-dimensional detection rather than three-dimensional detection. A method that achieves weakly supervised or even unsupervised learning for 3D object detection would greatly reduce the detector's dependence on training labels and cut labeling cost. Studying weakly supervised or semi-supervised 3D object detector models adapted to scenes lacking 3D labels therefore has very important practical significance.
On the other hand, most existing target detection methods rely on a visual sensor alone, which is highly susceptible to interference from illumination, visibility, and other outdoor factors, so the accuracy of depth information acquired by a visual sensor alone cannot be guaranteed.
Disclosure of Invention
Therefore, the technical problem to be solved by the invention is to overcome the problems in the prior art and to provide a multi-modal fusion weakly supervised vehicle target detection method and system that obtain point cloud features and image features without depending on labels, greatly reducing the dependence of 3D target detection on semantic labels, significantly improving detection precision, and further improving the accuracy and applicability of target detection.
To solve the above technical problem, the invention provides a multi-modal fusion weakly supervised vehicle target detection method comprising the following steps:
acquiring 3D laser point cloud data and image data in a scene;
acquiring 3D prediction box parameters based on the 3D laser point cloud data, and performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes; acquiring a 2D point cloud map based on the 3D laser point cloud data, and performing feature extraction on the 2D point cloud map to obtain its features;
fusing the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, and generating a 2D target detection box based on the first-stage fusion features; fusing the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generating 3D candidate prediction boxes based on the second-stage fusion features;
and filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and a preset supervision confidence threshold between the image and the point cloud, and outputting a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
In one embodiment of the invention, the 3D laser point cloud data in the scene are acquired by a lidar device, and the image data in the scene are acquired by an RGB image acquisition device.
In one embodiment of the invention, the method for acquiring the 3D prediction box parameters based on the 3D laser point cloud data comprises the following steps:
presetting range anchors for the 3D laser point cloud data under ground-truth supervision, performing feature learning on the 3D laser point cloud data within the range anchors through a PointNet network to extract 3D laser point cloud features, and acquiring the 3D prediction box parameters based on those features.
In an embodiment of the invention, the method for performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes comprises:
learning the 3D prediction box parameters with a PointNet network to obtain 3D prediction boxes with continuous parameters, then deleting overlapping 3D prediction boxes to obtain the features of the 3D prediction boxes.
In one embodiment of the invention, the method for acquiring a 2D point cloud map based on the 3D laser point cloud data comprises:
projecting the 3D laser point cloud data with the preset anchors to generate a 2D point cloud map based on the same anchors.
In one embodiment of the invention, the method for acquiring the features of the image based on the image data comprises:
extracting the features of the image from the image data with a trained pre-training model.
In one embodiment of the invention, the method for generating a 2D target detection box based on the first-stage fusion features comprises:
classifying, regressing, and projecting the first-stage fusion features to generate the 2D target detection box.
In one embodiment of the invention, generating 3D candidate prediction boxes based on the second-stage fusion features comprises:
feeding the second-stage fusion features into an attention-based encoder and decoder for processing to obtain the 3D candidate prediction boxes.
In one embodiment of the invention, the method for filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud comprises:
projecting each 3D candidate prediction box into a 2D candidate prediction box and judging whether the similarity between the 2D candidate prediction box and the 2D target detection box exceeds a preset similarity threshold; if not, continuing to traverse the 3D candidate prediction boxes; if so, further judging whether the confidence of the 3D candidate prediction box exceeds the preset supervision confidence threshold; if not, returning to traverse the 3D candidate prediction boxes; if so, outputting the 3D candidate prediction box.
In addition, the invention provides a multi-modal fusion weakly supervised vehicle target detection system comprising:
a data acquisition module, configured to acquire 3D laser point cloud data and image data in a scene;
a point cloud data processing module, configured to acquire 3D prediction box parameters based on the 3D laser point cloud data, perform grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes, acquire a 2D point cloud map based on the 3D laser point cloud data, and perform feature extraction on the 2D point cloud map to obtain its features;
a feature fusion module, configured to fuse the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, generate a 2D target detection box based on the first-stage fusion features, fuse the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generate 3D candidate prediction boxes based on the second-stage fusion features;
and a network supervision module, configured to filter and screen the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud, and output a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
Compared with the prior art, the technical scheme of the invention has the following advantages:
the method obtains the point cloud features and the image features without depending on labels, greatly reducing the dependence of 3D target detection on semantic labels; it fuses the point cloud features with the image features and with the 2D point cloud map features in multiple stages, and outputs the 3D target detection box for detecting the target object in the scene under network supervision, significantly improving detection precision and further improving the accuracy and applicability of target detection.
Drawings
So that the present disclosure may be more readily and clearly understood, the invention is described in further detail below with reference to specific embodiments, examples of which are illustrated in the accompanying drawings.
FIG. 1 is a flow chart of the multi-modal fusion weakly supervised vehicle target detection method of the invention.
FIG. 2 is another flow chart of the multi-modal fusion weakly supervised vehicle target detection method of the invention.
FIG. 3 is a schematic diagram of the hardware structure of the multi-modal fusion weakly supervised vehicle target detection system of the invention.
Wherein the reference numerals are as follows: 10. data acquisition module; 20. point cloud data processing module; 30. feature fusion module; 40. network supervision module.
Detailed Description
The present invention is further described below in conjunction with the accompanying drawings and specific examples, so that those skilled in the art may better understand and practice it; the examples, however, are not intended to limit the invention.
Example one
Referring to FIG. 1 and FIG. 2, the present embodiment provides a multi-modal fusion weakly supervised vehicle target detection method comprising the following steps:
S100: acquiring 3D laser point cloud data and image data in a scene;
S200: acquiring 3D prediction box parameters based on the 3D laser point cloud data, and performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes; acquiring a 2D point cloud map based on the 3D laser point cloud data, and performing feature extraction on the 2D point cloud map to obtain its features;
S300: fusing the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, and generating a 2D target detection box based on the first-stage fusion features; fusing the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generating 3D candidate prediction boxes based on the second-stage fusion features;
S400: filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud, and outputting a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
The scene described in the present disclosure may be a scene around the vehicle, including the scene in front and the scenes to the sides, for example the scene ahead of the host vehicle.
In the multi-modal fusion weakly supervised vehicle target detection method disclosed by the invention, the 3D laser point cloud data and the image data are acquired from the same scene.
The 3D laser point cloud data in the scene are acquired by a lidar device, and the image data in the scene are acquired by an RGB image acquisition device; for example, color image data can be acquired in an arbitrary scene with an ordinary RGB camera. A 32- or 64-line lidar is mounted above the vehicle; taking the lidar as the origin, the vehicle coordinate system is converted to the lidar point cloud coordinate system, and the 3D laser point cloud data are obtained through a rotation matrix and a translation matrix, as sketched below.
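As a hedged illustration of this coordinate conversion, the sketch below applies a rotation matrix R and a translation vector t to map lidar points into the vehicle coordinate system; the calibration values shown (identity rotation, a 1.9 m mount height) are illustrative assumptions, since the patent does not give the actual extrinsics.

```python
import numpy as np

def lidar_to_vehicle(points, R, t):
    """Map an (N, 3) array of lidar points into the vehicle coordinate system.

    points: (N, 3) xyz coordinates in the lidar coordinate system.
    R:      (3, 3) rotation matrix from lidar axes to vehicle axes.
    t:      (3,)   position of the lidar origin in the vehicle frame.
    """
    return points @ R.T + t

# Hypothetical calibration: axes aligned, lidar mounted 1.9 m above the
# vehicle origin. Real values come from extrinsic calibration.
R = np.eye(3)
t = np.array([0.0, 0.0, 1.9])

scan = np.random.rand(1000, 3) * 50.0   # stand-in for a 32/64-line lidar scan
scan_vehicle = lidar_to_vehicle(scan, R, t)
```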
By fusing the features of the 3D laser point cloud data and the image data from the same scene, the multi-modal fusion weakly supervised vehicle target detection method disclosed by the invention further improves the accuracy of target detection.
For the method of the above embodiment, in step S200, the method for acquiring the 3D prediction box parameters based on the 3D laser point cloud data comprises: presetting range anchors for the 3D laser point cloud data using a small amount of ground-truth supervision, performing feature learning on the 3D laser point cloud data within the range anchors through a PointNet network to extract 3D laser point cloud features, and acquiring the 3D prediction box parameters based on those features.
When the 3D laser point cloud data pass through the PointNet network, the dimension of the feature points, the number of seed points, and the feature radius r of the seed points are set according to the x, y, z coordinates (length, width, height) and the depth d of the data. Several feature extraction layers generate a small number of high-quality seed points carrying local proposals; these seed points serve as the center points of the 3D prediction boxes, and a VoteNet network votes on the center seed points to obtain the 3D prediction box parameters.
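A minimal sketch of the voting step described above, written in PyTorch; the module name `VotingLayer`, the 256-dimensional features, and the layer sizes are illustrative assumptions rather than the patent's exact network.

```python
import torch
import torch.nn as nn

class VotingLayer(nn.Module):
    """VoteNet-style voting: each seed point predicts an offset to an
    object center plus a feature residual, yielding box-center votes."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv1d(feat_dim, feat_dim, 1), nn.ReLU(),
            nn.Conv1d(feat_dim, 3 + feat_dim, 1),  # 3 center offsets + residual
        )

    def forward(self, seed_xyz, seed_feat):
        # seed_xyz: (B, N, 3) seed coordinates; seed_feat: (B, C, N) seed features
        out = self.mlp(seed_feat)                 # (B, 3 + C, N)
        offset = out[:, :3, :].transpose(1, 2)    # (B, N, 3)
        vote_xyz = seed_xyz + offset              # voted 3D box centers
        vote_feat = seed_feat + out[:, 3:, :]     # updated seed features
        return vote_xyz, vote_feat
```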
For the method of the above embodiment, in step S200, the method for performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes comprises: learning the 3D prediction box parameters with a PointNet network to obtain 3D prediction boxes with continuous parameters, then deleting overlapping 3D prediction boxes to obtain the features of the 3D prediction boxes.
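Deleting overlapping boxes is classically done with non-maximum suppression; the following is a minimal sketch under the assumption of axis-aligned bird's-eye-view boxes and an illustrative IoU threshold, since the patent does not specify the exact overlap criterion.

```python
import numpy as np

def bev_iou(a, b):
    """Axis-aligned IoU between two bird's-eye-view boxes (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def nms(boxes, scores, iou_thresh=0.5):
    """Keep the highest-scoring box and delete boxes that overlap it."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        rest = order[1:]
        order = rest[[bev_iou(boxes[i], boxes[j]) <= iou_thresh for j in rest]]
    return keep
```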
For the method of the above embodiment, in step S200, the method for acquiring a 2D point cloud map based on the 3D laser point cloud data comprises: projecting the 3D laser point cloud data with the preset anchors to generate a 2D point cloud map based on the same anchors. Because the point cloud data are affected by the sparsity of the lidar, the normalized point cloud density is used as the screening condition for the projection, after which features are extracted with a ResNet-50 residual network.
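A sketch of this projection step, assuming an illustrative grid range, resolution, and density threshold (none of which the patent fixes): points are binned onto a 2D grid, the per-cell count is normalized, and sparse cells are screened out before the map is passed to the ResNet-50 feature extractor.

```python
import numpy as np

def point_cloud_to_bev_density(points, x_range=(0.0, 70.0),
                               y_range=(-40.0, 40.0),
                               resolution=0.1, density_thresh=0.05):
    """Project 3D points onto a 2D grid and keep cells whose normalized
    point density passes the screening threshold (lidar sparsity guard)."""
    xs = ((points[:, 0] - x_range[0]) / resolution).astype(int)
    ys = ((points[:, 1] - y_range[0]) / resolution).astype(int)
    h = int((x_range[1] - x_range[0]) / resolution)
    w = int((y_range[1] - y_range[0]) / resolution)
    valid = (xs >= 0) & (xs < h) & (ys >= 0) & (ys < w)

    density = np.zeros((h, w), dtype=np.float32)
    np.add.at(density, (xs[valid], ys[valid]), 1.0)   # count points per cell
    density /= density.max() + 1e-9                   # normalized density
    density[density < density_thresh] = 0.0           # projection screening
    return density
```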
For the method of the above embodiment, in step S300, the method for acquiring the features of the image based on the image data comprises: extracting the features of the image from the image data with a trained pre-training model.
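One common realization of such a pre-training model, shown here as a hedged example, is a torchvision ResNet truncated before its classification head so that it emits a feature map; the specific backbone and weights are assumptions, since the patent does not name them.

```python
import torch
import torchvision

# Truncate a pretrained ResNet-50 before its pooling/classification head
# so it outputs a spatial feature map instead of class logits.
backbone = torchvision.models.resnet50(weights="IMAGENET1K_V2")
feature_extractor = torch.nn.Sequential(*list(backbone.children())[:-2])
feature_extractor.eval()

with torch.no_grad():
    image = torch.rand(1, 3, 375, 1242)      # stand-in for one RGB frame
    features = feature_extractor(image)      # (1, 2048, 12, 39) feature map
```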
For the method of the above embodiment, in step S300, the method for generating a 2D target detection box based on the first-stage fusion features comprises: classifying, regressing, and projecting the first-stage fusion features to generate the 2D target detection box.
For the method of the above embodiment, also in step S300, generating 3D candidate prediction boxes based on the second-stage fusion features comprises: feeding the second-stage fusion features into an attention-based encoder and decoder for processing to obtain the 3D candidate prediction boxes. Specifically, the encoder and decoder comprise a query matrix, a key matrix, a value matrix, and multiple attention heads. A single attention head first takes the key matrix and the value matrix as input and performs feature cross-computation after a linear transformation; a position mask is then added so that the features between the global proposals and the local proposals are further learned; a Softmax layer then computes the score of the predicted target at each position, and the linearly transformed query matrix is directly fused with the predicted-target-score features and output.
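A minimal sketch of one such attention head, assuming scaled dot-product attention and an additive position mask; the exact projections and the fusion with the prediction scores in the patent's encoder-decoder may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AttentionHead(nn.Module):
    """Single attention head: linear projections of Q, K, V, a position
    mask added to the scores, and a Softmax over positions."""
    def __init__(self, dim):
        super().__init__()
        self.q_proj = nn.Linear(dim, dim)
        self.k_proj = nn.Linear(dim, dim)
        self.v_proj = nn.Linear(dim, dim)
        self.scale = dim ** -0.5

    def forward(self, query, key, value, pos_mask=None):
        # query: (B, Nq, D); key/value: (B, Nk, D); pos_mask: (B, Nq, Nk) or None
        q, k, v = self.q_proj(query), self.k_proj(key), self.v_proj(value)
        scores = q @ k.transpose(-2, -1) * self.scale  # feature cross-computation
        if pos_mask is not None:
            scores = scores + pos_mask                 # additive position mask
        attn = F.softmax(scores, dim=-1)               # per-position scores
        return attn @ v
```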
The 2D target detection box generated in step S300 is obtained by learning and training on image data of any type in any scene, and 2D object detection methods achieve good detection accuracy. The whole pipeline that finally generates the 2D target detection box from the image collected by the RGB color camera through multi-layer feature extraction can be regarded as a teacher network, and the whole pipeline that generates the 3D candidate prediction boxes from the point cloud data scanned by the lidar through feature extraction and feature fusion can be regarded as a student network.
For the method of the above embodiment, in step S400, the method for filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud comprises: projecting each 3D candidate prediction box into a 2D candidate prediction box and judging whether the similarity between the 2D candidate prediction box and the 2D target detection box exceeds the preset similarity threshold; if not, continuing to traverse the 3D candidate prediction boxes; if so, further judging whether the confidence of the 3D candidate prediction box exceeds the preset supervision confidence threshold; if not, returning to traverse the 3D candidate prediction boxes; if so, outputting the 3D candidate prediction box. A sketch of this screening loop follows.
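In the sketch below, the helper names `project_to_2d` and `iou_2d` and the two threshold values are hypothetical placeholders for the projection, the similarity measure, and the preset thresholds that the patent leaves unspecified.

```python
def filter_candidates(cands_3d, det_2d, project_to_2d, iou_2d,
                      sim_thresh=0.5, conf_thresh=0.7):
    """Traverse the 3D candidate prediction boxes; keep a candidate only if
    its 2D projection is similar enough to the teacher's 2D detection box
    AND its own confidence clears the supervision confidence threshold.

    cands_3d:      iterable of (box3d, confidence) pairs
    det_2d:        the 2D target detection box from the teacher network
    project_to_2d: maps a 3D box to its 2D image-plane box
    iou_2d:        similarity (e.g. IoU) between two 2D boxes
    """
    kept = []
    for box3d, conf in cands_3d:
        box2d = project_to_2d(box3d)
        if iou_2d(box2d, det_2d) <= sim_thresh:
            continue                  # similarity too low: next candidate
        if conf <= conf_thresh:
            continue                  # confidence below supervision threshold
        kept.append(box3d)            # output this 3D candidate
    return kept
```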
For the method of the above embodiment, the teacher network is used to supervise the student network, and the student network learns knowledge from the teacher network and evaluates its own target detection network. The confidence between the student network and the teacher network is evaluated against the preset supervision confidence threshold; when the confidence with respect to the teacher network is greater than or equal to the set threshold, the student network, under the supervision of the teacher network, finally outputs the 3D target detection box for detecting the target object in the scene, where the 3D target detection box is the filtered and screened 3D candidate prediction box.
The method obtains the point cloud features and the image features without depending on labels, greatly reducing the dependence of 3D target detection on semantic labels; it fuses the point cloud features with the image features and with the 2D point cloud map features in multiple stages, and outputs the 3D target detection box for detecting the target object in the scene under network supervision, significantly improving detection precision and further improving the accuracy and applicability of target detection.
Example two
The following introduces the multi-modal fusion weakly supervised vehicle target detection system disclosed by the second embodiment of the invention; the system described below and the method described above may be referred to correspondingly.
Referring to FIG. 3, the second embodiment of the invention discloses a multi-modal fusion weakly supervised vehicle target detection system, which comprises the following modules.
A data acquisition module 10, configured to acquire 3D laser point cloud data and image data in a scene;
a point cloud data processing module 20, configured to acquire 3D prediction box parameters based on the 3D laser point cloud data, perform grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes, acquire a 2D point cloud map based on the 3D laser point cloud data, and perform feature extraction on the 2D point cloud map to obtain its features;
a feature fusion module 30, configured to fuse the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, generate a 2D target detection box based on the first-stage fusion features, fuse the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generate 3D candidate prediction boxes based on the second-stage fusion features;
and a network supervision module 40, configured to filter and screen the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud, and output a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
The multi-modal fusion weakly supervised vehicle target detection system may include corresponding modules that perform each or several of the steps in the above flow charts. Thus, each step or several steps in the flow charts may be performed by a respective module, and the system may comprise one or more of these modules. A module may be one or more hardware modules specifically configured to perform the respective step, may be implemented by a processor configured to perform the respective step, may be stored within a computer-readable medium for implementation by a processor, or some combination thereof.
The hardware architecture may be implemented using a bus architecture. The bus architecture may include any number of interconnecting buses and bridges depending on the specific application of the hardware and the overall design constraints. The bus connects together various circuits including one or more processors, memories, and/or hardware modules. The bus may also connect various other circuits such as peripherals, voltage regulators, power management circuits, external antennas, and the like.
Since the multi-modal fusion weakly supervised vehicle target detection system of this embodiment is used to implement the multi-modal fusion weakly supervised vehicle target detection method described above, its specific implementation and functions correspond to those of the method and can be found in the description of the method embodiment above; they are not repeated here.
As will be appreciated by one skilled in the art, embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It should be understood that the above examples are only for clarity of illustration and are not intended to limit the embodiments. Other variations and modifications will be apparent to persons skilled in the art in light of the above description; it is neither necessary nor possible to exhaust all embodiments here. Obvious variations or modifications may be made without departing from the spirit or scope of the invention.

Claims (10)

1. A multi-modal fusion weakly supervised vehicle target detection method, characterized by comprising the following steps:
acquiring 3D laser point cloud data and image data in a scene;
acquiring 3D prediction box parameters based on the 3D laser point cloud data, and performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes; acquiring a 2D point cloud map based on the 3D laser point cloud data, and performing feature extraction on the 2D point cloud map to obtain its features;
fusing the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, and generating a 2D target detection box based on the first-stage fusion features; fusing the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generating 3D candidate prediction boxes based on the second-stage fusion features;
and filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and a preset supervision confidence threshold between the image and the point cloud, and outputting a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
2. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein: the 3D laser point cloud data in the scene are acquired by a lidar device, and the image data in the scene are acquired by an RGB image acquisition device.
3. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for acquiring the 3D prediction box parameters based on the 3D laser point cloud data comprises:
presetting range anchors for the 3D laser point cloud data under ground-truth supervision, performing feature learning on the 3D laser point cloud data within the range anchors through a PointNet network to extract 3D laser point cloud features, and acquiring the 3D prediction box parameters based on those features.
4. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for performing grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes comprises:
learning the 3D prediction box parameters with a PointNet network to obtain 3D prediction boxes with continuous parameters, then deleting overlapping 3D prediction boxes to obtain the features of the 3D prediction boxes.
5. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for acquiring a 2D point cloud map based on the 3D laser point cloud data comprises:
projecting the 3D laser point cloud data with the preset anchors to generate a 2D point cloud map based on the same anchors.
6. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for acquiring the features of the image based on the image data comprises:
extracting the features of the image from the image data with a trained pre-training model.
7. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for generating a 2D target detection box based on the first-stage fusion features comprises:
classifying, regressing, and projecting the first-stage fusion features to generate the 2D target detection box.
8. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein generating 3D candidate prediction boxes based on the second-stage fusion features comprises:
feeding the second-stage fusion features into an attention-based encoder and decoder for processing to obtain the 3D candidate prediction boxes.
9. The multi-modal fusion weakly supervised vehicle target detection method of claim 1, wherein the method for filtering and screening the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud comprises:
projecting each 3D candidate prediction box into a 2D candidate prediction box and judging whether the similarity between the 2D candidate prediction box and the 2D target detection box exceeds a preset similarity threshold; if not, continuing to traverse the 3D candidate prediction boxes; if so, further judging whether the confidence of the 3D candidate prediction box exceeds the preset supervision confidence threshold; if not, returning to traverse the 3D candidate prediction boxes; if so, outputting the 3D candidate prediction box.
10. A multi-modal fusion weakly supervised vehicle target detection system, characterized by comprising:
a data acquisition module, configured to acquire 3D laser point cloud data and image data in a scene;
a point cloud data processing module, configured to acquire 3D prediction box parameters based on the 3D laser point cloud data, perform grid-pooling feature extraction on the 3D prediction box parameters to obtain the features of the 3D prediction boxes, acquire a 2D point cloud map based on the 3D laser point cloud data, and perform feature extraction on the 2D point cloud map to obtain its features;
a feature fusion module, configured to fuse the features of the 3D prediction boxes with the features of the image obtained from the image data to obtain first-stage fusion features, generate a 2D target detection box based on the first-stage fusion features, fuse the features of the 3D prediction boxes with the features of the 2D point cloud map to obtain second-stage fusion features, and generate 3D candidate prediction boxes based on the second-stage fusion features;
and a network supervision module, configured to filter and screen the 3D candidate prediction boxes based on the 2D target detection box and the preset supervision confidence threshold between the image and the point cloud, and output a 3D target detection box for detecting the target object in the scene, wherein the 3D target detection box is the filtered and screened 3D candidate prediction box.
CN202111338590.5A (priority 2021-11-12, filed 2021-11-12): Multi-modal fusion weakly supervised vehicle target detection method and system. Active. Granted publication: CN113780257B (en).

Priority Applications (1)

Application Number: CN202111338590.5A — Priority Date: 2021-11-12 — Filing Date: 2021-11-12 — Title: Multi-modal fusion weakly supervised vehicle target detection method and system (granted as CN113780257B)


Publications (2)

Publication Number — Publication Date
CN113780257A — 2021-12-10
CN113780257B — 2022-02-22

Family

ID=78873883

Family Applications (1)

Application Number: CN202111338590.5A — Title: Multi-modal fusion weakly supervised vehicle target detection method and system — Priority Date: 2021-11-12 — Filing Date: 2021-11-12 — Status: Active, granted as CN113780257B (en)

Country Status (1)

Country Link
CN (1) CN113780257B (en)



Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108171217A (en) * 2018-01-29 2018-06-15 深圳市唯特视科技有限公司 A kind of three-dimension object detection method based on converged network
CN109543601A (en) * 2018-11-21 2019-03-29 电子科技大学 A kind of unmanned vehicle object detection method based on multi-modal deep learning
US20210082181A1 (en) * 2019-06-17 2021-03-18 Sensetime Group Limited Method and apparatus for object detection, intelligent driving method and device, and storage medium
CN113435232A (en) * 2020-03-23 2021-09-24 北京京东乾石科技有限公司 Object detection method, device, equipment and storage medium
CN112233097A (en) * 2020-10-19 2021-01-15 中国科学技术大学 Road scene other vehicle detection system and method based on space-time domain multi-dimensional fusion

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519853A (en) * 2021-12-29 2022-05-20 西安交通大学 Three-dimensional target detection method and system based on multi-mode fusion
CN114881078A (en) * 2022-05-07 2022-08-09 安徽蔚来智驾科技有限公司 Method and system for screening data under predetermined scene
CN114842313A (en) * 2022-05-10 2022-08-02 北京易航远智科技有限公司 Target detection method and device based on pseudo-point cloud, electronic equipment and storage medium
CN114842313B (en) * 2022-05-10 2024-05-31 北京易航远智科技有限公司 Target detection method and device based on pseudo point cloud, electronic equipment and storage medium
CN115049827A (en) * 2022-05-19 2022-09-13 广州文远知行科技有限公司 Target object detection and segmentation method, device, equipment and storage medium
CN116030023A (en) * 2023-02-02 2023-04-28 泉州装备制造研究所 Point cloud detection method and system
CN117671320A (en) * 2023-05-30 2024-03-08 合肥辉羲智能科技有限公司 Point cloud three-dimensional target automatic labeling method and system based on multi-model fusion

Also Published As

Publication number Publication date
CN113780257B (en) 2022-02-22

Similar Documents

Publication Publication Date Title
CN113780257B (en) Multi-modal fusion weakly supervised vehicle target detection method and system
Li et al. Cross‐scene pavement distress detection by a novel transfer learning framework
US10373024B2 (en) Image processing device, object detection device, image processing method
CN114596555B (en) Obstacle point cloud data screening method and device, electronic equipment and storage medium
JP6700373B2 (en) Apparatus and method for learning object image packaging for artificial intelligence of video animation
CN113592905B (en) Vehicle driving track prediction method based on monocular camera
CN116188999A (en) Small target detection method based on visible light and infrared image data fusion
US11200455B2 (en) Generating training data for object detection
Zhang et al. Real-time lane detection by using biologically inspired attention mechanism to learn contextual information
Seo et al. Temporary traffic control device detection for road construction projects using deep learning application
Ammous et al. Improved YOLOv3-tiny for silhouette detection using regularisation techniques.
Kheder et al. Transfer learning based traffic light detection and recognition using CNN inception-V3 model
Shao et al. An efficient model for small object detection in the maritime environment
CN117789160A (en) Multi-mode fusion target detection method and system based on cluster optimization
Sekkat et al. Amodalsynthdrive: A synthetic amodal perception dataset for autonomous driving
CN116486239A (en) Image anomaly detection platform based on incremental learning and open set recognition algorithm
Saha et al. A newly proposed object detection method using faster R-CNN inception with ResNet based on Tensorflow
CN116052120A (en) Excavator night object detection method based on image enhancement and multi-sensor fusion
Afdhal et al. Evaluation of benchmarking pre-trained cnn model for autonomous vehicles object detection in mixed traffic
Xu A fusion-based approach to deep-learning and edge-cutting algorithms for identification and color recognition of traffic lights
US20230351734A1 (en) System and Method for Iterative Refinement and Curation of Images Driven by Visual Templates
Cortés et al. Semi-automatic tracking-based labeling tool for automotive applications
CN111126261B (en) Video data analysis method and device, raspberry group device and readable storage medium
US20240071105A1 (en) Cross-modal self-supervised learning for infrastructure analysis
Estrada et al. Object and Traffic Light Recognition Model Development Using Multi-GPU Architecture for Autonomous Bus.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant