CN117456493A - Target detection method, device, computer equipment and storage medium - Google Patents
Info
- Publication number
- Publication number: CN117456493A (application CN202311423592.3A)
- Authority
- CN
- China
- Prior art keywords
- feature
- features
- image
- point cloud
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/40—Extraction of image or video features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
- G06V10/80—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
- G06V10/806—Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/82—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/50—Context or environment of the image
- G06V20/56—Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
- G06V20/58—Recognition of moving objects or obstacles, e.g. vehicles or pedestrians; Recognition of traffic objects, e.g. traffic signs, traffic lights or roads
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/60—Type of objects
- G06V20/64—Three-dimensional objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V2201/00—Indexing scheme relating to image or video recognition or understanding
- G06V2201/07—Target detection
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Physics & Mathematics (AREA)
- Evolutionary Computation (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Databases & Information Systems (AREA)
- Artificial Intelligence (AREA)
- General Health & Medical Sciences (AREA)
- Medical Informatics (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
Abstract
The present application relates to a target detection method, apparatus, computer device, storage medium and computer program product. The method comprises: acquiring an image to be detected and point cloud data that correspond to each other; performing feature extraction on the image to be detected to obtain a plurality of image features; performing feature extraction on the point cloud data to obtain a plurality of point cloud features; determining the standard feature corresponding to each image feature and the standard feature corresponding to each point cloud feature; for each standard feature, performing feature fusion on the image features corresponding to the current standard feature and the point cloud features corresponding to the current standard feature to obtain a fusion feature; and performing target detection on the image to be detected based on the fusion features. With this method, the feature fusion calculation is simplified and the accuracy of the fusion result is improved.
Description
Technical Field
The present application relates to the field of computer technology, and in particular, to a target detection method, apparatus, computer device, storage medium, and computer program product.
Background
In recent years, visual perception technology based on deep learning has developed rapidly, and methods for detecting targets in images based on vision, laser point clouds and three-dimensional (3D) target detection technology have emerged one after another, greatly promoting the application of automatic driving technology.
Most existing detection methods adopt a post-fusion strategy: after the visual perception result and the lidar perception result are output, they are fused according to manually formulated fusion rules, and target detection is then performed on the fused result. Obtaining the fusion result therefore requires complex calculation across three stages (the visual perception stage, the lidar perception stage and the post-fusion stage governed by manual rules), so the calculation is complicated and the accuracy of a fusion result obtained from manually formulated rules is low.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a target detection method, apparatus, computer device, computer-readable storage medium and computer program product that can simplify the feature fusion calculation and improve the accuracy of the fusion result.
In a first aspect, the present application provides a target detection method, the method including:
acquiring an image to be detected and point cloud data that correspond to each other;
performing feature extraction on the image to be detected to obtain a plurality of image features, and performing feature extraction on the point cloud data to obtain a plurality of point cloud features;
determining the standard feature corresponding to each image feature, and determining the standard feature corresponding to each point cloud feature;
for each standard feature, performing feature fusion on the image features corresponding to the current standard feature and the point cloud features corresponding to the current standard feature to obtain a fusion feature;
and performing target detection on the image to be detected based on the fusion features.
In one embodiment, the determining standard features corresponding to each image feature includes:
determining equipment parameters of acquisition equipment of the image to be detected;
for each image feature, determining a feature matrix corresponding to the image feature based on the equipment parameter; and determining three-dimensional space features corresponding to the feature matrix from the three-dimensional space feature set, and taking the three-dimensional space features as standard features corresponding to the image features.
In one embodiment, the determining the standard feature corresponding to each point cloud feature includes:
and determining the spatial position information contained in the point cloud features aiming at the point cloud features, determining the spatial position information contained in each three-dimensional spatial feature in the three-dimensional spatial feature set, and taking the three-dimensional spatial feature which is the same as the spatial position information of the point cloud features as a standard feature corresponding to the point cloud features.
In one embodiment, the feature fusion of the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature to obtain a fused feature includes:
determining image feature weights of the image features corresponding to the current standard features through an attention mechanism, and determining point cloud feature weights of the point cloud features corresponding to the current standard features;
performing weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature;
and determining the fusion characteristic based on the processing result.
In one embodiment, the performing weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature, and the point cloud feature corresponding to the current standard feature includes:
adjusting the image features corresponding to the current standard features and the feature matrix of the point cloud features corresponding to the current standard features to feature matrices with the same size;
and carrying out weighted summation processing based on the feature matrix of the image feature corresponding to the adjusted current standard feature, the feature matrix of the point cloud feature corresponding to the adjusted current standard feature, the image feature weight and the point cloud feature weight.
In one embodiment, after the determining the standard feature corresponding to each image feature and determining the standard feature corresponding to each point cloud feature, the method further includes:
establishing a first corresponding relation between the image features and the standard features and a second corresponding relation between the point cloud features and the standard features;
and determining image features corresponding to the current standard features based on the first corresponding relation, and determining point cloud features corresponding to the current standard features based on the second corresponding relation.
In a second aspect, the present application further provides an object detection apparatus, the apparatus comprising:
the acquisition module is used for acquiring the image to be detected and the point cloud data with the corresponding relation;
the feature extraction module is used for extracting features of the image to be detected to obtain a plurality of image features; extracting features of the point cloud data to obtain a plurality of point cloud features;
the first determining module is used for determining standard features corresponding to the image features and determining standard features corresponding to the point cloud features;
the feature fusion module is used for carrying out feature fusion on the image features corresponding to the current standard features and the point cloud features corresponding to the current standard features aiming at each standard feature to obtain fusion features;
and the target detection module is used for carrying out target detection on the image to be detected based on the fusion characteristics.
In a third aspect, the present application further provides a computer device comprising a memory and a processor, the memory storing a computer program and the processor, when executing the computer program, implementing the steps of the method embodiments described above.
In a fourth aspect, the present application also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the method embodiments described above.
In a fifth aspect, the present application also provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
With the above target detection method, apparatus, computer device, storage medium and computer program product, after the image to be detected and the corresponding point cloud data are acquired, feature extraction is performed on the image to be detected to obtain a plurality of image features, and feature extraction is performed on the point cloud data to obtain a plurality of point cloud features. The standard feature corresponding to each image feature and the standard feature corresponding to each point cloud feature are then determined, so that both the image features and the point cloud features are mapped onto the standard features. For each standard feature, the image features and point cloud features corresponding to that standard feature are fused, so that image features and point cloud features belonging to the same standard feature are combined into a fusion feature, and target detection is performed on the image to be detected based on the fusion features. Because fusion happens directly at the feature level on a shared set of standard features, the feature fusion calculation is simplified and the accuracy of the fusion result is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required in the description of the embodiments or the related art are briefly introduced below. It is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings can be obtained from them by a person of ordinary skill in the art without inventive effort.
FIG. 1 is a flow chart of a method of detecting targets in one embodiment;
FIG. 2 is a flow chart of a method for detecting an object according to another embodiment;
FIG. 3 is a flow chart of a method for detecting targets according to another embodiment;
FIG. 4 is a schematic diagram of a target detection apparatus according to another embodiment;
fig. 5 is an internal structural diagram of a computer device in one embodiment.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.
In one embodiment, as shown in fig. 1, a target detection method is provided. The method is described here as applied to a terminal; it is understood that the method may also be applied to a server, or to a system including the terminal and the server and implemented through interaction between the terminal and the server. In this embodiment, the method includes the following steps:
Step S101, acquiring an image to be detected and point cloud data that correspond to each other;
A joint data set is acquired, which contains the images to be detected and the point cloud data. The images to be detected and the point cloud data must be spatio-temporally aligned, that is, an image to be detected and the point cloud data captured at the same moment correspond to each other.
A plurality of vehicle-mounted cameras are arranged around the vehicle body to acquire the images to be detected; their mounting positions are front view, front left, front right, rear left, rear right and rear view. Meanwhile, a lidar mounted on the roof of the vehicle is required to collect the point cloud data around the vehicle, with no fewer than 120 beams (or the equivalent).
Step S102, extracting features of the image to be detected to obtain a plurality of image features; extracting features of the point cloud data to obtain a plurality of point cloud features;
The image to be detected is preprocessed with image augmentation techniques such as geometric translation, rotation, brightness transformation and mixup, which enriches the data set samples and improves model robustness.
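Purely as an illustration of the augmentations just listed, the following sketch applies a translation, a brightness change and mixup to a pair of training images. The shift range, brightness range and the Beta(0.4, 0.4) mixup parameters are assumptions for the example, not values from this application.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(image_a, image_b):
    """Toy versions of the augmentations mentioned above: geometric translation,
    brightness transformation and mixup (rotation is omitted for brevity)."""
    # geometric translation: shift the image a few pixels along both axes
    shifted = np.roll(image_a, shift=(rng.integers(-10, 10), rng.integers(-10, 10)), axis=(0, 1))
    # brightness transformation: scale pixel values and clip to the valid range
    brightened = np.clip(shifted * rng.uniform(0.8, 1.2), 0, 255)
    # mixup: convex combination of two training images
    lam = rng.beta(0.4, 0.4)
    return lam * brightened + (1 - lam) * image_b

a = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
b = rng.integers(0, 256, size=(224, 224, 3)).astype(np.float32)
print(augment(a, b).shape)  # (224, 224, 3)
```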
The preprocessed image to be detected is input into an image processing model for feature extraction to obtain a plurality of image features. The image processing model includes an EfficientNet, which can extract richer temporal and spatial information features than an ordinary convolutional neural network.
The point cloud data are input into a point cloud processing model for feature extraction to obtain a plurality of point cloud features; the point cloud processing model includes a point cloud network (point).
In this embodiment, the point cloud data may be processed into a pseudo image, that is, the three-dimensional data are compressed into two-dimensional data without losing spatial information. For example, an image is usually of size x × y × 3, corresponding to a 3-channel red-green-blue (RGB) image, where x and y are the width and height of the image; the pseudo image of the point cloud data is x × y × N with N greater than three channels, containing both the point cloud data and the spatial position information, which is why it is called a pseudo image. After features are extracted from the pseudo image, the extracted features therefore contain the spatial information of the point cloud.
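As a rough sketch of this pseudo-image idea, a point cloud can be scattered onto a 2D grid whose extra channels retain the spatial information. The grid extent, resolution and channel layout below are assumptions chosen for illustration, not the configuration used in this application.

```python
import numpy as np

def point_cloud_to_pseudo_image(points, x_range=(-50, 50), y_range=(-50, 50),
                                resolution=0.5, num_channels=8):
    """Compress an (M, 4) point cloud [x, y, z, intensity] into an x*y*N pseudo image.

    Only the idea (a 2D grid with more than three channels that keeps spatial
    information) follows the text; the parameters here are illustrative."""
    width = int((x_range[1] - x_range[0]) / resolution)
    height = int((y_range[1] - y_range[0]) / resolution)
    pseudo_image = np.zeros((height, width, num_channels), dtype=np.float32)

    for x, y, z, intensity in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the grid
        col = int((x - x_range[0]) / resolution)
        row = int((y - y_range[0]) / resolution)
        pseudo_image[row, col, 0] += 1.0   # point count in this cell
        pseudo_image[row, col, 1] = x      # spatial position of the last point
        pseudo_image[row, col, 2] = y
        pseudo_image[row, col, 3] = z
        pseudo_image[row, col, 4] = intensity
    return pseudo_image

# Example: 1000 random points become a 200 x 200 x 8 pseudo image
points = np.random.uniform(-50, 50, size=(1000, 4)).astype(np.float32)
print(point_cloud_to_pseudo_image(points).shape)  # (200, 200, 8)
```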
Step S103, determining standard features corresponding to the image features, and determining standard features corresponding to the point cloud features;
Each image feature is projected into a standard feature space, so the standard feature corresponding to each image feature can be found in that space; likewise, each point cloud feature is projected into the standard feature space, so the standard feature corresponding to each point cloud feature can be found in that space.
in this embodiment, one standard feature may correspond to one or more image features, and one standard feature may also correspond to one or more point cloud features.
Step S104, aiming at each standard feature, carrying out feature fusion on the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature to obtain a fusion feature;
for each standard feature, determining one or more image features corresponding to the current standard feature, determining one or more point cloud features corresponding to the current standard feature, and fusing the determined one or more image features and the one or more point cloud features to obtain a fused feature.
Step S105, performing target detection on the image to be detected based on the fusion characteristics.
The fusion features are output to a target detection head, which detects the type and spatial information of the target and outputs the detection result.
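To make this last step concrete, the following minimal sketch shows one possible form of a detection head that maps each fusion feature to a target type and to 3D spatial information. The class count and the 7-parameter box encoding (centre x/y/z, size l/w/h, yaw) are assumptions and not the head design of this application.

```python
import torch
import torch.nn as nn

class DetectionHead(nn.Module):
    """Minimal detection head sketch: class scores plus a 7-parameter 3D box
    per fused feature. Class count and box parameterisation are assumed."""

    def __init__(self, in_dim=128, num_classes=3):
        super().__init__()
        self.cls = nn.Linear(in_dim, num_classes)  # target type (e.g. vehicle, pedestrian, ...)
        self.box = nn.Linear(in_dim, 7)            # spatial information of the target

    def forward(self, fused):                      # fused: (num_cells, in_dim)
        return self.cls(fused), self.box(fused)

head = DetectionHead()
fused = torch.randn(100 * 100, 128)                # one fused feature per grid cell
scores, boxes = head(fused)
print(scores.shape, boxes.shape)                   # torch.Size([10000, 3]) torch.Size([10000, 7])
```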
In this way, after the image to be detected and the corresponding point cloud data are obtained, feature extraction is performed on the image to be detected to obtain a plurality of image features, and feature extraction is performed on the point cloud data to obtain a plurality of point cloud features. The standard feature corresponding to each image feature and the standard feature corresponding to each point cloud feature are determined, so that the image features and the point cloud features are mapped onto the standard features. For each standard feature, the image features and the point cloud features corresponding to that standard feature are fused into a fusion feature, and target detection is performed on the image to be detected based on the fusion features, which simplifies the feature fusion calculation and improves the accuracy of the fusion result.
In one embodiment, the annotations in the data set include: the object type (such as vehicle or pedestrian), whether the object is occluded by other objects or extends beyond the boundary of the image to be detected, the object orientation angle, the coordinates of the object's three-dimensional bounding box, the three-dimensional object size, the three-dimensional object spatial position coordinates, and the three-dimensional object orientation, namely the angle between the object's heading direction and the x axis of the camera coordinate system.
The data in the data set use the center point of the vehicle as the main coordinate system, and the transformation matrices from each camera and from the point cloud to this coordinate system must be given to facilitate subsequent coordinate transformation. Each image to be detected must be marked with its timestamp, and all cameras must capture their images under the same timestamp each time to keep the cameras synchronized.
In one embodiment, the conversion between the image features and the standard features is specifically implemented as follows:
the determining standard features corresponding to the image features comprises the following steps: determining equipment parameters of acquisition equipment of the image to be detected; for each image feature, determining a feature matrix corresponding to the image feature based on the equipment parameter; and determining three-dimensional space features corresponding to the feature matrix from the three-dimensional space feature set, and taking the three-dimensional space features as standard features corresponding to the image features.
The device parameters of the acquisition device of the image to be detected are determined. The acquisition device may be a camera, and the device parameters include the extrinsic matrix and the intrinsic matrix of the camera. A translation matrix and a rotation matrix of the image feature are calculated from the extrinsic and intrinsic matrices, and the three-dimensional (3D) spatial feature corresponding to the image feature is determined from the three-dimensional spatial feature set based on the translation matrix and the rotation matrix, thereby establishing the projection from the image feature to the three-dimensional spatial feature.
The standard feature space may be a 3D perception space, and the three-dimensional spatial feature set is the feature set of this 3D perception space, comprising the features of a plurality of feature points. The image features in each image to be detected can thus find their corresponding 3D perception-space features, and these 3D perception-space features are the standard features.
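As a hedged illustration of how an image feature can be associated with a 3D perception-space feature using the camera's intrinsic and extrinsic parameters, the sketch below projects grid points into the image plane with a generic pinhole model. The matrices and the association rule are assumptions for the example, not the exact calculation of this application.

```python
import numpy as np

def project_grid_to_image(grid_points, rotation, translation, intrinsics):
    """Project 3D perception-space points (N, 3) into pixel coordinates.

    rotation (3x3) and translation (3,) stand in for the camera extrinsics,
    intrinsics is the 3x3 camera matrix; a generic pinhole sketch."""
    cam_points = grid_points @ rotation.T + translation              # world -> camera frame
    in_front = cam_points[:, 2] > 1e-6                               # keep points in front of the camera
    pixels = cam_points @ intrinsics.T
    pixels = pixels[:, :2] / np.clip(pixels[:, 2:3], 1e-6, None)     # perspective divide
    return pixels, in_front

# Toy usage: a flat 3x3 patch of grid points, identity extrinsics, simple intrinsics
grid = np.array([[x, y, 10.0] for x in range(-1, 2) for y in range(-1, 2)], dtype=np.float64)
K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
uv, valid = project_grid_to_image(grid, np.eye(3), np.zeros(3), K)
print(uv[valid])  # pixel positions of the grid points that land inside the view
```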
In one embodiment, the conversion between the point cloud feature and the standard feature is specifically implemented as follows:
the determining the standard feature corresponding to each point cloud feature comprises the following steps:
and determining the spatial position information contained in the point cloud features aiming at the point cloud features, determining the spatial position information contained in each three-dimensional spatial feature in the three-dimensional spatial feature set, and taking the three-dimensional spatial feature which is the same as the spatial position information of the point cloud features as a standard feature corresponding to the point cloud features.
The point cloud features themselves contain spatial position information, and each three-dimensional spatial feature in the three-dimensional spatial feature set also contains spatial position information. If the spatial position information of a point cloud feature is the same as that of a certain three-dimensional spatial feature, the point cloud feature is determined to correspond to that three-dimensional spatial feature, and that three-dimensional spatial feature is the standard feature corresponding to the point cloud feature.
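One simple way to realize this same-position matching, assuming the 100 × 100 perception space with 1 m spacing described further below, is to quantize a point cloud feature's coordinates into a grid index. This is only an illustrative sketch; the grid parameters are assumptions.

```python
import numpy as np

def match_point_feature_to_grid(feature_xy, grid_size=100, cell_size=1.0):
    """Map a point cloud feature's (x, y) position to the index of the
    3D perception-space feature occupying the same location.

    grid_size and cell_size mirror the default 100 x 100 space with 1 m
    spacing, centred on the ego vehicle; values are assumed for illustration."""
    half = grid_size * cell_size / 2.0
    col = int((feature_xy[0] + half) / cell_size)
    row = int((feature_xy[1] + half) / cell_size)
    if 0 <= row < grid_size and 0 <= col < grid_size:
        return row, col          # same spatial position -> same standard feature
    return None                  # feature falls outside the perception space

print(match_point_feature_to_grid((12.3, -4.7)))  # (45, 62)
```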
In one embodiment, referring to fig. 2, step S104 of feature-fusing the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature to obtain a fused feature, including: s204, determining the image feature weight of the image feature corresponding to the current standard feature through an attention mechanism, and determining the point cloud feature weight of the point cloud feature corresponding to the current standard feature; s205, carrying out weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature; s206, determining fusion characteristics based on the processing result.
In this embodiment, the two-dimensional image features and the three-dimensional point cloud features are fused into the three-dimensional perception space using a deformable attention mechanism.
Each image feature corresponding to the current standard feature has a corresponding image feature weight, and the image feature weights corresponding to each image feature can be different; each point cloud feature corresponding to the current standard feature has a corresponding point cloud feature weight, and the point cloud feature weights corresponding to each point cloud feature may be different.
The current three-dimensional spatial feature may correspond to multiple image features and point cloud features. Assuming there are 4 in total, the four feature points (x1, x2, x3 and x4) are processed through a deformable attention mechanism, which assigns them weights of 0.2, 0.2, 0.3 and 0.3 respectively, and a weighted summation is performed: 0.2·x1 + 0.2·x2 + 0.3·x3 + 0.3·x4. The fusion feature is then determined based on this summation result.
In one embodiment, the performing weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature, and the point cloud feature corresponding to the current standard feature includes: adjusting the image features corresponding to the current standard features and the feature matrix of the point cloud features corresponding to the current standard features to feature matrices with the same size; and carrying out weighted summation processing based on the feature matrix of the image feature corresponding to the adjusted current standard feature, the feature matrix of the point cloud feature corresponding to the adjusted current standard feature, the image feature weight and the point cloud feature weight.
In this embodiment, the two-dimensional image features corresponding to the current standard feature form an image feature matrix, which is adjusted to a feature matrix of a preset size through a convolution operation; likewise, the three-dimensional point cloud features corresponding to the current standard feature form a point cloud feature matrix, which is adjusted to a feature matrix of the same preset size through a convolution operation. For example, both feature matrices are adjusted to 328×328×128.
After the feature matrix of the 2D image features and the feature matrix of the 3D point cloud features have been adjusted to the same size through convolution, the adjusted image feature matrix and the adjusted point cloud feature matrix are weighted and summed with the image feature weights and the point cloud feature weights to obtain the fusion feature.
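The sketch below, written under assumed channel sizes and the 328 × 328 × 128 example above, shows one plausible way to bring the two feature matrices to a common size and apply the weighted summation. The 1×1 convolutions, bilinear resizing and learned softmax weights are assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class WeightedFusion(nn.Module):
    """Resize the image-side and point-cloud-side feature maps to a common shape
    and fuse them by weighted summation; layers and weights are illustrative."""

    def __init__(self, img_channels=64, pc_channels=32, out_channels=128):
        super().__init__()
        self.img_proj = nn.Conv2d(img_channels, out_channels, kernel_size=1)
        self.pc_proj = nn.Conv2d(pc_channels, out_channels, kernel_size=1)
        self.weight_logits = nn.Parameter(torch.zeros(2))   # image / point cloud weights

    def forward(self, img_feat, pc_feat):
        target = img_feat.shape[-2:]                         # common spatial size
        img = self.img_proj(img_feat)
        pc = self.pc_proj(F.interpolate(pc_feat, size=target, mode="bilinear",
                                        align_corners=False))
        w = torch.softmax(self.weight_logits, dim=0)         # normalized fusion weights
        return w[0] * img + w[1] * pc                        # weighted summation

fusion = WeightedFusion()
img_feat = torch.randn(1, 64, 328, 328)                      # image feature matrix
pc_feat = torch.randn(1, 32, 164, 164)                       # point cloud feature matrix
print(fusion(img_feat, pc_feat).shape)                       # torch.Size([1, 128, 328, 328])
```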
In one embodiment, after the determining the standard feature corresponding to each image feature and determining the standard feature corresponding to each point cloud feature, the method further includes: establishing a first corresponding relation between the image features and the standard features and a second corresponding relation between the point cloud features and the standard features; and determining image features corresponding to the current standard features based on the first corresponding relation, and determining point cloud features corresponding to the current standard features based on the second corresponding relation.
The terminal may also pre-calculate the projection indices from the 2D image features and the 3D point cloud features to the 3D perception-space features and store them as a lookup table. Specifically, after the terminal determines the first corresponding relation between the image features and the 3D perception-space features, this relation is stored as a lookup table, establishing a projection index from the 2D image features to the 3D spatial features. Subsequently, for each standard feature, the image features corresponding to the current standard feature can be quickly retrieved based on the first corresponding relation.
Meanwhile, the terminal establishes a second corresponding relation between the point cloud features and the standard features, so that for each standard feature, the point cloud features corresponding to the current standard feature can be quickly looked up based on the second corresponding relation.
In this embodiment, the feature fusion time is greatly reduced by pre-calculating the projection indices of the 2D image features and the 3D point cloud features to the 3D perception-space features.
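A minimal sketch of such a precomputed index follows. The pinhole projection, the feature-map stride of 8 and the dictionary storage are assumptions made for illustration; the point is only that the projection is computed once and reused at fusion time.

```python
import numpy as np

def build_projection_lookup(grid_points, rotation, translation, intrinsics, feat_hw=(30, 40)):
    """Precompute which cell of the 2D image feature map each 3D perception-space
    feature projects onto, and store the result as a lookup table (a dict here)."""
    cam = grid_points @ rotation.T + translation                     # world -> camera frame
    z = np.clip(cam[:, 2:3], 1e-6, None)
    uv = (cam @ intrinsics.T)[:, :2] / z                             # pixel coordinates
    lookup = {}
    h, w = feat_hw
    for idx, (u, v) in enumerate(uv):
        if cam[idx, 2] > 0 and 0 <= u // 8 < w and 0 <= v // 8 < h:  # assumed 8x feature stride
            lookup[idx] = (int(v // 8), int(u // 8))                 # 3D feature index -> feature cell
    return lookup

# Built once offline; at run time each standard feature fetches its image feature in O(1)
grid = np.stack(np.meshgrid(np.arange(-5, 5), np.arange(-5, 5)), -1).reshape(-1, 2)
grid = np.concatenate([grid, np.full((len(grid), 1), 10.0)], axis=1).astype(np.float64)
K = np.array([[500.0, 0.0, 160.0], [0.0, 500.0, 120.0], [0.0, 0.0, 1.0]])
table = build_projection_lookup(grid, np.eye(3), np.zeros(3), K)
print(len(table), "of", len(grid), "standard features have an image correspondence")
```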
In one embodiment, before determining the standard feature corresponding to each image feature and determining the standard feature corresponding to each point cloud feature, the method further includes:
a parameter-learnable 3D perceptual space is constructed. The distance between each feature point in the 3D sensing space and the actual space is 1m, the size of the feature space can be customized, but the size of the feature space is not excessively large because the excessively large feature space can cause the exponential increase of the calculated amount, the default is set to be 100 x 100, and the center point of the feature space is the center point of the vehicle.
With the 3D perception space constructed, the standard feature corresponding to each image feature and the standard feature corresponding to each point cloud feature can then be determined from it.
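A sketch of such a parameter-learnable perception space is given below, using the default 100 × 100 grid with 1 m spacing centered on the vehicle; the 128-dimensional embedding and the initialization are assumptions.

```python
import torch
import torch.nn as nn

class PerceptionSpace(nn.Module):
    """A parameter-learnable 3D perception space: a 100 x 100 grid of feature
    vectors, 1 m apart, centred on the ego vehicle. The embedding dimension
    and initialization scale are assumed; grid size and spacing follow the
    defaults mentioned in the text."""

    def __init__(self, grid_size=100, cell_size=1.0, dim=128):
        super().__init__()
        self.cell_size = cell_size
        # One learnable feature per grid cell (the "standard features")
        self.features = nn.Parameter(torch.randn(grid_size, grid_size, dim) * 0.02)
        # Fixed metric coordinates of each cell, vehicle centre at the origin
        coords = (torch.arange(grid_size, dtype=torch.float32) - grid_size / 2 + 0.5) * cell_size
        yy, xx = torch.meshgrid(coords, coords, indexing="ij")
        self.register_buffer("positions", torch.stack([xx, yy], dim=-1))

space = PerceptionSpace()
print(space.features.shape)   # torch.Size([100, 100, 128]) learnable standard features
print(space.positions[0, 0])  # tensor([-49.5000, -49.5000]) metres from the vehicle centre
```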
To sum up, the post-fusion perception strategy suffers from increased algorithm complexity, calculation cost and latency, precision loss caused by manually formulated rules, and poor robustness to multi-sensor instability or single-sensor failure. To address these problems, referring to fig. 3, the present scheme can be broken down into the following steps:
s1, preparing a joint data set;
s2, constructing a 3D sensing space with a parameter capable of being learned;
s3, extracting features of the image to be detected and the point cloud data;
s4, pre-calculating a projection index of the 2D image features to the three-dimensional space features, and storing the projection index as a lookup table;
s5, fusing the 2D image features and the 3D point cloud features into a 3D sensing space by using a deformable attention mechanism;
s6, outputting the fusion characteristics to a 3D target detection head, predicting the target type and the spatial position information, and outputting a detection result.
To implement the algorithm, at least one vehicle-mounted controller is needed to process the raw images and point cloud data and to perform model inference and related operations. To ensure real-time detection, the computing power of the controller is recommended to be more than 300 TOPS, and a graphics processing unit (GPU) is required to accelerate neural network model inference.
The method and device can rapidly output the list of obstacles around the vehicle, with higher detection precision and lower detection latency.
It should be understood that, although the steps in the flowcharts related to the embodiments described above are sequentially shown as indicated by arrows, these steps are not necessarily sequentially performed in the order indicated by the arrows. The steps are not strictly limited to the order of execution unless explicitly recited herein, and the steps may be executed in other orders. Moreover, at least some of the steps in the flowcharts described in the above embodiments may include a plurality of steps or a plurality of stages, which are not necessarily performed at the same time, but may be performed at different times, and the order of the steps or stages is not necessarily performed sequentially, but may be performed alternately or alternately with at least some of the other steps or stages.
Based on the same inventive concept, the embodiment of the application also provides an object detection device for realizing the above-mentioned object detection method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation of one or more embodiments of the object detection device provided below may be referred to the limitation of the object detection method hereinabove, and will not be repeated here.
In one exemplary embodiment, as shown in fig. 4, there is provided an object detection apparatus 400 including:
an acquisition module 401, configured to acquire an image to be detected and point cloud data having a corresponding relationship;
the feature extraction module 402 is configured to perform feature extraction on the image to be detected to obtain a plurality of image features; extracting features of the point cloud data to obtain a plurality of point cloud features;
a first determining module 403, configured to determine standard features corresponding to each image feature, and determine standard features corresponding to each point cloud feature;
the feature fusion module 404 is configured to perform feature fusion on the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature for each standard feature, so as to obtain a fusion feature;
and the target detection module 405 is configured to perform target detection on the image to be detected based on the fusion feature.
In one embodiment, the first determining module 403 is specifically configured to, when determining standard features corresponding to each image feature:
determining equipment parameters of acquisition equipment of the image to be detected;
for each image feature, determining a feature matrix corresponding to the image feature based on the equipment parameter; and determining three-dimensional space features corresponding to the feature matrix from the three-dimensional space feature set, and taking the three-dimensional space features as standard features corresponding to the image features.
In one embodiment, the first determining module 403 is specifically configured to, when determining the standard feature corresponding to each point cloud feature:
and determining the spatial position information contained in the point cloud features aiming at the point cloud features, determining the spatial position information contained in each three-dimensional spatial feature in the three-dimensional spatial feature set, and taking the three-dimensional spatial feature which is the same as the spatial position information of the point cloud features as a standard feature corresponding to the point cloud features.
In one embodiment, the feature fusion module 404 is specifically configured to, when feature fusion is performed on the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature:
determining image feature weights of the image features corresponding to the current standard features through an attention mechanism, and determining point cloud feature weights of the point cloud features corresponding to the current standard features;
performing weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature;
and determining the fusion characteristic based on the processing result.
In one embodiment, the feature fusion module 404 performs weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature, and the point cloud feature corresponding to the current standard feature, specifically:
adjusting the image features corresponding to the current standard features and the feature matrix of the point cloud features corresponding to the current standard features to feature matrices with the same size;
and carrying out weighted summation processing based on the feature matrix of the image feature corresponding to the adjusted current standard feature, the feature matrix of the point cloud feature corresponding to the adjusted current standard feature, the image feature weight and the point cloud feature weight.
In one embodiment, after determining the standard feature corresponding to each image feature and determining the standard feature corresponding to each point cloud feature, the apparatus further includes:
the corresponding relation establishing module is used for establishing a first corresponding relation between the image characteristics and the standard characteristics and a second corresponding relation between the point cloud characteristics and the standard characteristics;
and the second determining module is used for determining the image characteristics corresponding to the current standard characteristics based on the first corresponding relation and determining the point cloud characteristics corresponding to the current standard characteristics based on the second corresponding relation.
The respective modules in the above-described object detection apparatus may be implemented in whole or in part by software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one exemplary embodiment, a computer device is provided, which may be a terminal, and an internal structure diagram thereof may be as shown in fig. 5. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a method of object detection. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
It will be appreciated by those skilled in the art that the structure shown in fig. 5 is merely a block diagram of some of the structures associated with the present application and is not limiting of the computer device to which the present application may be applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an exemplary embodiment, a computer device is provided, comprising a memory and a processor, the memory having stored therein a computer program, the processor performing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, implements the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
It should be noted that, the user information (including, but not limited to, user equipment information, user personal information, etc.) and the data (including, but not limited to, data for analysis, stored data, presented data, etc.) referred to in the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to meet the related regulations.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in the various embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the various embodiments provided herein may include at least one of relational databases and non-relational databases. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processors referred to in the embodiments provided herein may be general purpose processors, central processing units, graphics processors, digital signal processors, programmable logic units, quantum computing-based data processing logic units, etc., without being limited thereto.
The technical features of the above embodiments may be arbitrarily combined, and all possible combinations of the technical features in the above embodiments are not described for brevity of description, however, as long as there is no contradiction between the combinations of the technical features, they should be considered as the scope of the description.
The above examples only represent a few embodiments of the present application, which are described in more detail and are not to be construed as limiting the scope of the present application. It should be noted that it would be apparent to those skilled in the art that various modifications and improvements could be made without departing from the spirit of the present application, which would be within the scope of the present application. Accordingly, the scope of protection of the present application shall be subject to the appended claims.
Claims (10)
1. A method of target detection, the method comprising:
acquiring an image to be detected and point cloud data with a corresponding relation;
extracting features of the image to be detected to obtain a plurality of image features; extracting features of the point cloud data to obtain a plurality of point cloud features;
determining standard features corresponding to the image features, and determining standard features corresponding to the point cloud features;
aiming at each standard feature, carrying out feature fusion on the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature to obtain a fusion feature;
and carrying out target detection on the image to be detected based on the fusion characteristics.
2. The method of claim 1, wherein determining standard features corresponding to each image feature comprises:
determining equipment parameters of acquisition equipment of the image to be detected;
for each image feature, determining a feature matrix corresponding to the image feature based on the equipment parameter; and determining three-dimensional space features corresponding to the feature matrix from the three-dimensional space feature set, and taking the three-dimensional space features as standard features corresponding to the image features.
3. The method of claim 1, wherein determining standard features corresponding to each point cloud feature comprises:
and determining the spatial position information contained in the point cloud features aiming at the point cloud features, determining the spatial position information contained in each three-dimensional spatial feature in the three-dimensional spatial feature set, and taking the three-dimensional spatial feature which is the same as the spatial position information of the point cloud features as a standard feature corresponding to the point cloud features.
4. The method according to claim 1, wherein the feature fusing the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature to obtain the fused feature includes:
determining image feature weights of the image features corresponding to the current standard features through an attention mechanism, and determining point cloud feature weights of the point cloud features corresponding to the current standard features;
performing weighted summation processing based on the image feature weight, the point cloud feature weight, the image feature corresponding to the current standard feature and the point cloud feature corresponding to the current standard feature;
and determining the fusion characteristic based on the processing result.
5. The method of claim 4, wherein the performing a weighted summation process based on the image feature weights, the point cloud feature weights, the image features corresponding to the current standard feature, and the point cloud features corresponding to the current standard feature comprises:
adjusting the image features corresponding to the current standard features and the feature matrix of the point cloud features corresponding to the current standard features to feature matrices with the same size;
and carrying out weighted summation processing based on the feature matrix of the image feature corresponding to the adjusted current standard feature, the feature matrix of the point cloud feature corresponding to the adjusted current standard feature, the image feature weight and the point cloud feature weight.
6. The method of claim 1, wherein after determining the standard features corresponding to each image feature and determining the standard features corresponding to each point cloud feature, the method further comprises:
establishing a first corresponding relation between the image features and the standard features and a second corresponding relation between the point cloud features and the standard features;
and determining image features corresponding to the current standard features based on the first corresponding relation, and determining point cloud features corresponding to the current standard features based on the second corresponding relation.
7. An object detection device, the device comprising:
the acquisition module is used for acquiring the image to be detected and the point cloud data with the corresponding relation;
the feature extraction module is used for extracting features of the image to be detected to obtain a plurality of image features; extracting features of the point cloud data to obtain a plurality of point cloud features;
the first determining module is used for determining standard features corresponding to the image features and determining standard features corresponding to the point cloud features;
the feature fusion module is used for carrying out feature fusion on the image features corresponding to the current standard features and the point cloud features corresponding to the current standard features aiming at each standard feature to obtain fusion features;
and the target detection module is used for carrying out target detection on the image to be detected based on the fusion characteristics.
8. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 6 when the computer program is executed.
9. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
10. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311423592.3A CN117456493A (en) | 2023-10-30 | 2023-10-30 | Target detection method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311423592.3A CN117456493A (en) | 2023-10-30 | 2023-10-30 | Target detection method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117456493A true CN117456493A (en) | 2024-01-26 |
Family
ID=89586980
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311423592.3A Pending CN117456493A (en) | 2023-10-30 | 2023-10-30 | Target detection method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117456493A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118608913A (en) * | 2024-08-08 | 2024-09-06 | 浙江吉利控股集团有限公司 | Feature fusion method, device, apparatus, medium and program product |
- 2023-10-30: CN CN202311423592.3A patent/CN117456493A/en, status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |