CN114550160A - Automobile identification method based on three-dimensional point cloud data and traffic scene - Google Patents
- Publication number
- CN114550160A (application number CN202111358810.0A)
- Authority
- CN
- China
- Prior art keywords
- point cloud
- network
- column
- cloud data
- pseudo
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Links
- 238000000034 method Methods 0.000 title claims abstract description 39
- 238000001514 detection method Methods 0.000 claims abstract description 60
- 238000005070 sampling Methods 0.000 claims description 36
- 230000006870 function Effects 0.000 claims description 20
- 238000000605 extraction Methods 0.000 claims description 19
- 239000011159 matrix material Substances 0.000 claims description 12
- 238000010606 normalization Methods 0.000 claims description 10
- 230000007246 mechanism Effects 0.000 claims description 9
- 238000012545 processing Methods 0.000 claims description 8
- 230000006835 compression Effects 0.000 claims description 6
- 238000007906 compression Methods 0.000 claims description 6
- 230000002708 enhancing effect Effects 0.000 claims description 5
- 238000006243 chemical reaction Methods 0.000 claims description 3
- 230000007423 decrease Effects 0.000 claims description 3
- 230000005764 inhibitory process Effects 0.000 claims description 3
- 230000004807 localization Effects 0.000 claims description 3
- 238000011176 pooling Methods 0.000 claims description 3
- 238000002310 reflectometry Methods 0.000 claims description 3
- 238000012549 training Methods 0.000 claims description 3
- 239000013598 vector Substances 0.000 claims description 3
- 238000010586 diagram Methods 0.000 description 7
- 230000006399 behavior Effects 0.000 description 4
- 230000033001 locomotion Effects 0.000 description 4
- 230000004913 activation Effects 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 239000000284 extract Substances 0.000 description 2
- 238000005259 measurement Methods 0.000 description 2
- 238000013473 artificial intelligence Methods 0.000 description 1
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 238000013075 data extraction Methods 0.000 description 1
- 230000007547 defect Effects 0.000 description 1
- 238000011161 development Methods 0.000 description 1
- 230000004927 fusion Effects 0.000 description 1
- 230000014759 maintenance of location Effects 0.000 description 1
- 239000000203 mixture Substances 0.000 description 1
- 238000012986 modification Methods 0.000 description 1
- 230000004048 modification Effects 0.000 description 1
- 230000003287 optical effect Effects 0.000 description 1
- 230000008569 process Effects 0.000 description 1
- 230000009467 reduction Effects 0.000 description 1
- 230000009466 transformation Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Computing Systems (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Molecular Biology (AREA)
- Artificial Intelligence (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
- Traffic Control Systems (AREA)
Abstract
The invention provides an automobile identification method based on three-dimensional point cloud data and a traffic scene. Through three feature modules (a multi-resolution column-by-column feature extraction network, a spatial attention-based convolution detection framework, and a compression-activation attention-based detection head), the invention realizes identification in traffic scenes from a point cloud data set and effectively improves the detection efficiency and detection precision for vehicles.
Description
Technical Field
The invention relates to the field of traffic detection, in particular to vehicle and pedestrian detection, and specifically to an automobile identification method based on three-dimensional point cloud data and a traffic scene.
Background
With the continuous development of artificial intelligence, sensors and control theory, automatic driving has drawn wide attention in academia and industry and has a bright application prospect. During automatic driving, a vehicle must detect and predict the behavior of surrounding objects such as vehicles and pedestrians. Current target detection methods based on two-dimensional RGB images cannot accurately recover information such as the space, position, depth and angle of an oncoming vehicle, so the driving motion of a vehicle cannot be planned and controlled from simple target azimuth information alone. The invention instead adopts three-dimensional point cloud data: every point in the point cloud carries feature information such as the position, distance and angle of a target object, and this data composition matches the real world better than a two-dimensional RGB image. Three-dimensional point cloud data is mainly generated by LiDAR (Light Detection and Ranging) sensors, also known as optical radar, whose working principle is to receive the reflections of laser beams emitted by the radar sensor. LiDAR offers long ranging distance, high precision and high reliability, and is widely applied in vehicle-mounted automatic driving. Current LiDAR manufacturers include Velodyne, IBEO, Quanergy, Silan Technology and other companies, of which Velodyne is the best known in the industry.
Using computer vision technology, researchers can extract the outline and shape information of vehicles and pedestrians to detect targets. For example, CN111507340A discloses a method for extracting target point cloud data from three-dimensional point cloud data, which includes: acquiring original three-dimensional point cloud data and denoising it to obtain denoised three-dimensional point cloud data; extracting intensity image data from the denoised point cloud; calling a preset target extraction algorithm to extract targets from the intensity image data, obtaining target intensity image data; extracting target three-dimensional point cloud data from the original point cloud according to the pixel coordinate values of the target intensity image; and calling a preset point cloud denoising algorithm to denoise the target point cloud and obtain the target point cloud data. Although this method utilizes three-dimensional point cloud data, the extracted behavior features and distance features are not further fused and no feature dimension reduction is performed, so the detection precision in traffic scenes cannot meet practical requirements.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the defects of the prior art, the invention provides an automobile identification method based on three-dimensional point cloud data and a traffic scene, which comprehensively judges whether vehicles exist in a target area by combining the position, outline and shape of the point cloud in space with the surrounding scene information, effectively improves the detection precision and detection effect, and provides accurate surrounding driving environment information for an automatic driving system.
The technical scheme adopted by the invention to solve this problem is as follows: an automobile identification method based on three-dimensional point cloud data and a traffic scene, comprising: 1) a multi-resolution column-by-column feature extraction network; 2) a spatial attention-based convolution detection framework; 3) a compression-activation attention-based detection head.
Further, the multi-resolution-based column-by-column feature extraction network comprises point cloud data processing, column feature extraction and pseudo-image feature extraction which are sequentially carried out.
Further, the point cloud data processing specifically comprises: a point I in the point cloud data is uniquely represented by the four dimensions x, y, z and r; the points in the point cloud are uniformly divided into grids on the x-y plane, and the grids form a set of columns, denoted p, with no height limit along the z axis; the four-dimensional feature (x, y, z, r) of each original input point in a column is enhanced into the nine-dimensional feature (x, y, z, r, x_c, y_c, z_c, x_p, y_p), where r is the reflectivity of point I, the subscript c denotes the offset from the arithmetic mean of all points I in the column, and the subscript p denotes the offset from the column center in x and y.
Further, the column feature extraction specifically comprises: for the points I in each column, extracting the features within the column P with a point cloud network, acquiring the column features at three different resolutions (high, medium and low); each resolution controls the sparsity D by limiting the number P of non-empty columns per sample and the total number N of points I per column, generating a dense tensor of size T ∈ R^(D×P×N); the features of each point I in the column P are extracted with a point cloud network, each point in the column P passing through a linear layer, a batch normalization layer and a ReLU layer, outputting a tensor of size Z ∈ R^(C×P×N); the features are merged and stacked according to the original column positions to form a pseudo-image of size S ∈ R^(C×H×W), where the three resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
Preferably, the frame tensor T is kept at a fixed size of 10000, with 10000 as the threshold: if a collected sample or column contains more than 10000 data points, the data are reduced to 10000 by random sampling; if it contains fewer than 10000, the tensor T is padded to 10000 with zeros.
Further, the pseudo-image feature extraction comprises sequentially applying a convolution operation for down-sampling and a deconvolution operation for up-sampling to extract the vehicle feature information in the pseudo-images S_H, S_M, S_L; each up-sampling and down-sampling step is followed by a batch normalization layer and a ReLU layer, and the up-sampled feature information of S_H, S_M, S_L is merged to generate a new point cloud pseudo-image S.
Further, the spatial attention-based convolution detection framework comprises: 1) extracting pseudo-image features using the 1C, 2C and 4C channels respectively; 2) enhancing the spatial information features using a spatial attention mechanism.
Further, the characteristics of the pseudo-map extracted by using the 1C, 2C and 4C channels respectively are as follows:
inputting the pseudo-image S into the detection framework using the region proposal network, wherein the detection framework consists of a down-sampling network Net_1 and an up-sampling network Net_2;
Down-sampling network Net1By convolution operations to become smaller and smallerThe spatial resolution 1C, 2C and 4C carry out down-sampling on the feature map, the down-sampling network is represented by a series of (S, L and F) blocks, wherein S represents a step length, F represents the number of output channels, and L represents the number of 3 multiplied by 3 two-dimensional convolutional layer layers, a batch normalization layer and a ReLU layer are connected behind each channel, the first convolution step in each layer is S/S _ in, so that the size of the detection network is kept to be S after the detection network receives the input of the step length S _ in; the subsequent convolution steps in each layer are all 1, and the number of channels in each layer is [64,128,256 ]]Down-sampling networks produce successively smaller spatial resolutions;
upsampling network Net2Performing up-sampling operation on feature maps with different resolutions by deconvolution, and performing up-sampling on network Net2Represented by (S _ in, S _ out, F), where S _ in is the initial step, S _ out is the termination step, and F is the final characteristic; the pseudo graph S respectively generates a pseudo graph characteristic graph F through the up-sampling network and the down-sampling network1、F2、F3。
Further, the method for enhancing the spatial information features using the spatial attention mechanism comprises the following steps: feeding the pseudo-image feature maps F generated by the network into a spatial attention module, which generates two new feature maps G_1 and G_2 from the feature map using two 1×1 convolutional layers;
where {G_1, G_2} ∈ R^(C×H×W); G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1;
the spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient portions;
a feature map is generated by matrix-multiplying G_2 with W_sa;
finally, the scene target feature maps re-weighted by spatial attention at the three scales are merged and output.
Further, the compression-activation attention-based detection head re-weights the merged multi-scale feature map using a compression-activation attention mechanism;
during compression, global average pooling produces a channel-wise vector s ∈ R^C;
during activation, the module captures channel-by-channel dependencies:
se = ReLU(W_2 · δ(W_1 · s))
where δ(·) is the sigmoid function, ReLU(·) is the ReLU function, W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)).
The above-mentioned compression-activation attention-based detection head has the following detection algorithm:
the output of the compression-activation attention detection head network is used for target detection by a single-shot multi-box detector; the single-shot multi-box detector network is divided into six modules: the first consists of the first five convolutional stages Conv1, 2, 3, 4 and 5 of VGG16, and the second converts the FC6 and FC7 fully connected layers of VGG16 into the Conv6 and Conv7 convolutional layers; the remaining four modules add the Conv8, Conv9, Conv10 and Conv11 convolutional layers to extract target information at different scales, and the method finally performs target classification detection and non-maximum suppression with position regression.
The above detection algorithm has the following loss function formula:
the bounding box of the real target is represented by (x, y, z, w, l, h, θ): its three-dimensional center, width, length, height and yaw angle. X_gt and X_a respectively represent the real target and the anchor, with regression residuals
Δx = (x_gt − x_a)/d_a, Δy = (y_gt − y_a)/d_a, Δz = (z_gt − z_a)/h_a,
Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a), Δθ = sin(θ_gt − θ_a),
where d_a = √(w_a² + l_a²); the localization loss function is:
L_loc = Σ_{b∈(x,y,z,w,l,h,θ)} SmoothL1(Δb)
because the localization loss cannot distinguish whether the bounding box is flipped, a softmax classification loss L_dir over discretized directions is used to learn the bounding box direction;
the classification loss uses the focal loss function:
L_cls = −α_a(1 − p_a)^γ log p_a
where p_a is the class probability of an anchor; the overall loss function is:
L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir)
where N_pos is the number of positive anchors. The loss function uses an Adam optimizer, and the learning rate decreases as the training period increases.
The invention, an automobile identification method based on three-dimensional point cloud data and traffic scenes, has the following advantages:
(1) it provides accurate surrounding driving environment information for an automatic driving system based on the point cloud data set;
(2) it improves detection precision and detection effect in the actual driving environment by using a spatial attention mechanism and multi-resolution joint detection;
(3) it improves the detection result of the single-shot multi-box detector algorithm through the compression-activation attention-based detection head, which re-weights the different channels in space.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a system flow chart of the automobile identification method based on three-dimensional point cloud data and traffic scenes.
Fig. 2 is a schematic diagram of the multi-resolution-based column-by-column feature extraction network proposed in the present invention.
Fig. 3 is a schematic diagram of the spatial attention-based convolution detection framework proposed in the present invention.
Fig. 4 is a schematic diagram of a compression-activated attention based detection head as proposed in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. The drawings are simplified schematic diagrams illustrating the basic structure of the present invention only in a schematic manner, and thus show only the constitution related to the present invention, and directions and references (e.g., upper, lower, left, right, etc.) may be used only to help the description of the features in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the claimed subject matter is defined only by the appended claims and equivalents thereof.
As shown in fig. 1, the automobile identification method based on three-dimensional point cloud data and a traffic scene comprehensively considers the accurate ranging, high precision and rich data features of point cloud data sets, and proposes a method for detecting vehicles and pedestrians based on a point cloud data set, comprising the following steps:
1) a multi-resolution column-by-column feature extraction network;
2) a spatial attention-based convolution detection framework;
3) a compression-activated attention based detection head.
The method first detects with a single-shot multi-box detector, then uses pedestrian motion information to search for targets of interest, and extracts each target's motion sequence, surrounding traffic scene sequence and track position; the invention designs a three-dimensional convolutional neural network to process the motion sequence of a target of interest and obtain behavior features related to the pedestrian's intention to cross the road.
According to the invention, two weights derived from the elements of the local traffic scene around the pedestrian and from the vehicle speed are used to correct the pedestrian-vehicle distance, and the corrected distance is encoded by a multilayer perceptron to obtain a distance feature related to the pedestrian's intention to cross the road.
Finally, the behavior features and distance features are fused, the dimension of the fused features is reduced with a fully connected layer, and a softmax operation yields the result of whether the pedestrian crosses the road.
Fig. 2 shows a schematic diagram of the multi-resolution-based column-by-column feature extraction network. The different shades of gray in the figure represent features extracted at different scales; the scale in fig. 2 is the multi-resolution shown at the far left of the figure.
1) Point cloud data processing
Processing the point cloud data extracts the pedestrians of interest, reducing the time the algorithm spends on pedestrians that are not of interest. The invention augments each point I in the point cloud from (x, y, z, r) to (x, y, z, r, x_c, y_c, z_c, x_p, y_p) by calculation: the spatial coordinates of the point, the reflectivity r, the offsets (subscript c) from the arithmetic mean of all points I in the column, and the offsets (subscript p) from the column center in x and y.
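As a minimal sketch of this nine-dimensional augmentation (an illustrative Python/NumPy assumption, not code from the patent; the function name and array layout are hypothetical):

```python
import numpy as np

def decorate_points(points: np.ndarray, column_center_xy: np.ndarray) -> np.ndarray:
    """Augment the raw (x, y, z, r) points of one column into the
    nine-dimensional feature (x, y, z, r, x_c, y_c, z_c, x_p, y_p).

    points: (N, 4) array holding x, y, z, r for the points I of one column.
    column_center_xy: (2,) x-y center of that column on the grid.
    """
    centroid = points[:, :3].mean(axis=0)             # arithmetic mean of all points in the column
    offsets_c = points[:, :3] - centroid              # x_c, y_c, z_c: offsets from the centroid
    offsets_p = points[:, :2] - column_center_xy      # x_p, y_p: offsets from the column center
    return np.hstack([points, offsets_c, offsets_p])  # (N, 9)
```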
2) Column feature extraction
For the points I in each column, the point cloud features within the column are extracted with a point cloud network at high, medium and low resolutions. The sparsity D is controlled by limiting the number of non-empty columns per sample and the total number N of points I per column, yielding a dense tensor of size T ∈ R^(D×P×N). When the data in a collected sample or column are redundant, the frame tensor T is kept at a fixed size by randomly sampling the retained data; when the data are too few, the tensor T is expanded by zero padding to maintain its size.
The point cloud network outputs a tensor of size Z ∈ R^(C×P×N); these tensors are merged and stacked according to the original column positions to form a pseudo-image of size S ∈ R^(C×H×W), where the high, medium and low resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
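The per-column feature extraction and the scatter back to a pseudo-image might look as follows (a hedged PyTorch sketch; the 10000-point threshold follows the text above, while the tensor layouts, channel width C = 64, and integer `coords` grid indices are assumptions):

```python
import numpy as np
import torch
import torch.nn as nn

def fix_point_count(points: np.ndarray, n: int = 10000) -> np.ndarray:
    """Keep the frame tensor at a fixed size: random sampling above the
    10000-point threshold, zero padding below it."""
    if len(points) > n:
        keep = np.random.choice(len(points), n, replace=False)
        return points[keep]
    pad = np.zeros((n - len(points), points.shape[1]), dtype=points.dtype)
    return np.vstack([points, pad])

class ColumnFeatureNet(nn.Module):
    """Per-point Linear -> BatchNorm -> ReLU, then a max over the N points of
    each column, yielding one C-dimensional feature per column."""
    def __init__(self, in_dim: int = 9, out_dim: int = 64):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.bn = nn.BatchNorm1d(out_dim)

    def forward(self, columns: torch.Tensor) -> torch.Tensor:  # (P, N, D)
        x = self.linear(columns)                                # (P, N, C)
        x = torch.relu(self.bn(x.transpose(1, 2)).transpose(1, 2))
        return x.amax(dim=1)                                    # (P, C)

def scatter_to_pseudo_image(feats: torch.Tensor, coords: torch.Tensor,
                            h: int, w: int) -> torch.Tensor:
    """Stack the column features back at their original grid positions to
    form the pseudo-image S of size (C, H, W)."""
    c = feats.shape[1]
    canvas = feats.new_zeros(c, h * w)
    canvas[:, coords[:, 0] * w + coords[:, 1]] = feats.t()
    return canvas.view(c, h, w)
```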
3) Pseudo-graph feature extraction
For the high-, medium- and low-resolution pseudo-images S_H, S_M, S_L, the invention extracts the vehicle feature information by down-sampling with a convolution operation followed by up-sampling with a deconvolution operation; each up-sampling and down-sampling step includes a batch normalization layer and a ReLU layer. Finally, the three up-sampled feature maps are merged to generate a new point cloud pseudo-image S.
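A hedged sketch of this down-then-up-sampling fusion (assuming S_H, S_M, S_L arrive at full, 1/2 and 1/4 resolution and share a channel width c; both are illustrative choices, not values fixed by the patent):

```python
import torch
import torch.nn as nn

class PseudoImageFusion(nn.Module):
    """Each branch down-samples its pseudo-image with a stride-2 convolution,
    then up-samples with a deconvolution back to the resolution of S_H; every
    step is followed by BatchNorm + ReLU, and the up-sampled outputs are
    concatenated into the new pseudo-image S."""
    def __init__(self, c: int = 64):
        super().__init__()
        def branch(scale: int) -> nn.Sequential:
            return nn.Sequential(
                nn.Conv2d(c, c, 3, stride=2, padding=1),                # down-sample by convolution
                nn.BatchNorm2d(c), nn.ReLU(),
                nn.ConvTranspose2d(c, c, 2 * scale, stride=2 * scale),  # up-sample by deconvolution
                nn.BatchNorm2d(c), nn.ReLU())
        # assumed input scales: S_H full, S_M 1/2, S_L 1/4 resolution
        self.b_h, self.b_m, self.b_l = branch(1), branch(2), branch(4)

    def forward(self, s_h, s_m, s_l):
        # concatenate the three up-sampled feature maps along the channel axis
        return torch.cat([self.b_h(s_h), self.b_m(s_m), self.b_l(s_l)], dim=1)
```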
Fig. 3 presents a schematic diagram of a spatial attention-based convolution detection framework. In fig. 3, the different shades of gray in the leftmost stitched bitmap represent the three features resulting from the upsampling.
4) Extracting pseudo-map features using 1C, 2C, 4C channels respectively
The invention achieves the vehicle detection task in real traffic situations through a detection method that extracts features through multiple channels. The pseudo-image S is fed into the region proposal network detection framework, which is divided into two parts: a down-sampling network and an up-sampling network. The down-sampling network down-samples the feature map by convolution operations at progressively smaller spatial resolutions (1C, 2C, 4C) and is represented by a series of (S, L, F) blocks, where S denotes the stride, F the number of output channels, and L the number of 3×3 two-dimensional convolutional layers. Each convolution is followed by a batch normalization layer and a ReLU layer; the first convolution in each block has stride S/S_in, ensuring the block still operates at stride S after receiving an input of stride S_in, and the subsequent convolutions all have stride 1. The channel numbers of the blocks are [64, 128, 256], so the down-sampling network produces successively smaller spatial resolutions. The up-sampling network Net_2 up-samples the feature maps of different resolutions by deconvolution and is represented by (S_in, S_out, F), where S_in is the initial stride, S_out the final stride, and F the final feature; like the down-sampling network, each up-sampling step is followed by a batch normalization layer and a ReLU layer. Passing the pseudo-image S through the down-sampling and up-sampling networks generates the pseudo-image feature maps F_1, F_2, F_3.
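Under the (S, L, F) convention just described, one block of each network might be sketched as follows (a non-authoritative PyTorch sketch; the example instantiation in the trailing comment is an assumption):

```python
import torch.nn as nn

def down_block(s_in: int, s: int, l: int, f_in: int, f: int) -> nn.Sequential:
    """One (S, L, F) block of the down-sampling network: the first 3x3
    convolution has stride S / S_in, the remaining L - 1 convolutions have
    stride 1, and each is followed by BatchNorm + ReLU."""
    layers = [nn.Conv2d(f_in, f, 3, stride=s // s_in, padding=1),
              nn.BatchNorm2d(f), nn.ReLU()]
    for _ in range(l - 1):
        layers += [nn.Conv2d(f, f, 3, stride=1, padding=1),
                   nn.BatchNorm2d(f), nn.ReLU()]
    return nn.Sequential(*layers)

def up_block(s_in: int, s_out: int, f_in: int, f: int) -> nn.Sequential:
    """One (S_in, S_out, F) block of the up-sampling network Net_2: a
    deconvolution from stride S_in back to stride S_out, then BatchNorm + ReLU."""
    k = s_in // s_out
    return nn.Sequential(nn.ConvTranspose2d(f_in, f, k, stride=k),
                         nn.BatchNorm2d(f), nn.ReLU())

# e.g. a second down-sampling block under the [64, 128, 256] channel scheme:
# down_block(s_in=1, s=2, l=6, f_in=64, f=128)
```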
5) Enhancing spatial information features using spatial attention mechanism
The pseudo-image feature maps F_1, F_2, F_3 are fed into a spatial attention module, which first uses two 1×1 convolutional layers to generate two new feature maps G_1 and G_2, where {G_1, G_2} ∈ R^(C×H×W). G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1. The spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient portions. A feature map is then generated by matrix-multiplying G_2 with W_sa. Finally, the scene target feature maps re-weighted by spatial attention at the three scales are merged and output.
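A sketch of this spatial attention computation (hedged PyTorch; the residual addition of the input is an assumption, not stated in the text):

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Two 1x1 convolutions produce G_1 and G_2; W_sa = Softmax(G_1^T G_1)
    encodes the spatially salient parts, and G_2 · W_sa yields the
    re-weighted feature map."""
    def __init__(self, c: int):
        super().__init__()
        self.to_g1 = nn.Conv2d(c, c, 1)
        self.to_g2 = nn.Conv2d(c, c, 1)

    def forward(self, f: torch.Tensor) -> torch.Tensor:        # f: (B, C, H, W)
        b, c, h, w = f.shape
        g1 = self.to_g1(f).view(b, c, h * w)                   # G_1 reshaped to (C, HxW)
        g2 = self.to_g2(f).view(b, c, h * w)
        w_sa = torch.softmax(g1.transpose(1, 2) @ g1, dim=-1)  # (HxW, HxW) attention matrix
        out = (g2 @ w_sa).view(b, c, h, w)                     # spatially re-weighted features
        return out + f                                         # assumed residual connection
```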
Fig. 4 shows a schematic diagram of the detection head based on SE (compression-activation) attention. The different shades of gray in fig. 4 represent different features.
6) Detection head based on compression-activation network
The merged multi-scale feature map is re-weighted using a compression-activation attention mechanism, implemented through compression and activation operations. In the compression operation, global average pooling produces a channel-wise vector s ∈ R^C. In the activation phase, the module captures channel-by-channel dependencies:
se = ReLU(W_2 · δ(W_1 · s))
where δ(·) is the sigmoid function, ReLU(·) is the ReLU function, W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r)).
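A sketch of this compression-activation re-weighting, following the formula above literally (note the text places ReLU outermost and the sigmoid δ innermost, the reverse of the more common SE formulation; the reduction ratio r = 16 is an assumption):

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Global average pooling compresses the map to s ∈ R^C, then
    se = ReLU(W_2 · δ(W_1 · s)) re-weights the channels."""
    def __init__(self, c: int, r: int = 16):
        super().__init__()
        self.w1 = nn.Linear(c, c // r, bias=False)  # W_1 ∈ R^((C/r) × C)
        self.w2 = nn.Linear(c // r, c, bias=False)  # W_2 ∈ R^(C × (C/r))

    def forward(self, x: torch.Tensor) -> torch.Tensor:      # x: (B, C, H, W)
        s = x.mean(dim=(2, 3))                               # compression: s ∈ R^C per sample
        se = torch.relu(self.w2(torch.sigmoid(self.w1(s))))  # activation, per the formula above
        return x * se.unsqueeze(-1).unsqueeze(-1)            # channel-wise re-weighting
```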
7) Detection algorithm
The invention uses a single-shot multi-box detector on the output of the compression-activation network for target detection; the single-shot multi-box detector offers high detection speed and high detection precision. The method introduces the idea of anchors, can adapt to multi-scale target detection tasks, and suits the large scale variation characteristic of point cloud data. The single-shot multi-box detector network is divided into six modules: the first consists of the first five convolutional stages Conv1, 2, 3, 4 and 5 of VGG16, after which the FC6 and FC7 fully connected layers of VGG16 are converted into the Conv6 and Conv7 convolutional layers. On this basis, four further modules with Conv8, Conv9, Conv10 and Conv11 convolutional layers are added to extract target information at different scales. Finally, the method performs target classification detection and non-maximum suppression with position regression.
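The six-module trunk might be sketched as follows (a hedged sketch; the dilated Conv6, the extra-block channel widths and strides follow the usual SSD recipe and are assumptions, and the first VGG convolution would need adapting to the channel count of the re-weighted pseudo-image):

```python
import torch
import torch.nn as nn
import torchvision

class SSDTrunk(nn.Module):
    """VGG16 Conv1-Conv5, FC6/FC7 recast as Conv6/Conv7, plus the added
    Conv8-Conv11 blocks that supply multi-scale feature maps."""
    def __init__(self):
        super().__init__()
        self.conv1_5 = torchvision.models.vgg16(weights=None).features  # Conv1-Conv5
        self.conv6 = nn.Conv2d(512, 1024, 3, padding=6, dilation=6)     # FC6 as a convolution
        self.conv7 = nn.Conv2d(1024, 1024, 1)                           # FC7 as a convolution

        def extra(c_in, c_mid, c_out, stride):
            return nn.Sequential(nn.Conv2d(c_in, c_mid, 1), nn.ReLU(),
                                 nn.Conv2d(c_mid, c_out, 3, stride=stride, padding=1),
                                 nn.ReLU())
        self.extras = nn.ModuleList([extra(1024, 256, 512, 2),  # Conv8
                                     extra(512, 128, 256, 2),   # Conv9
                                     extra(256, 128, 256, 2),   # Conv10
                                     extra(256, 128, 256, 2)])  # Conv11

    def forward(self, x):
        x = torch.relu(self.conv7(torch.relu(self.conv6(self.conv1_5(x)))))
        feats = [x]                  # feature maps fed to the multi-box heads
        for block in self.extras:
            x = block(x)
            feats.append(x)          # one extra scale per added module
        return feats
```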
The bounding box of the real target is represented by (x, y, z, w, l, h, θ): its three-dimensional center, width, length, height and yaw angle. X_gt and X_a respectively represent the real target and the anchor, with regression residuals
Δx = (x_gt − x_a)/d_a, Δy = (y_gt − y_a)/d_a, Δz = (z_gt − z_a)/h_a,
Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a), Δθ = sin(θ_gt − θ_a),
where d_a = √(w_a² + l_a²). The localization loss function is:
L_loc = Σ_{b∈(x,y,z,w,l,h,θ)} SmoothL1(Δb)
Because the localization loss cannot distinguish whether the bounding box is flipped, a softmax classification loss L_dir over discretized directions is used to learn the bounding box direction.
The classification loss uses the focal loss function:
L_cls = −α_a(1 − p_a)^γ log p_a
where p_a is the class probability of an anchor. The overall loss function is:
L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir)
where N_pos is the number of positive anchors. The loss function uses an Adam optimizer, and the learning rate decreases as the training period increases.
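A hedged sketch of computing this overall loss (the β weights and the focal parameters α, γ follow common PointPillars-style settings and are assumptions, as are the tensor layouts):

```python
import torch
import torch.nn.functional as F

def detection_loss(loc_pred, loc_target, cls_pred, cls_target,
                   dir_pred, dir_target, n_pos: int,
                   beta_loc: float = 2.0, beta_cls: float = 1.0, beta_dir: float = 0.2,
                   alpha: float = 0.25, gamma: float = 2.0) -> torch.Tensor:
    """SmoothL1 localization loss over the (x, y, z, w, l, h, θ) residuals,
    focal classification loss, and a softmax (cross-entropy) loss over the
    discretized box directions, normalized by the number of positive anchors."""
    loc = F.smooth_l1_loss(loc_pred, loc_target, reduction='sum')
    p = torch.sigmoid(cls_pred)
    pt = torch.where(cls_target == 1, p, 1 - p)        # probability of the true class
    alpha_t = torch.where(cls_target == 1,
                          torch.full_like(p, alpha), torch.full_like(p, 1 - alpha))
    focal = (-alpha_t * (1 - pt) ** gamma * pt.clamp(min=1e-6).log()).sum()
    direction = F.cross_entropy(dir_pred, dir_target, reduction='sum')
    return (beta_loc * loc + beta_cls * focal + beta_dir * direction) / max(n_pos, 1)
```

Such a loss could be minimized with torch.optim.Adam together with a decaying learning-rate schedule, matching the training description above.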
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.
Claims (10)
1. An automobile identification method based on three-dimensional point cloud data and a traffic scene, characterized in that the method comprises the following steps:
1) a multi-resolution column-by-column feature extraction network;
2) a spatial attention-based convolution detection framework;
3) a compression-activated attention based detection head.
2. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 1, characterized in that: the multi-resolution-based column-by-column feature extraction network comprises point cloud data processing, column feature extraction and pseudo-image feature extraction which are sequentially carried out.
3. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 2, characterized in that: specifically, a point I in the point cloud data is uniquely represented by the four dimensions x, y, z and r; the points in the point cloud are uniformly divided into grids on the x-y plane, and the grids form a set of columns, denoted p, with no height limit along the z axis; the four-dimensional feature (x, y, z, r) of each original input point in a column is enhanced into the nine-dimensional feature (x, y, z, r, x_c, y_c, z_c, x_p, y_p), where r is the reflectivity of point I, the subscript c denotes the offset from the arithmetic mean of all points I in the column, and the subscript p denotes the offset from the column center in x and y.
4. The method for recognizing the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 3, wherein: the column feature extraction specifically comprises: for the points I in each column, extracting the features within the column P with a point cloud network, acquiring the column features at three different resolutions (high, medium and low); each resolution controls the sparsity D by limiting the number P of non-empty columns per sample and the total number N of points I per column, generating a dense tensor of size T ∈ R^(D×P×N);
extracting the features of each point I in the column P with a point cloud network, passing each point in the column P through a linear layer, a batch normalization layer and a ReLU layer, and outputting a tensor of size Z ∈ R^(C×P×N);
the features are merged and stacked according to the original column positions to form a pseudo-image of size S ∈ R^(C×H×W), where the three resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
5. The method for identifying the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 4, wherein: the frame tensor T is kept at a fixed size of 10000; if the data within a collected sample or column are fewer than 10000, the tensor T is filled to 10000 by zero padding.
6. The method for identifying the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 4, wherein: the pseudo-image feature extraction comprises sequentially applying a convolution operation for down-sampling and a deconvolution operation for up-sampling to extract the vehicle feature information in the pseudo-images S_H, S_M, S_L; each up-sampling and down-sampling step is followed by a batch normalization layer and a ReLU layer, and the up-sampled feature information of S_H, S_M, S_L is merged to generate a new point cloud pseudo-image S.
7. The method for identifying the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 6, wherein: the spatial attention-based convolution detection framework comprises:
1) extracting pseudo-image features using the 1C, 2C and 4C channels respectively;
2) enhancing the spatial information features using a spatial attention mechanism.
8. The method for identifying the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 7, wherein: the characteristics of the pseudo-map extracted by using the 1C, 2C and 4C channels are as follows:
inputting the pseudo-image S into the detection framework using the region proposal network, wherein the detection framework consists of a down-sampling network Net_1 and an up-sampling network Net_2;
the down-sampling network Net_1 down-samples the feature map by convolution operations at progressively smaller spatial resolutions 1C, 2C and 4C; the down-sampling network is represented by a series of (S, L, F) blocks, where S denotes the stride, F the number of output channels, and L the number of 3×3 two-dimensional convolutional layers, each followed by a batch normalization layer and a ReLU layer; the first convolution in each block has stride S/S_in, so that the block still operates at stride S after receiving an input of stride S_in; the subsequent convolutions in each block all have stride 1, the channel numbers of the blocks are [64, 128, 256], and the down-sampling network produces successively smaller spatial resolutions;
the up-sampling network Net_2 up-samples the feature maps of different resolutions by deconvolution and is represented by (S_in, S_out, F), where S_in is the initial stride, S_out is the final stride, and F is the final feature; the up-sampling network is likewise followed by a batch normalization layer and a ReLU layer, and passing the pseudo-image S through the up-sampling and down-sampling networks generates the pseudo-image feature maps F_1, F_2, F_3.
9. The method of claim 8, wherein: the method for enhancing the spatial information features using the spatial attention mechanism comprises the following steps: feeding the pseudo-image feature maps F generated by the network into a spatial attention module, which generates two new feature maps G_1 and G_2 from the feature map using two 1×1 convolutional layers;
where {G_1, G_2} ∈ R^(C×H×W); G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1;
the spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient portions;
a feature map is generated by matrix-multiplying G_2 with W_sa;
finally, the scene target feature maps re-weighted by spatial attention at the three scales are merged and output.
10. The method for identifying automobiles based on three-dimensional point cloud data and traffic scenes as claimed in claim 9, wherein: the compression-activation attention-based detection head re-weights the merged multi-scale feature map using a compression-activation attention mechanism;
during compression, global average pooling produces a channel-wise vector s ∈ R^C;
during activation, the module captures channel-by-channel dependencies:
se = ReLU(W_2 · δ(W_1 · s))
where δ(·) is the sigmoid function, ReLU(·) is the ReLU function, W_1 ∈ R^((C/r)×C) and W_2 ∈ R^(C×(C/r));
The above-mentioned compression-activation attention-based detection head has the following detection algorithm:
the output of the compression-activation attention detection head network is used for target detection by a single-shot multi-box detector; the single-shot multi-box detector network is divided into six modules: the first consists of the first five convolutional stages Conv1, 2, 3, 4 and 5 of VGG16, and the second converts the FC6 and FC7 fully connected layers of VGG16 into the Conv6 and Conv7 convolutional layers; the remaining four modules add the Conv8, Conv9, Conv10 and Conv11 convolutional layers to extract target information at different scales, and the method finally performs target classification detection and non-maximum suppression with position regression;
the above detection algorithm has the following loss function formula:
the bounding box of the real object is represented by (x, y, z, w, l, h, θ): its three-dimensional center, width, length, height and yaw angle;
X_gt and X_a respectively represent the real target and the anchor, with regression residuals Δx = (x_gt − x_a)/d_a, Δy = (y_gt − y_a)/d_a, Δz = (z_gt − z_a)/h_a, Δw = log(w_gt/w_a), Δl = log(l_gt/l_a), Δh = log(h_gt/h_a) and Δθ = sin(θ_gt − θ_a), where d_a = √(w_a² + l_a²); the localization loss function is L_loc = Σ_{b∈(x,y,z,w,l,h,θ)} SmoothL1(Δb);
because the localization loss cannot distinguish whether the bounding box is flipped, a softmax classification loss L_dir over discretized directions is used to learn the bounding box direction;
the classification loss uses the focal loss function L_cls = −α_a(1 − p_a)^γ log p_a, where p_a is the class probability of an anchor;
the overall loss function is L = (1/N_pos)(β_loc·L_loc + β_cls·L_cls + β_dir·L_dir), where N_pos is the number of positive anchors;
the loss function uses an Adam optimizer, and the learning rate decreases as the training period increases.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111358810.0A CN114550160A (en) | 2021-11-17 | 2021-11-17 | Automobile identification method based on three-dimensional point cloud data and traffic scene |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111358810.0A CN114550160A (en) | 2021-11-17 | 2021-11-17 | Automobile identification method based on three-dimensional point cloud data and traffic scene |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114550160A true CN114550160A (en) | 2022-05-27 |
Family
ID=81668805
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111358810.0A Withdrawn CN114550160A (en) | 2021-11-17 | 2021-11-17 | Automobile identification method based on three-dimensional point cloud data and traffic scene |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114550160A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117974990A (en) * | 2024-03-29 | 2024-05-03 | 之江实验室 | Point cloud target detection method based on attention mechanism and feature enhancement structure |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Fernandes et al. | Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy | |
CN111429514B (en) | Laser radar 3D real-time target detection method integrating multi-frame time sequence point cloud | |
CN111242041B (en) | Laser radar three-dimensional target rapid detection method based on pseudo-image technology | |
Harley et al. | Simple-bev: What really matters for multi-sensor bev perception? | |
CN113269040B (en) | Driving environment sensing method combining image recognition and laser radar point cloud segmentation | |
CN109934163A (en) | A kind of aerial image vehicle checking method merged again based on scene priori and feature | |
Ohgushi et al. | Road obstacle detection method based on an autoencoder with semantic segmentation | |
CN113095152B (en) | Regression-based lane line detection method and system | |
CN112990065B (en) | Vehicle classification detection method based on optimized YOLOv5 model | |
CN114445430B (en) | Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion | |
CN112288667B (en) | Three-dimensional target detection method based on fusion of laser radar and camera | |
CN115187964A (en) | Automatic driving decision-making method based on multi-sensor data fusion and SoC chip | |
CN117274749B (en) | Fused 3D target detection method based on 4D millimeter wave radar and image | |
CN115238758A (en) | Multi-task three-dimensional target detection method based on point cloud feature enhancement | |
US12079970B2 (en) | Methods and systems for semantic scene completion for sparse 3D data | |
CN116704304A (en) | Multi-mode fusion target detection method of mixed attention mechanism | |
CN113378647B (en) | Real-time track obstacle detection method based on three-dimensional point cloud | |
CN114550160A (en) | Automobile identification method based on three-dimensional point cloud data and traffic scene | |
CN114218999A (en) | Millimeter wave radar target detection method and system based on fusion image characteristics | |
CN117935088A (en) | Unmanned aerial vehicle image target detection method, system and storage medium based on full-scale feature perception and feature reconstruction | |
CN114048536A (en) | Road structure prediction and target detection method based on multitask neural network | |
Zhang et al. | Full-scale Feature Aggregation and Grouping Feature Reconstruction Based UAV Image Target Detection | |
CN114648698A (en) | Improved 3D target detection system based on PointPillars | |
CN115984568A (en) | Target detection method in haze environment based on YOLOv3 network | |
Chen et al. | Real-time road object segmentation using improved light-weight convolutional neural network based on 3D LiDAR point cloud |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WW01 | Invention patent application withdrawn after publication | Application publication date: 20220527 |