CN114550160A - Automobile identification method based on three-dimensional point cloud data and traffic scene - Google Patents

Automobile identification method based on three-dimensional point cloud data and traffic scene

Info

Publication number
CN114550160A
CN114550160A
Authority
CN
China
Prior art keywords
point cloud
network
column
cloud data
pseudo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202111358810.0A
Other languages
Chinese (zh)
Inventor
杨彪
王姝媛
徐黎明
杨长春
陈阳
吕继东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Changzhou University
Original Assignee
Changzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Changzhou University filed Critical Changzhou University
Priority to CN202111358810.0A
Publication of CN114550160A
Status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Traffic Control Systems (AREA)

Abstract

The invention provides an automobile identification method based on three-dimensional point cloud data and a traffic scene. Through three feature modules, the method identifies vehicles in traffic scenes from a point cloud data set and effectively improves vehicle detection efficiency and detection accuracy.

Description

Automobile identification method based on three-dimensional point cloud data and traffic scene
Technical Field
The invention relates to the field of traffic detection, in particular to vehicle and pedestrian detection, and more particularly to an automobile identification method based on three-dimensional point cloud data and a traffic scene.
Background
With the continuous development of artificial intelligence, sensors, and control theory, autonomous driving has drawn wide attention in both academia and industry and has a bright application prospect. During automated driving, a vehicle must detect surrounding objects such as vehicles and pedestrians and predict their behavior. Target detection methods based on two-dimensional RGB images cannot accurately recover the spatial position, depth, and heading of an oncoming vehicle, so the ego vehicle's motion cannot be planned and controlled from simple target bearing information alone. The invention instead adopts three-dimensional point cloud data: every point carries feature information such as the position, distance, and angle of the target object, so the data composition matches the real world more closely than a two-dimensional RGB image. Three-dimensional point cloud data is mainly produced by LiDAR (Light Detection and Ranging) sensors, also known as optical radar, which work by receiving the reflected light of laser beams emitted by the sensor. LiDAR offers long measuring range, high precision, and high reliability, and is widely applied to vehicle-mounted autonomous driving. Current LiDAR manufacturers include Velodyne, IBEO, Quanergy, Silan Technology, and other companies, of which Velodyne is the best known in the industry.
Using computer vision techniques, researchers can extract the outline and shape of vehicles and pedestrians to detect targets. For example, CN111507340A discloses a method for extracting target point cloud data from three-dimensional point cloud data: acquire the original three-dimensional point cloud data and denoise it; extract intensity image data from the denoised point cloud; run a preset target extraction algorithm on the intensity image data to obtain target intensity image data; extract the target three-dimensional point cloud data from the original point cloud according to the pixel coordinates of each pixel in the target intensity image; and denoise the target three-dimensional point cloud with a preset point cloud denoising algorithm to obtain the target point cloud data. Although this method uses three-dimensional point cloud data, the extracted behavior and distance features are neither fused nor dimension-reduced, so its detection accuracy in traffic scenes cannot meet practical requirements.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: to overcome the defects of the prior art, the invention provides an automobile identification method based on three-dimensional point cloud data and a traffic scene. By combining the position, outline, and shape of the point cloud in space with surrounding scene information, the method comprehensively judges whether vehicles are present in a target area, effectively improves detection accuracy and detection performance, and provides accurate information about the surrounding driving environment to an autonomous driving system.
The technical scheme adopted by the invention to solve this problem is as follows: an automobile identification method based on three-dimensional point cloud data and a traffic scene comprises: 1) a multi-resolution pillar-by-pillar feature extraction network; 2) a spatial-attention-based convolution detection framework; 3) a compression-activation (squeeze-and-excitation) attention-based detection head.
Further, the multi-resolution pillar-by-pillar feature extraction network comprises point cloud data processing, pillar feature extraction, and pseudo-image feature extraction, performed in sequence.
Further, the point cloud data processing is as follows: a point I in the point cloud data is uniquely represented by the four dimensions x, y, z, and r; the points are evenly divided into grid cells on the x-y plane, and each cell defines a pillar p with no height limit along the z axis; the original four-dimensional input features x, y, z, r of each point in a pillar are augmented to the nine-dimensional features x, y, z, r, x_c, y_c, z_c, x_p, y_p, where r is the reflectivity of point I, the subscript c denotes the offset from the arithmetic mean of all points I in the pillar, and the subscript p denotes the offset from the pillar's x, y center.
Further, the pillar feature extraction is as follows: for the points I in each pillar, features are extracted with a point cloud network, and pillar features are collected at three resolutions: high, medium, and low. Each resolution controls the sparsity by limiting the number P of non-empty pillars per sample and the total number N of points I per pillar, producing a dense tensor of size T ∈ R^(D×P×N). The point cloud network passes every point in a pillar through a linear layer, a batch normalization layer, and a ReLU layer, and outputs a tensor of size Z ∈ R^(C×P×N). The features are then combined and stacked back at the positions of their original pillars to form a pseudo-image of size S ∈ R^(C×H×W); the high, medium, and low resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
Preferably, the fixed size of the frame tensor T is kept at 10000. Taking 10000 as the threshold: if a collected sample or pillar holds more than 10000 data points, the data is reduced to 10000 by random sampling; if it holds fewer than 10000, the tensor T is padded to 10000 with zeros.
Further, the pseudo-image feature extraction is as follows: vehicle feature information is extracted from S_H, S_M, S_L by a convolution operation for down-sampling followed by a deconvolution operation for up-sampling, each followed by a batch normalization layer and a ReLU layer; the up-sampled feature maps of S_H, S_M, S_L are then combined to generate a new point cloud pseudo-image S.
Further, the spatial-attention-based convolution detection framework comprises: 1) extracting pseudo-image features with 1C, 2C, and 4C channels respectively; 2) enhancing spatial information features with a spatial attention mechanism.
Further, the pseudo-image features are extracted with the 1C, 2C, and 4C channels as follows:

The pseudo-image S is fed into a region proposal network detection framework consisting of a down-sampling network Net_1 and an up-sampling network Net_2.

The down-sampling network Net_1 down-samples the feature map by convolutions at progressively smaller spatial resolutions 1C, 2C, and 4C. It is described by a series of blocks (S, L, F), where S is the stride, F the number of output channels, and L the number of 3×3 two-dimensional convolutional layers; each convolution is followed by a batch normalization layer and a ReLU layer. The first convolution in each block has stride S/S_in, so that after receiving input of stride S_in the block still operates at stride S; the remaining convolutions in each block have stride 1. The channel counts of the blocks are [64, 128, 256], so the down-sampling network produces successively smaller spatial resolutions.

The up-sampling network Net_2 up-samples the feature maps of different resolutions by deconvolution and is described by (S_in, S_out, F), where S_in is the initial stride, S_out the final stride, and F the output channels. Passing the pseudo-image S through the down-sampling and up-sampling networks yields the pseudo-image feature maps F_1, F_2, F_3.
Further, the spatial information features are enhanced with a spatial attention mechanism as follows: each pseudo-image feature map F generated by the network is sent to a spatial attention module, which applies two 1×1 convolutional layers to produce two new feature maps G_1 and G_2,

where {G_1, G_2} ∈ R^(C×H×W); G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1;

the spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient parts;

a re-weighted feature map is generated by matrix-multiplying G_2 with W_sa;

finally, the spatially re-weighted scene target feature maps at the three scales are merged and output.
Further, the compression-activation (squeeze-and-excitation) attention-based detection head re-weights the merged multi-scale feature map with a squeeze-and-excitation attention mechanism;

in the squeeze step, global average pooling produces the channel-wise vector s ∈ R^C;

in the excitation step, the module captures channel-wise dependencies:

se = σ(W_2 δ(W_1 s))

where δ(·) is the ReLU function, σ(·) is the sigmoid function, W_1 ∈ R^((C/r)×C), and W_2 ∈ R^(C×(C/r)).
The detection algorithm of the above compression-activation attention-based detection head is as follows:

The output of the compression-activation attention detection head is passed to a single-shot multibox detector (SSD) for target detection. The SSD network is divided into six modules: the first consists of the first five convolutional blocks Conv1-Conv5 of VGG16; the second converts the FC6 and FC7 fully connected layers of VGG16 into the convolutional layers Conv6 and Conv7; the remaining four modules add the convolutional layers Conv8, Conv9, Conv10, and Conv11 so that target information is extracted at different scales. The method finally performs target classification and non-maximum-suppression position regression.
The above detection algorithm has the following loss function formula:
the bounding box of the real object uses (x, y, z, w, l, h, theta) to represent the three-dimensional center, width, length, height and deflection, respectively,
Figure RE-336727DEST_PATH_IMAGE001
Figure RE-785026DEST_PATH_IMAGE002
Figure RE-299184DEST_PATH_IMAGE003
Figure RE-100918DEST_PATH_IMAGE004
Figure RE-712028DEST_PATH_IMAGE005
Figure RE-941015DEST_PATH_IMAGE006
Figure RE-739207DEST_PATH_IMAGE007
Xgtand XaRespectively represent a real target and an anchor point, and
Figure RE-344632DEST_PATH_IMAGE008
wherein the localization loss is

L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb);

because the localization loss cannot distinguish a flipped bounding box, a softmax classification loss L_dir over discretized heading directions is used to learn the bounding box direction;
the classification loss uses a focal loss function as:
Figure RE-433307DEST_PATH_IMAGE011
Figure RE-701478DEST_PATH_IMAGE012
a probability value for an anchor point, α =0.25, γ = 2;
the overall loss function is:
Figure RE-896967DEST_PATH_IMAGE013
Figure RE-857970DEST_PATH_IMAGE014
number of anchors representing positive probabilityAmount, betaloc=2 ,βcls=1,βdir=0.2;
The loss function is optimized with an Adam optimizer, and the learning rate decreases as the training epoch increases.
The invention has the following advantages. The proposed automobile identification method based on three-dimensional point cloud data and a traffic scene:

(1) provides accurate surrounding driving-environment information to an autonomous driving system from the point cloud data set;

(2) improves detection accuracy and detection performance in a real driving environment through a spatial attention mechanism and multi-resolution joint detection;

(3) improves the results of the single-shot multibox detector through the compression-activation attention detection head, which re-weights the different channels in space.
Drawings
The invention is further illustrated with reference to the following figures and examples.
FIG. 1 is a system flow chart of the automobile identification method based on three-dimensional point cloud data and traffic scenes.
Fig. 2 is a schematic diagram of a multi-resolution based pillar-by-pillar feature extraction network proposed in the present invention.
Fig. 3 is a schematic diagram of the spatial attention-based convolution detection framework proposed in the present invention.
Fig. 4 is a schematic diagram of a compression-activated attention based detection head as proposed in the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings. The drawings are simplified schematic diagrams illustrating the basic structure of the present invention only in a schematic manner, and thus show only the constitution related to the present invention, and directions and references (e.g., upper, lower, left, right, etc.) may be used only to help the description of the features in the drawings. The following detailed description is, therefore, not to be taken in a limiting sense, and the scope of the claimed subject matter is defined only by the appended claims and equivalents thereof.
As shown in fig. 1, the automobile identification method based on three-dimensional point cloud data and a traffic scene takes advantage of the accurate ranging, high precision, and rich features of a point cloud data set, and detects vehicles and pedestrians from the point cloud data set through the following steps:
1) a multi-resolution pillar-by-pillar feature extraction network;
2) a spatial attention-based convolution detection framework;
3) a compression-activated attention based detection head.
The method detects targets with a single-shot multibox detector, then uses pedestrian motion information to locate targets of interest and extracts each target's motion sequence, surrounding traffic scene sequence, and track position. The invention designs a three-dimensional convolutional neural network to process the motion sequence of a target of interest and obtain behavior features related to the pedestrian's intention to cross the road.
According to the invention, two weights derived from the elements of the local traffic scene around the pedestrian and from the vehicle's speed correct the pedestrian-vehicle distance, and the corrected distance is encoded by a multilayer perceptron into a distance feature related to the pedestrian's intention to cross the road.
Finally, the behavior features and the distance features are fused, the fused features are dimension-reduced by a fully connected layer, and a softmax operation yields the result of whether the pedestrian crosses the road.
Fig. 2 shows a schematic diagram of the multi-resolution pillar-by-pillar feature extraction network. The different shades of gray represent features extracted at different scales; the scales in fig. 2 are the multiple resolutions shown at the far left of the figure.
1) Point cloud data processing
Processing the point cloud data extracts the pedestrians of interest, reducing the time the algorithm spends on pedestrians that are not of interest. The invention augments each point I in the point cloud data from x, y, z, r to x, y, z, r, x_c, y_c, z_c, x_p, y_p by computation: the spatial coordinates of the point, the reflectivity r, the offsets (subscript c) from the arithmetic mean of all points I in the pillar, and the offsets (subscript p) from the pillar's x, y center.
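To make the nine-dimensional decoration concrete, here is a minimal sketch in NumPy; it is our illustration, not code from the patent, and the function name and array layout are assumptions.

```python
import numpy as np

def decorate_pillar(points, pillar_center_xy):
    """Augment the (x, y, z, r) points of one pillar to the nine-dimensional
    (x, y, z, r, x_c, y_c, z_c, x_p, y_p) features described above."""
    xyz = points[:, :3]
    offsets_c = xyz - xyz.mean(axis=0)                # offset from the pillar's point mean
    offsets_p = points[:, :2] - pillar_center_xy      # offset from the pillar's x-y center
    return np.hstack([points, offsets_c, offsets_p])  # shape (n, 9)

# usage: decorate_pillar(np.random.rand(32, 4), np.array([0.5, 0.5]))
```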
2) Column feature extraction
The point cloud features of the points I in each pillar are extracted with a point cloud network at high, medium, and low resolution. The sparsity D is controlled by limiting the number of non-empty pillars per sample and the total number N of points I per pillar, yielding a dense tensor of size T ∈ R^(D×P×N). If the collected sample or pillar holds redundant data, the frame tensor T is kept at its fixed size by random sampling; if it holds too little data, the tensor T is zero-padded to maintain the fixed size.
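A hedged sketch of this densification step, under the assumption that the limits P and N are fixed per resolution and that excess data is randomly sub-sampled while missing data is zero-padded:

```python
import numpy as np

def densify(pillars, P, N, D=9):
    """Build the dense tensor T in R^(D x P x N) from a list of per-pillar
    (n_i, D) arrays: random sampling when over the limit, zeros when under."""
    if len(pillars) > P:                      # too many non-empty pillars
        keep = np.random.choice(len(pillars), P, replace=False)
        pillars = [pillars[i] for i in keep]
    T = np.zeros((D, P, N), dtype=np.float32) # zero padding by construction
    for j, pts in enumerate(pillars):
        if len(pts) > N:                      # too many points in this pillar
            pts = pts[np.random.choice(len(pts), N, replace=False)]
        T[:, j, :len(pts)] = pts.T
    return T
```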
The point cloud network outputs a tensor of size Z ∈ R^(C×P×N); these tensors are combined and stacked at the positions of their original pillars to form a pseudo-image of size S ∈ R^(C×H×W), where the high, medium, and low resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
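A sketch of the per-point network and the stacking into the pseudo-image, assuming PyTorch; the max over the N points of each pillar (as in PointPillars) is our assumption for how Z ∈ R^(C×P×N) is collapsed before stacking:

```python
import torch
import torch.nn as nn

class PillarFeatureNet(nn.Module):
    def __init__(self, d_in=9, c_out=64):
        super().__init__()
        self.linear = nn.Linear(d_in, c_out)
        self.bn = nn.BatchNorm1d(c_out)

    def forward(self, t, coords, H, W):
        # t: (B, D, P, N) dense tensor; coords: (B, P) LongTensor of each
        # pillar's flattened x-y grid cell index
        B, D, P, N = t.shape
        z = self.linear(t.permute(0, 2, 3, 1))              # linear layer -> (B, P, N, C)
        z = z.permute(0, 3, 1, 2).reshape(B, -1, P * N)     # (B, C, P*N) for BatchNorm1d
        z = torch.relu(self.bn(z)).reshape(B, -1, P, N)     # BN + ReLU -> Z in R^(C,P,N)
        z = z.max(dim=3).values                             # collapse points -> (B, C, P)
        s = z.new_zeros(B, z.shape[1], H * W)               # empty pseudo-image canvas
        s.scatter_(2, coords.unsqueeze(1).expand_as(z), z)  # stack pillars at original cells
        return s.reshape(B, -1, H, W)                       # S in R^(C, H, W)
```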
3) Pseudo-image feature extraction
For the high-, medium-, and low-resolution pseudo-images S_H, S_M, S_L, the invention extracts the vehicle feature information by down-sampling with a convolution operation and then up-sampling with a deconvolution operation; both the up-sampling and the down-sampling include a batch normalization layer and a ReLU layer. Finally, the three up-sampled feature maps are combined to generate a new point cloud pseudo-image S.
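A sketch of this fusion in PyTorch, assuming one stride-2 convolution down and one stride-2 deconvolution up per pseudo-image with BatchNorm + ReLU after each; resizing the three outputs to a common grid before concatenation is our assumption, since the text does not state how the resolutions are aligned:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def conv_bn_relu(conv):
    return nn.Sequential(conv, nn.BatchNorm2d(conv.out_channels), nn.ReLU())

class MultiResFusion(nn.Module):
    def __init__(self, c=64):
        super().__init__()
        self.down = nn.ModuleList(
            conv_bn_relu(nn.Conv2d(c, c, 3, stride=2, padding=1)) for _ in range(3))
        self.up = nn.ModuleList(
            conv_bn_relu(nn.ConvTranspose2d(c, c, 2, stride=2)) for _ in range(3))

    def forward(self, s_h, s_m, s_l):
        outs = [up(down(s)) for s, down, up in zip((s_h, s_m, s_l), self.down, self.up)]
        size = outs[0].shape[-2:]             # align all maps to S_H's grid
        outs = [o if o.shape[-2:] == size else F.interpolate(o, size=size) for o in outs]
        return torch.cat(outs, dim=1)         # the new pseudo-image S
```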
Fig. 3 presents a schematic diagram of a spatial attention-based convolution detection framework. In fig. 3, the different shades of gray in the leftmost stitched bitmap represent the three features resulting from the upsampling.
4) Extracting pseudo-image features using the 1C, 2C, and 4C channels respectively
The invention performs the vehicle detection task in real traffic by extracting features through multiple channels separately. The pseudo-image S enters the region proposal network detection framework, which consists of two parts: a down-sampling network and an up-sampling network. The down-sampling network down-samples the feature map by convolutions at progressively smaller spatial resolutions (1C, 2C, 4C) and is described by a series of blocks (S, L, F), where S is the stride, F the number of output channels, and L the number of 3×3 two-dimensional convolutional layers. Each convolution is followed by a batch normalization layer and a ReLU layer; the first convolution in each block has stride S/S_in, so that after receiving input of stride S_in the block still operates at stride S, and the remaining convolutions have stride 1. The channel counts of the blocks are [64, 128, 256], so the down-sampling network produces successively smaller spatial resolutions. The up-sampling network Net_2 up-samples the feature maps of different resolutions by deconvolution and is described by (S_in, S_out, F), where S_in is the initial stride, S_out the final stride, and F the output channels; like the down-sampling network, it attaches a batch normalization layer and a ReLU layer. Passing the pseudo-image S through the down-sampling and up-sampling networks yields the pseudo-image feature maps F_1, F_2, F_3, as sketched below.
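A minimal sketch of one (S, L, F) down-sampling block and one (S_in, S_out, F) up-sampling block; these helpers are our illustration, and the choice of deconvolution kernel equal to the upsampling factor S_in/S_out is an assumption:

```python
import torch.nn as nn

def down_block(c_in, S, L, F_out, S_in=1):
    """(S, L, F): first conv has stride S/S_in, the remaining L-1 have stride 1."""
    layers = [nn.Conv2d(c_in, F_out, 3, stride=S // S_in, padding=1),
              nn.BatchNorm2d(F_out), nn.ReLU()]
    for _ in range(L - 1):
        layers += [nn.Conv2d(F_out, F_out, 3, stride=1, padding=1),
                   nn.BatchNorm2d(F_out), nn.ReLU()]
    return nn.Sequential(*layers)

def up_block(c_in, S_in, S_out, F_out):
    """(S_in, S_out, F): deconvolution from stride S_in back to stride S_out."""
    k = S_in // S_out                                 # upsampling factor
    return nn.Sequential(nn.ConvTranspose2d(c_in, F_out, k, stride=k),
                         nn.BatchNorm2d(F_out), nn.ReLU())

# e.g. three stages with the stated channel counts [64, 128, 256]:
# net1 = [down_block(64, 2, 4, 64), down_block(64, 4, 6, 128, 2), down_block(128, 8, 6, 256, 4)]
```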
5) Enhancing spatial information features using spatial attention mechanism
The pseudo-image feature maps F_1, F_2, F_3 are fed into the spatial attention module. The module first applies two 1×1 convolutional layers to generate two new feature maps G_1 and G_2, where {G_1, G_2} ∈ R^(C×H×W); G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1. The spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient parts. A re-weighted feature map is generated by matrix-multiplying G_2 with W_sa. Finally, the spatially re-weighted scene target feature maps at the three scales are merged and output.
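A sketch of the spatial attention module as described, assuming PyTorch; the (H·W)×(H·W) affinity matrix below matches the text's W_sa:

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.to_g1 = nn.Conv2d(c, c, 1)   # first 1x1 convolution -> G1
        self.to_g2 = nn.Conv2d(c, c, 1)   # second 1x1 convolution -> G2

    def forward(self, f):
        B, C, H, W = f.shape
        g1 = self.to_g1(f).reshape(B, C, H * W)                # G1 in R^(C x HW)
        g2 = self.to_g2(f).reshape(B, C, H * W)
        w_sa = torch.softmax(g1.transpose(1, 2) @ g1, dim=-1)  # W_sa in R^(HW x HW)
        return (g2 @ w_sa).reshape(B, C, H, W)                 # G2 re-weighted by W_sa
```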
Fig. 4 shows a schematic diagram of the detection head based on SE (squeeze-and-excitation) attention. The different shades of gray in fig. 4 represent different features.
6) Detection head based on compression-activation network
The merged multi-scale feature map is re-weighted with a squeeze-and-excitation (compression-activation) attention mechanism, implemented through a squeeze operation and an excitation operation. In the squeeze operation, global average pooling produces the channel-wise vector s ∈ R^C. In the excitation phase, the module captures channel-wise dependencies:

se = σ(W_2 δ(W_1 s))

where δ(·) is the ReLU function, σ(·) is the sigmoid function, W_1 ∈ R^((C/r)×C), and W_2 ∈ R^(C×(C/r)). A sketch follows.
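A sketch of the squeeze-and-excitation step with reduction ratio r, following the formula above (δ = ReLU inside, σ = sigmoid outside); applying the resulting vector as a channel-wise re-weighting of the input map is our assumption:

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.w1 = nn.Linear(c, c // r)    # W1 in R^((C/r) x C)
        self.w2 = nn.Linear(c // r, c)    # W2 in R^(C x (C/r))

    def forward(self, x):
        s = x.mean(dim=(2, 3))                               # squeeze: s in R^C per sample
        se = torch.sigmoid(self.w2(torch.relu(self.w1(s))))  # se = sigma(W2 delta(W1 s))
        return x * se[:, :, None, None]                      # channel-wise re-weighting
```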
7) Detection algorithm
The invention performs target detection on the output of the compression-activation network with a single-shot multibox detector (SSD), which detects quickly and accurately. The method adopts the anchor idea, adapts to multi-scale detection tasks, and suits the large scale variation of point cloud data. The SSD network is divided into six modules: the first consists of the first five convolutional blocks Conv1-Conv5 of VGG16; the FC6 and FC7 fully connected layers of VGG16 are then converted into the convolutional layers Conv6 and Conv7; on this basis, four modules with the convolutional layers Conv8, Conv9, Conv10, and Conv11 are added so that target information is extracted at different scales. The method finally performs target classification and non-maximum-suppression position regression.
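A loose sketch of the six-module layout the text describes, assuming torchvision's VGG16; the dilation trick for Conv6 and the channel widths of Conv8-Conv11 are assumptions borrowed from the original SSD, not values from the patent:

```python
import torch.nn as nn
from torchvision.models import vgg16

backbone = vgg16().features[:30]   # module 1: Conv1-Conv5 blocks of VGG16
head = nn.Sequential(              # module 2: FC6/FC7 converted to Conv6/Conv7
    nn.Conv2d(512, 1024, 3, padding=6, dilation=6), nn.ReLU(),
    nn.Conv2d(1024, 1024, 1), nn.ReLU(),
)
extras = nn.ModuleList([           # modules 3-6: Conv8-Conv11 for extra scales
    nn.Conv2d(1024, 512, 3, stride=2, padding=1),
    nn.Conv2d(512, 256, 3, stride=2, padding=1),
    nn.Conv2d(256, 256, 3),
    nn.Conv2d(256, 256, 3),
])
```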
The bounding box of a real target is parameterized as (x, y, z, w, l, h, θ): its three-dimensional center, width, length, height, and yaw. The regression residuals between ground truth and anchor are

Δx = (x_gt − x_a) / d_a,  Δy = (y_gt − y_a) / d_a,  Δz = (z_gt − z_a) / h_a,

Δw = log(w_gt / w_a),  Δl = log(l_gt / l_a),  Δh = log(h_gt / h_a),

Δθ = sin(θ_gt − θ_a),

where X_gt and X_a denote the real target and the anchor respectively, and d_a = √((w_a)² + (l_a)²).
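As a worked illustration of the residuals above (our sketch, with d_a the anchor diagonal):

```python
import math

def encode(gt, anchor):
    """Residuals between a ground-truth box and an anchor,
    both given as (x, y, z, w, l, h, theta)."""
    x, y, z, w, l, h, th = gt
    xa, ya, za, wa, la, ha, tha = anchor
    da = math.sqrt(wa ** 2 + la ** 2)          # d_a = sqrt(w_a^2 + l_a^2)
    return ((x - xa) / da, (y - ya) / da, (z - za) / ha,
            math.log(w / wa), math.log(l / la), math.log(h / ha),
            math.sin(th - tha))                # sine residual for the yaw
```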
wherein the localization loss is

L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb);

because the localization loss cannot distinguish a flipped bounding box, a softmax classification loss L_dir over discretized heading directions is used to learn the bounding box direction;
the classification loss uses the focal loss

L_cls = −α_a (1 − p_a)^γ log(p_a),

where p_a is the class probability of an anchor, α = 0.25, and γ = 2;
the overall loss function is

L = (1 / N_pos) (β_loc L_loc + β_cls L_cls + β_dir L_dir),

where N_pos is the number of positive anchors, β_loc = 2, β_cls = 1, and β_dir = 0.2.
The loss function is optimized with an Adam optimizer, and the learning rate decreases as the training epoch increases; a sketch follows.
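A hedged sketch assembling the loss and optimizer as described; the exact learning-rate schedule is not given in the text, so the exponential decay below is an assumption:

```python
import torch
import torch.nn.functional as F

def total_loss(box_pred, box_tgt, cls_prob, cls_tgt, dir_logit, dir_tgt, n_pos,
               alpha=0.25, gamma=2.0, b_loc=2.0, b_cls=1.0, b_dir=0.2):
    loc = F.smooth_l1_loss(box_pred, box_tgt, reduction="sum")   # L_loc
    p_t = torch.where(cls_tgt.bool(), cls_prob, 1 - cls_prob)    # focal loss, L_cls
    a_t = alpha * cls_tgt + (1 - alpha) * (1 - cls_tgt)
    cls = (-a_t * (1 - p_t) ** gamma * torch.log(p_t.clamp_min(1e-6))).sum()
    d = F.cross_entropy(dir_logit, dir_tgt, reduction="sum")     # L_dir, discrete headings
    return (b_loc * loc + b_cls * cls + b_dir * d) / max(n_pos, 1)

# optimizer with a learning rate that decreases over training, e.g.:
# opt = torch.optim.Adam(model.parameters(), lr=2e-4)
# sched = torch.optim.lr_scheduler.ExponentialLR(opt, gamma=0.8)  # step once per epoch
```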
In light of the foregoing description of the preferred embodiment of the present invention, many modifications and variations will be apparent to those skilled in the art without departing from the spirit and scope of the invention. The technical scope of the present invention is not limited to the content of the specification, and must be determined according to the scope of the claims.

Claims (10)

1. A car identification method based on three-dimensional point cloud data and traffic scenes is characterized in that: the method comprises the following steps:
1) a multi-resolution pillar-by-pillar feature extraction network;
2) a spatial attention-based convolution detection framework;
3) a compression-activated attention based detection head.
2. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 1, characterized in that: the multi-resolution pillar-by-pillar feature extraction network comprises point cloud data processing, pillar feature extraction, and pseudo-image feature extraction, performed in sequence.
3. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 2, characterized in that: in the point cloud data processing, a point I in the point cloud data is uniquely represented by the four dimensions x, y, z, and r; the points are evenly divided into grid cells on the x-y plane, and each cell defines a pillar p with no height limit along the z axis; the original four-dimensional input features x, y, z, r of each point in a pillar are augmented to the nine-dimensional features x, y, z, r, x_c, y_c, z_c, x_p, y_p, where r is the reflectivity of point I, the subscript c denotes the offset from the arithmetic mean of all points I in the pillar, and the subscript p denotes the offset from the pillar's x, y center.
4. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 3, characterized in that: in the pillar feature extraction, features of the points I in each pillar are extracted with a point cloud network, and pillar features are collected at three resolutions: high, medium, and low; each resolution controls the sparsity D by limiting the number P of non-empty pillars per sample and the total number N of points I per pillar, producing a dense tensor of size T ∈ R^(D×P×N);

the point cloud network passes every point in a pillar through a linear layer, a batch normalization layer, and a ReLU layer, and outputs a tensor of size Z ∈ R^(C×P×N);

the features are combined and stacked back at the positions of their original pillars to form a pseudo-image of size S ∈ R^(C×H×W), where the high, medium, and low resolutions generate the corresponding pseudo-images S_H, S_M, S_L.
5. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 4, characterized in that: the fixed size of the frame tensor T is kept at 10000; if a collected sample or pillar holds fewer than 10000 data points, the tensor T is padded to 10000 with zeros.
6. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 4, characterized in that: in the pseudo-image feature extraction, vehicle feature information is extracted from S_H, S_M, S_L by a convolution operation for down-sampling followed by a deconvolution operation for up-sampling, each followed by a batch normalization layer and a ReLU layer; the up-sampled feature maps of S_H, S_M, S_L are combined to generate a new point cloud pseudo-image S.
7. The method for identifying the automobile based on the three-dimensional point cloud data and the traffic scene as claimed in claim 6, wherein: the spatial attention-based convolution detection framework comprises:
1) respectively extracting pseudo-map features by using 1C, 2C and 4C channels;
2) spatial information features are enhanced using a spatial attention mechanism.
8. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 7, characterized in that: the pseudo-image features are extracted with the 1C, 2C, and 4C channels as follows:

the pseudo-image S is fed into a region proposal network detection framework consisting of a down-sampling network Net_1 and an up-sampling network Net_2;

the down-sampling network Net_1 down-samples the feature map by convolutions at progressively smaller spatial resolutions 1C, 2C, and 4C, and is described by a series of blocks (S, L, F), where S is the stride, F the number of output channels, and L the number of 3×3 two-dimensional convolutional layers; each convolution is followed by a batch normalization layer and a ReLU layer, and the first convolution in each block has stride S/S_in, so that after receiving input of stride S_in the block still operates at stride S; the remaining convolutions in each block have stride 1, the channel counts of the blocks are [64, 128, 256], and the down-sampling network produces successively smaller spatial resolutions;

the up-sampling network Net_2 up-samples the feature maps of different resolutions by deconvolution and is described by (S_in, S_out, F), where S_in is the initial stride, S_out the final stride, and F the output channels; the up-sampling network also attaches a batch normalization layer and a ReLU layer, and passing the pseudo-image S through the down-sampling and up-sampling networks yields the pseudo-image feature maps F_1, F_2, F_3.
9. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 8, characterized in that: the spatial information features are enhanced with the spatial attention mechanism as follows: each pseudo-image feature map F generated by the network is sent to a spatial attention module, which applies two 1×1 convolutional layers to produce two new feature maps G_1 and G_2,

where {G_1, G_2} ∈ R^(C×H×W); G_1 is reshaped to R^(C×(H×W)), and the transpose of G_1 is matrix-multiplied with G_1;

the spatial attention matrix W_sa ∈ R^((H×W)×(H×W)) is then computed with the Softmax function; this matrix explicitly encodes the spatially salient parts;

a re-weighted feature map is generated by matrix-multiplying G_2 with W_sa;

finally, the spatially re-weighted scene target feature maps at the three scales are merged and output.
10. The automobile identification method based on the three-dimensional point cloud data and the traffic scene as claimed in claim 9, characterized in that: the compression-activation attention-based detection head re-weights the merged multi-scale feature map with a squeeze-and-excitation attention mechanism;

in the squeeze step, global average pooling produces the channel-wise vector s ∈ R^C;

in the excitation step, the module captures channel-wise dependencies:

se = σ(W_2 δ(W_1 s))

where δ(·) is the ReLU function, σ(·) is the sigmoid function, W_1 ∈ R^((C/r)×C), and W_2 ∈ R^(C×(C/r));

the detection algorithm of the compression-activation attention-based detection head is as follows:

the output of the compression-activation attention detection head is passed to a single-shot multibox detector (SSD) for target detection; the SSD network is divided into six modules: the first consists of the first five convolutional blocks Conv1-Conv5 of VGG16, the second converts the FC6 and FC7 fully connected layers of VGG16 into the convolutional layers Conv6 and Conv7, and the remaining four modules add the convolutional layers Conv8, Conv9, Conv10, and Conv11 so that target information is extracted at different scales; the method finally performs target classification and non-maximum-suppression position regression;
the above detection algorithm has the following loss function formula:
the bounding box of a real target is parameterized as (x, y, z, w, l, h, θ): its three-dimensional center, width, length, height, and yaw; the regression residuals between ground truth and anchor are

Δx = (x_gt − x_a) / d_a,  Δy = (y_gt − y_a) / d_a,  Δz = (z_gt − z_a) / h_a,

Δw = log(w_gt / w_a),  Δl = log(l_gt / l_a),  Δh = log(h_gt / h_a),

Δθ = sin(θ_gt − θ_a),

where X_gt and X_a denote the real target and the anchor respectively, and d_a = √((w_a)² + (l_a)²);
wherein the localization loss is

L_loc = Σ_{b ∈ (x, y, z, w, l, h, θ)} SmoothL1(Δb);

because the localization loss cannot distinguish a flipped bounding box, a softmax classification loss L_dir over discretized heading directions is used to learn the bounding box direction;
the classification loss uses the focal loss

L_cls = −α_a (1 − p_a)^γ log(p_a),

where p_a is the class probability of an anchor, α = 0.25, and γ = 2;
the overall loss function is

L = (1 / N_pos) (β_loc L_loc + β_cls L_cls + β_dir L_dir),

where N_pos is the number of positive anchors, β_loc = 2, β_cls = 1, and β_dir = 0.2;
the loss function is optimized with an Adam optimizer, and the learning rate decreases as the training epoch increases.
CN202111358810.0A 2021-11-17 2021-11-17 Automobile identification method based on three-dimensional point cloud data and traffic scene Withdrawn CN114550160A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111358810.0A CN114550160A (en) 2021-11-17 2021-11-17 Automobile identification method based on three-dimensional point cloud data and traffic scene

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111358810.0A CN114550160A (en) 2021-11-17 2021-11-17 Automobile identification method based on three-dimensional point cloud data and traffic scene

Publications (1)

Publication Number Publication Date
CN114550160A 2022-05-27

Family

ID=81668805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111358810.0A Withdrawn CN114550160A (en) 2021-11-17 2021-11-17 Automobile identification method based on three-dimensional point cloud data and traffic scene

Country Status (1)

Country Link
CN (1) CN114550160A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117974990A (en) * 2024-03-29 2024-05-03 之江实验室 Point cloud target detection method based on attention mechanism and feature enhancement structure


Similar Documents

Publication Publication Date Title
Fernandes et al. Point-cloud based 3D object detection and classification methods for self-driving applications: A survey and taxonomy
CN111429514B (en) Laser radar 3D real-time target detection method integrating multi-frame time sequence point cloud
CN111242041B (en) Laser radar three-dimensional target rapid detection method based on pseudo-image technology
Harley et al. Simple-bev: What really matters for multi-sensor bev perception?
CN113269040B (en) Driving environment sensing method combining image recognition and laser radar point cloud segmentation
CN109934163A (en) A kind of aerial image vehicle checking method merged again based on scene priori and feature
Ohgushi et al. Road obstacle detection method based on an autoencoder with semantic segmentation
CN113095152B (en) Regression-based lane line detection method and system
CN112990065B (en) Vehicle classification detection method based on optimized YOLOv5 model
CN114445430B (en) Real-time image semantic segmentation method and system for lightweight multi-scale feature fusion
CN112288667B (en) Three-dimensional target detection method based on fusion of laser radar and camera
CN115187964A (en) Automatic driving decision-making method based on multi-sensor data fusion and SoC chip
CN117274749B (en) Fused 3D target detection method based on 4D millimeter wave radar and image
CN115238758A (en) Multi-task three-dimensional target detection method based on point cloud feature enhancement
US12079970B2 (en) Methods and systems for semantic scene completion for sparse 3D data
CN116704304A (en) Multi-mode fusion target detection method of mixed attention mechanism
CN113378647B (en) Real-time track obstacle detection method based on three-dimensional point cloud
CN114550160A (en) Automobile identification method based on three-dimensional point cloud data and traffic scene
CN114218999A (en) Millimeter wave radar target detection method and system based on fusion image characteristics
CN117935088A (en) Unmanned aerial vehicle image target detection method, system and storage medium based on full-scale feature perception and feature reconstruction
CN114048536A (en) Road structure prediction and target detection method based on multitask neural network
Zhang et al. Full-scale Feature Aggregation and Grouping Feature Reconstruction Based UAV Image Target Detection
CN114648698A (en) Improved 3D target detection system based on PointPillars
CN115984568A (en) Target detection method in haze environment based on YOLOv3 network
Chen et al. Real-time road object segmentation using improved light-weight convolutional neural network based on 3D LiDAR point cloud

Legal Events

PB01: Publication

SE01: Entry into force of request for substantive examination

WW01: Invention patent application withdrawn after publication (application publication date: 20220527)