CN116071603A - Multi-class target detection method based on camera and laser radar - Google Patents

Multi-class target detection method based on camera and laser radar

Info

Publication number
CN116071603A
Authority
CN
China
Prior art keywords
point cloud
image
pseudo
point
pseudo image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310198312.7A
Other languages
Chinese (zh)
Inventor
张静
许达
李云松
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhu Research Institute of Xidian University
Original Assignee
Wuhu Research Institute of Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhu Research Institute of Xidian University
Priority to CN202310198312.7A
Publication of CN116071603A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/764 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/10 Image acquisition
    • G06V 10/16 Image acquisition using multiple overlapping images; Image stitching
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/20 Image preprocessing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/56 Extraction of image or video features relating to colour
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V 10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a multi-class target detection method based on a camera and a laser radar, which mainly solves the problem of low detection accuracy in existing target detection methods. The implementation scheme is as follows: acquire a road-surface image and point cloud data; perform voxelization preprocessing on the point cloud data; enhance the spatial information of the preprocessed point cloud; fuse color information into the spatially enhanced point cloud; convolve the fused point cloud to obtain a dual-enhanced pseudo image; obtain the feature map to be detected from the dual-enhanced pseudo image; and send the feature map to be detected into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving. The invention enhances the spatial information of the point cloud by building an intra-modal mapping matrix and sampling, and fuses the color information of the RGB image with the point cloud through size adjustment and generation of a pseudo-view transformation matrix, thereby improving the accuracy of target detection. The method can be used for automatic driving of unmanned vehicles.

Description

Multi-class target detection method based on camera and laser radar
Technical Field
The invention belongs to the technical field of physics, and further relates to a multi-class target detection method which can be used for automatic driving of an unmanned automobile.
Background
The camera is the most commonly used sensor in driverless driving systems. It acquires images containing the color information of the external environment and plays an extremely important role in object detection, but it lacks depth information and is affected by natural conditions. Lidar, by contrast, depends little on lighting conditions, is hardly affected by severe weather, and offers high detection precision. The data it generates are point clouds; by analyzing and processing the point cloud data, more detailed target shape and position information can be obtained, the vehicle can understand the environment better, and a more stable and accurate perception result can be provided. Using the data generated by both cameras and lidar to detect objects in front of the vehicle while it is driving, such as vehicles, pedestrians and cyclists, has therefore become a trend for unmanned vehicles.
Currently, fusion algorithms for camera and lidar data fall into three categories: front-end fusion, deep fusion and back-end fusion. Front-end fusion operates at the data layer: data from different modalities are fused into a single feature vector before being fed into the subsequent pipeline; the data may be raw sensor output or pre-processed data. Deep fusion operates at the feature layer: the raw data of each modality are first converted into high-dimensional feature representations by feature extraction, and interactive fusion operations are then performed across different feature layers. Back-end fusion operates at the decision layer: the raw data of each modality are processed by their own networks, classification scores are output, and the scores are fused.
The patent application CN202211082826 of Zhongcheng Hualong Computer Technology Co. discloses an "Automatic driving decision method and SoC chip based on multi-sensor data fusion", which feeds point cloud data into a trained point cloud target detection neural network model to detect obstacle targets. Although a camera is used in that method, obstacle detection relies only on the point cloud data acquired by the lidar, so the sensor data are not fully exploited; the acquired image is used only for lane detection, and the importance of color information for obstacles is ignored, so obstacles cannot be detected efficiently.
The patent application CN202211314591 of Studian Brown (Beijing) Technology Co. discloses a "Scene-sensing-based V2X multi-sensor fusion method and device": a time-synchronization preprocessing operation is applied to the acquired sensor data, the two kinds of preprocessed data are compared to obtain the correlation coefficients between the sensors and the confidence of each sensor, from which the scene is judged; finally, the fusion weight of each sensor in a preset neural network is determined from the judged scene and the input sensor types to obtain the fusion result. Because the fusion of the different sensor data occurs at the decision level, this method ignores the relations among the different sensor data, so the features of the different modalities cannot complement each other, which ultimately reduces the accuracy of target detection.
Disclosure of Invention
The invention aims to overcome the above defects of the prior art by providing a multi-class target detection method based on a camera and a laser radar, which remedies the lack of color information in single-lidar target detection algorithms and the inability of decision-level fusion to fully exploit the feature relations between different modalities, thereby improving target detection precision.
The technical idea of the invention is as follows: an image auxiliary processing branch is set up for the RGB image, and a main point cloud processing branch and an auxiliary point cloud processing branch are set up for the point cloud, so as to solve the lack of color information in single-lidar target detection; the spatial information of the auxiliary point cloud branch is fused into the main point cloud branch through single-mode self-fusion, so as to solve the problem that decision-level fusion cannot complement the features of different modalities.
According to this idea, the implementation of the invention includes the following steps:
(1) Acquire data from the sensors: the camera provides an RGB image F_1 and the lidar provides a point cloud R_1(x, y, z, r);
(2) Perform voxelization preprocessing on the point cloud R_1(x, y, z, r) to obtain an original pseudo image P;
(3) Enhance the spatial information of the pseudo image P with a single-mode self-fusion method to generate a spatially enhanced pseudo image Ps_i:
(3a) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
(3b) Generate a feature representation V_RP from the sampling position p' and the intra-modal mapping matrix M_RP:
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
(3c) Extract features from the point cloud feature R_i with a set-abstraction (SetAbstraction) sampling operation to generate the point cloud feature to be fused R_i';
(3d) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
(3e) Pass the fused point cloud feature R_i'' through two fully connected layers FC to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
(4) Fuse the color information of the RGB image F_i into the spatially enhanced pseudo image Ps_i to generate a color-enhanced pseudo image Pc_i:
(4a) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i;
(4b) Apply a BatchNorm operation and a ReLU operation to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i respectively, and concatenate them in the channel dimension to obtain the map to be transformed PF;
(4c) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively, and obtain the pseudo-view transformation matrix M_cs:
M_cs = 0.6*M_c + 0.4*M_s
(4d) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain a view-converted pseudo image Pv_i, and splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension to generate a dual-enhanced pseudo image Pc_i;
(5) Obtain the dual-enhanced pseudo image Pc_3 from the dual-enhanced pseudo image Pc_i:
(5a) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step (3);
(5b) Repeat step (5a) to finally obtain the dual-enhanced pseudo image Pc_3;
(6) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2, and splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 to obtain the feature map to be detected F_U;
(7) Send the feature map to be detected F_U into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
Compared with the prior art, the invention has the following advantages:
Firstly, the invention uses a single-mode self-fusion method that strengthens the spatial information of the pseudo image by establishing an intra-modal mapping matrix and by set-abstraction sampling. This enhances the pseudo image's ability to express the spatial structure of the target, overcomes the loss of useful spatial information during voxelization in the prior art, and improves the accuracy of target detection.
Secondly, the invention uses a bimodal cross-fusion method that fuses the color information of the RGB image with the spatial information of the point cloud through size adjustment and generation of the pseudo-view transformation matrix, which further improves the accuracy of target detection.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a schematic illustration of a single mode self-fusion in accordance with the present invention;
FIG. 3 is a schematic diagram of a bimodal cross-fusion in accordance with the present invention.
Detailed Description
Embodiments and effects of the present invention are described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, the implementation steps for this example are as follows.
Step 1. Obtain road surface images and point cloud data.
In a driving scenario, detecting various targets requires the vehicle to acquire images and point cloud data of the road through its sensors: a camera is used to obtain the RGB image F_1, and a lidar is used to obtain the point cloud R_1(x, y, z, r). In this embodiment, the real scenario is simulated with the internationally published KITTI data set as the input data acquired by the sensors. The KITTI data set is one of the most authoritative computer vision benchmark data sets internationally; it contains real images and point cloud data collected in urban, rural and highway scenes.
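Since the embodiment takes the KITTI data set as the sensor input, a minimal loading sketch may help make step 1 concrete. KITTI lidar sweeps are stored as binary float32 (x, y, z, reflectance) records and the camera frames as PNG images; the file paths below are only illustrative and the helper function is a hypothetical name, not part of the method itself.

```python
import numpy as np
from PIL import Image

def load_kitti_frame(image_path: str, lidar_path: str):
    """Load one camera image and the matching lidar sweep from a KITTI-style layout."""
    # RGB image F_1 as an H x W x 3 uint8 array
    rgb = np.array(Image.open(image_path).convert("RGB"))
    # Point cloud R_1: each record is (x, y, z, reflectance) stored as float32
    points = np.fromfile(lidar_path, dtype=np.float32).reshape(-1, 4)
    return rgb, points

# Hypothetical paths; the KITTI object-detection split pairs image_2/ with velodyne/.
rgb, points = load_kitti_frame("training/image_2/000000.png",
                               "training/velodyne/000000.bin")
```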
Step 2. Perform voxelization preprocessing on the point cloud data.
The point cloud R_1 contains about 20,000 points, including a large number of noise points and redundant points. To reduce the computational load of the system, the point cloud R_1(x, y, z, r) first undergoes voxelization preprocessing to obtain an original pseudo image P that expresses the important information of the point cloud. This is implemented as follows:
2.1) In the z direction, from z = 0 m to z = 4 m, divide the point cloud space into four equal-height slices: S_1 (z ∈ [0,1) m), S_2 (z ∈ [1,2) m), S_3 (z ∈ [2,3) m) and S_4 (z ∈ [3,4] m);
2.2) In the x-y plane, divide the point cloud in each height slice into regular pillar-shaped sub-point-clouds with a base of 0.16 m × 0.16 m; in this embodiment, each of the height slices S_1, S_2, S_3 and S_4 is divided into 496 × 432 regular pillar-shaped sub-point-clouds;
2.3) Set a sampling threshold D; randomly sample the pillar sub-point-clouds whose number of points exceeds the threshold, and zero-pad those whose number of points is below it. In this embodiment D = 32: if one of the 496 × 432 pillars of a height slice contains more than 32 points, 32 points are randomly sampled from it; if it contains fewer than 32, the missing points are filled with 0;
2.4) Compute the arithmetic mean (x_c, y_c, z_c) of all points within each pillar sub-point-cloud and the offset (x_p, y_p) to obtain the enhanced point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p);
2.5) Encode the enhanced point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p) with a simplified PointNet network and max pooling to generate the sub-pseudo-image p_i corresponding to each height slice. In this embodiment, the sub-pseudo-images corresponding to the height slices S_1, S_2, S_3 and S_4 are p_1, p_2, p_3 and p_4, and each sub-pseudo-image has a size of 496 × 432.
2.6) Use four fully connected layers to obtain a height-dimension weight and a channel-dimension weight:
2.6.1) Compress the sub-pseudo-image p_i with the first fully connected layer W_1, then extract its height-dimension weight S_i with the second fully connected layer W_2:
S_i = W_2 δ(W_1 p_i)
where δ() denotes an activation function;
2.6.2) Compress the sub-pseudo-image p_i with the third fully connected layer W_3, then extract its channel-dimension weight T_i with the fourth fully connected layer W_4:
T_i = W_4 δ(W_3 p_i)
where δ() denotes an activation function;
2.7) Multiply each of the four sub-pseudo-images p_i by its height-dimension weight S_i and channel-dimension weight T_i, then splice them in the channel dimension and apply a max pooling operation to obtain the final original pseudo image P.
In this embodiment, the shape of the four sub-pseudo-images after splicing in the channel dimension is [4, 496, 432]; after max pooling over the channel dimension, the shape of the final original pseudo image P is [496, 432].
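To make steps 2.6 and 2.7 concrete, a minimal PyTorch sketch of the sub-pseudo-image weighting is given below. The hidden layer width and the squeeze-and-excitation-style global pooling are assumptions of this sketch, since the text only fixes the formulas S_i = W_2 δ(W_1 p_i) and T_i = W_4 δ(W_3 p_i) and the stated shapes [4, 496, 432] and [496, 432].

```python
import torch
import torch.nn as nn

class SubPseudoImageWeighting(nn.Module):
    """Weights the four height-slice sub-pseudo-images (step 2.6) and fuses them (step 2.7).

    An SE-style reading of 'compress, then extract the weight' is assumed here;
    the patent text does not fix the layer widths.
    """
    def __init__(self, num_slices: int = 4, hidden: int = 16):
        super().__init__()
        # W1/W2 produce the height-dimension weights S_i, W3/W4 the channel-dimension weights T_i.
        self.w1 = nn.Linear(num_slices, hidden)
        self.w2 = nn.Linear(hidden, num_slices)
        self.w3 = nn.Linear(num_slices, hidden)
        self.w4 = nn.Linear(hidden, num_slices)

    def forward(self, sub_images: torch.Tensor) -> torch.Tensor:
        # sub_images: [4, 496, 432], one sub-pseudo-image p_i per height slice.
        squeezed = sub_images.mean(dim=(1, 2))            # global pooling -> [4]
        s = self.w2(torch.relu(self.w1(squeezed)))        # S_i, height-dimension weights
        t = self.w4(torch.relu(self.w3(squeezed)))        # T_i, channel-dimension weights
        weighted = sub_images * (s * t).view(-1, 1, 1)    # p_i weighted by S_i and T_i
        # 'Splice in the channel dimension and max pool': collapse the 4 slices.
        pseudo_image, _ = weighted.max(dim=0)             # original pseudo image P: [496, 432]
        return pseudo_image

pillars = torch.rand(4, 496, 432)                # encoded sub-pseudo-images (illustrative values)
P = SubPseudoImageWeighting()(pillars)           # -> [496, 432]
```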
Step 3. Enhance the spatial information of the preprocessed point cloud.
Because part of the spatial information of the point cloud is lost during voxelization, which reduces the accuracy of target detection, the spatial information of the pseudo image P needs to be enhanced. In this embodiment, the spatial information of the pseudo image P is enhanced through registration, sampling and fusion, finally generating the spatially enhanced pseudo image Ps_i.
Referring to fig. 2, the specific implementation of this step is as follows:
3.1) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
3.2) Generate a feature representation V_RP from the intra-modal mapping matrix M_RP and the sampling position p':
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
3.3) Extract features from the point cloud feature R_i with a set-abstraction sampling operation to generate the point cloud feature to be fused R_i'. This is implemented as follows:
3.3.1) Obtain key points from the point cloud R_1 with the farthest point sampling method;
3.3.2) Using the nearest-k-neighborhood method, group the points around each key point, with the key point as the center, to form a local point region;
3.3.3) Encode the local point regions with a simplified PointNet network to obtain the features of the point cloud R_1.
In this embodiment, when i = 1, R_1 is downsampled from 20,000 points to 4,096 points; when i = 2, R_2 is downsampled from 4,096 points to 1,024 points; and when i = 3, R_3 is downsampled from 1,024 points to 256 points;
3.4) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
3.5) Compress the batch-dimension and channel-dimension features of the fused point cloud feature R_i'' in turn with two fully connected layers FC() to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
In this embodiment, the size of the spatially enhanced pseudo image Ps_i is the same as that of the original pseudo image feature P, namely [496, 432].
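A minimal PyTorch sketch of the gated fusion in steps 3.4 and 3.5 follows. The channel width, the per-point feature layout, and the way FC(FC(R_i'')) is reduced to a per-channel scale are assumptions of this sketch; only the gating formula and the final multiplication with P_i come from the text.

```python
import torch
import torch.nn as nn

class SingleModeSelfFusion(nn.Module):
    """Fuses sampled point features back into the pseudo image (steps 3.4-3.5)."""
    def __init__(self, channels: int = 64):
        super().__init__()
        # U, V, W: the three learnable weight matrices of the gating formula.
        self.u = nn.Linear(channels, channels, bias=False)
        self.v = nn.Linear(channels, channels, bias=False)
        self.w = nn.Linear(channels, channels, bias=False)
        # Two fully connected layers FC(FC(.)) applied to the fused point features.
        self.fc1 = nn.Linear(channels, channels)
        self.fc2 = nn.Linear(channels, channels)

    def forward(self, v_rp, r_i, pseudo):
        # v_rp, r_i: [N, C] point-wise features; pseudo: [C, H, W] pseudo image P_i.
        r_fused = torch.sigmoid(self.w(torch.tanh(self.u(v_rp) + self.v(r_i))))  # R_i''
        # The two FC layers squeeze R_i'' to a per-channel scale; mean pooling over
        # the points is assumed here as the 'batch dimension' compression.
        scale = self.fc2(self.fc1(r_fused)).mean(dim=0)
        return pseudo * scale.view(-1, 1, 1)             # Ps_i = P_i x FC(FC(R_i''))

fusion = SingleModeSelfFusion(channels=64)
ps = fusion(torch.rand(4096, 64), torch.rand(4096, 64), torch.rand(64, 496, 432))
```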
Step 4. Fuse color information into the point cloud with enhanced spatial information.
One of the biggest drawbacks of a point cloud alone is the lack of color information, which makes it difficult to improve the accuracy of point cloud target detection. In this step, the RGB image F_i is transformed into a pseudo image so that its color information can be fused into the spatially enhanced pseudo image Ps_i, generating the color-enhanced pseudo image Pc_i.
Referring to fig. 3, the specific implementation of this step is as follows:
4.1) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i; in this embodiment, the original size of the RGB image F_i, [1300, 400], is adjusted to [496, 432];
4.2) Apply a BatchNorm operation and a ReLU operation in turn to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i, and splice the two images in the channel dimension to obtain the map to be transformed PF. In this embodiment, the shape of the resized RGB image F_i is [3, 496, 432] and the shape of the spatially enhanced pseudo image Ps_i is [64, 496, 432], so the shape of the spliced map to be transformed PF is [67, 496, 432];
4.3) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively:
M_s = σ(Conv(AvgPool(PF), MaxPool(PF)))
M_c = σ(W_5 ReLU(W_6(PF)))
where σ() denotes an activation function, Conv() a convolution operation, AvgPool() an average pooling operation and MaxPool() a max pooling operation, W_5 and W_6 denote two different fully connected layers, and ReLU() denotes the ReLU operation;
4.4) Compute the pseudo-view transformation matrix M_cs from the spatial factor matrix M_s and the channel factor matrix M_c:
M_cs = 0.6*M_c + 0.4*M_s
where * denotes multiplication;
4.5) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain the view-converted pseudo image Pv_i, splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension, and apply a convolution operation to generate the dual-enhanced pseudo image Pc_i. In this embodiment, the shape of the dual-enhanced pseudo image Pc_i is [64, 496, 432].
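The bimodal cross fusion of steps 4.3 to 4.5 can be sketched as follows. The kernel size of the spatial-attention convolution, the reduction ratio of the channel branch, and the collapse of M_c to a single scalar (so that M_cs can multiply the 3-channel image) are assumptions of this sketch; the formulas for M_s, M_c and M_cs = 0.6*M_c + 0.4*M_s follow the text.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BimodalCrossFusion(nn.Module):
    """Builds the pseudo-view transform M_cs and the dual-enhanced pseudo image Pc_i."""
    def __init__(self, pf_channels: int = 67, rgb_channels: int = 3,
                 ps_channels: int = 64, reduction: int = 8):
        super().__init__()
        # Spatial attention: convolution over the concatenated avg-/max-pooled maps.
        self.spatial_conv = nn.Conv2d(2, 1, kernel_size=7, padding=3)
        # Channel attention: W6 then W5 over the globally pooled PF (reduction assumed).
        self.w6 = nn.Linear(pf_channels, pf_channels // reduction)
        self.w5 = nn.Linear(pf_channels // reduction, 1)
        # Convolution fusing the view-converted image with Ps_i after splicing.
        self.fuse = nn.Conv2d(rgb_channels + ps_channels, ps_channels, kernel_size=3, padding=1)

    def forward(self, rgb, ps):
        # rgb: [3, H, W] resized RGB image F_i; ps: [64, H, W] spatially enhanced Ps_i.
        pf = torch.cat([rgb, ps], dim=0).unsqueeze(0)                       # map to be transformed PF
        pooled = torch.cat([pf.mean(dim=1, keepdim=True),
                            pf.amax(dim=1, keepdim=True)], dim=1)
        m_s = torch.sigmoid(self.spatial_conv(pooled))                      # spatial factor matrix M_s
        m_c = torch.sigmoid(self.w5(F.relu(self.w6(pf.mean(dim=(2, 3))))))  # channel factor matrix M_c
        m_cs = 0.6 * m_c.view(1, 1, 1, 1) + 0.4 * m_s                       # pseudo-view transform M_cs
        pv = m_cs * rgb.unsqueeze(0)                                        # view-converted pseudo image Pv_i
        pc = self.fuse(torch.cat([pv, ps.unsqueeze(0)], dim=1))             # dual-enhanced Pc_i
        return pc.squeeze(0)

pc = BimodalCrossFusion()(torch.rand(3, 496, 432), torch.rand(64, 496, 432))
```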
Step 5. Obtain the dual-enhanced pseudo image Pc_3 by convolution.
5.1) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step 3;
5.2) Repeat step 5.1) to finally obtain the dual-enhanced pseudo image Pc_3.
In this embodiment, the numbers of convolution kernels of the successive convolution operations are [64, 128, 256], and the strides are [1, 2, 4].
Step 6. Obtain the feature map to be detected F_U from the dual-enhanced pseudo image Pc_3.
6.1) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2;
6.2) Splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 in turn to obtain the feature map to be detected F_U. In this embodiment, the number of kernels of each of the two transposed convolutions is 128.
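Steps 5 and 6 together form a small multi-scale backbone; a sketch is given below. The channel widths, strides and the two 128-kernel transposed convolutions follow the embodiment, while the kernel sizes, paddings and the stride-1 transposed convolutions (chosen so that Pc_3, Pt_1 and Pt_2 share a resolution and can be spliced directly) are assumptions, and the per-scale return to step 3 is omitted for brevity.

```python
import torch
import torch.nn as nn

class MultiScaleNeck(nn.Module):
    """Downsamples the dual-enhanced pseudo image (step 5) and builds the feature
    map to be detected F_U from Pc_3 and its two transposed maps (step 6)."""
    def __init__(self):
        super().__init__()
        # Three convolution stages with 64/128/256 kernels and strides 1/2/4.
        self.stage1 = nn.Conv2d(64, 64, 3, stride=1, padding=1)
        self.stage2 = nn.Conv2d(64, 128, 3, stride=2, padding=1)
        self.stage3 = nn.Conv2d(128, 256, 3, stride=4, padding=1)
        # Two transposed convolutions with 128 kernels each; stride 1 is assumed
        # so that Pc_3, Pt_1 and Pt_2 keep the same spatial size.
        self.up1 = nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1)
        self.up2 = nn.ConvTranspose2d(256, 128, 3, stride=1, padding=1)

    def forward(self, pc1: torch.Tensor) -> torch.Tensor:
        pc1 = pc1.unsqueeze(0)                       # [1, 64, H, W], dual-enhanced Pc_1
        pc2 = self.stage2(self.stage1(pc1))          # Pc_2, downsampled once
        pc3 = self.stage3(pc2)                       # Pc_3, downsampled again
        pt1, pt2 = self.up1(pc3), self.up2(pc3)      # transposed feature maps Pt_1, Pt_2
        return torch.cat([pc3, pt1, pt2], dim=1)     # feature map to be detected F_U

f_u = MultiScaleNeck()(torch.rand(64, 496, 432))     # -> [1, 512, 62, 54]
```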
Step 7. Send the feature map to be detected F_U into the SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
The SSD detector is one of the best-performing existing object detectors and can directly generate an object detection result from the input feature map. This is implemented as follows:
7.1) The SSD detector convolves the input feature map to be detected F_U to generate classification prediction scores and regression prediction scores;
7.2) Default boxes are generated from the classification prediction scores and regression prediction scores;
7.3) The classification loss and regression loss are obtained from the default boxes;
7.4) The default boxes are decoded and non-maximum suppression is applied according to the classification loss and regression loss to obtain the final detection result.
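A minimal sketch of an SSD-style head over F_U and of the decode-and-suppress post-processing of steps 7.1 to 7.4 is shown below. The anchor count, class count, score threshold and IoU threshold are illustrative assumptions, and torchvision's nms is used for the non-maximum suppression step.

```python
import torch
import torch.nn as nn
from torchvision.ops import nms

class SSDHead(nn.Module):
    """Convolves F_U into classification and regression predictions (step 7.1)."""
    def __init__(self, in_channels: int = 512, num_anchors: int = 6, num_classes: int = 3):
        super().__init__()
        self.cls_head = nn.Conv2d(in_channels, num_anchors * num_classes, 3, padding=1)
        self.reg_head = nn.Conv2d(in_channels, num_anchors * 4, 3, padding=1)

    def forward(self, f_u: torch.Tensor):
        return self.cls_head(f_u), self.reg_head(f_u)

def decode_and_suppress(scores, boxes, score_thr=0.3, iou_thr=0.5):
    """Steps 7.2-7.4 in miniature: keep confident boxes, then apply NMS."""
    keep = scores > score_thr
    boxes, scores = boxes[keep], scores[keep]
    return boxes[nms(boxes, scores, iou_thr)]

head = SSDHead()
cls_pred, reg_pred = head(torch.rand(1, 512, 62, 54))
# Toy decoded boxes in (x1, y1, x2, y2) form with per-box scores, for illustration only.
boxes = torch.tensor([[0., 0., 10., 10.], [1., 1., 11., 11.], [50., 50., 60., 60.]])
scores = torch.tensor([0.9, 0.8, 0.7])
final = decode_and_suppress(scores, boxes)
```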
The effect of the present invention can be further illustrated by the following simulation experiment.
First, simulation experiment data set
The experiments use the KITTI data set, one of the most authoritative computer vision benchmark data sets internationally, which contains real images and point cloud data collected in urban, rural and highway scenes.
Second, simulation content
The proposed method, five existing lidar-only methods (MV3D, VoxelNet, SECOND, PointPillars, PointRCNN) and five camera-lidar methods (H²3D R-CNN, Point-GNN, IPOD, F-ConvNet, PointPainting) are each used to perform target detection on the KITTI data set, and their detection accuracies are compared. The results are shown in Table 1.
TABLE 1 comparison of the inventive method with other prior art methods
The MV3D method is from "Multi-View 3D Object Detection Network for Autonomous Driving".
The VoxelNet method is from "VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection".
The SECOND method is from "SECOND: Sparsely Embedded Convolutional Detection" (Sensors).
The PointPillars method is from "PointPillars: Fast Encoders for Object Detection from Point Clouds".
The PointRCNN method is from "PointRCNN: 3D Object Proposal Generation and Detection from Point Cloud".
The H²3D R-CNN method is from "From Multi-View to Hollow-3D: Hallucinated Hollow-3D R-CNN for 3D Object Detection".
The Point-GNN method is from "Point-GNN: Graph Neural Network for 3D Object Detection in a Point Cloud".
The IPOD method is from "IPOD: Intensive Point-based Object Detector for Point Cloud".
The F-ConvNet method is from "Frustum ConvNet: Sliding Frustums to Aggregate Local Point-Wise Features for Amodal 3D Object Detection".
The PointPainting method is from "PointPainting: Sequential Fusion for 3D Object Detection".
As can be seen from Table 1, the proposed method achieves excellent accuracy, especially for the bicycle and cyclist target categories. For example, compared with the classical target detection algorithm PointPillars, the accuracy improves by 2.66% on average; for small-object detection it improves by 2.97% on average, with the largest improvement (5.27%) for the easy cyclist category. Compared with the classical target detection method PointRCNN, the accuracy improves by 11.50% on average, and the detection accuracy for small objects improves by 11.3% on average; in particular, the detection accuracy for the easy cyclist category improves by 13.7%.

Claims (5)

1. A multi-category target detection method based on a camera and a laser radar is characterized by comprising the following steps:
(1) Acquire data from the sensors: the camera provides an RGB image F_1 and the lidar provides a point cloud R_1(x, y, z, r);
(2) Perform voxelization preprocessing on the point cloud R_1(x, y, z, r) to obtain an original pseudo image P;
(3) Enhance the spatial information of the original pseudo image P to generate a spatially enhanced pseudo image Ps_i:
(3a) Register the point cloud feature R_i with the pseudo image feature P to generate an intra-modal mapping matrix M_RP:
M_RP = R_i / P_i
(3b) Generate a feature representation V_RP from the sampling position p' and the intra-modal mapping matrix M_RP:
V_RP = K(M_RP(N(p')))
where K denotes the bilinear interpolation function and M_RP(N(p')) denotes the features of the pixels adjacent to the sampling position p' in the intra-modal mapping matrix;
(3c) Extract features from the point cloud feature R_i with a set-abstraction (SetAbstraction) sampling operation to generate the point cloud feature to be fused R_i';
(3d) Fuse the feature representation V_RP with the point cloud feature to be fused R_i' point by point to generate the fused point cloud feature R_i'':
R_i'' = σ(W·tanh(U·V_RP + V·R_i'))
where W, U and V are three learnable weight matrices with different values, σ denotes the sigmoid activation function and tanh denotes the hyperbolic tangent function;
(3e) Pass the fused point cloud feature R_i'' through two fully connected layers FC to obtain the spatially enhanced pseudo image Ps_i:
Ps_i = P_i × FC(FC(R_i''))
(4) Fuse the color information of the RGB image F_i into the spatially enhanced pseudo image Ps_i to generate a color-enhanced pseudo image Pc_i:
(4a) Resize the RGB image F_i to the same size as the spatially enhanced pseudo image Ps_i;
(4b) Apply a BatchNorm operation and a ReLU operation to the resized RGB image F_i and the spatially enhanced pseudo image Ps_i respectively, and concatenate them in the channel dimension to obtain the map to be transformed PF;
(4c) Generate a spatial factor matrix M_s and a channel factor matrix M_c from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively;
(4d) Compute the pseudo-view transformation matrix M_cs from the spatial factor matrix M_s and the channel factor matrix M_c:
M_cs = 0.6*M_c + 0.4*M_s
(4e) Multiply the pseudo-view transformation matrix M_cs with the RGB image F_i to obtain a view-converted pseudo image Pv_i, and splice the pseudo image Pv_i with the spatially enhanced pseudo image Ps_i in the channel dimension to generate a dual-enhanced pseudo image Pc_i;
(5) Obtain the dual-enhanced pseudo image Pc_3 from the dual-enhanced pseudo image Pc_i:
(5a) Convolve the dual-enhanced pseudo image Pc_i to obtain a downsampled dual-enhanced pseudo image Pc_(i+1), and return to step (3);
(5b) Repeat step (5a) to finally obtain the dual-enhanced pseudo image Pc_3;
(6) Apply two transposed convolutions to the dual-enhanced pseudo image Pc_3 to obtain two transposed feature maps Pt_1 and Pt_2, and splice the dual-enhanced pseudo image Pc_3 with the two transposed feature maps Pt_1 and Pt_2 in turn to obtain the feature map to be detected F_U;
(7) Send the feature map to be detected F_U into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving.
2. The method of claim 1, wherein the voxelization preprocessing of the point cloud R_1(x, y, z, r) in step (2) is implemented as follows:
(2a) In the z direction, divide the point cloud space into four equal-height slices;
(2b) In the x-y plane, divide the point cloud in each height slice into regular pillar-shaped sub-point-clouds with a base of 0.16 m × 0.16 m;
(2c) Set a sampling threshold D; randomly sample the pillar sub-point-clouds whose number of points exceeds the threshold, and zero-pad those whose number of points is below it;
(2d) Compute the arithmetic mean (x_c, y_c, z_c) of all points within each pillar sub-point-cloud and the offset (x_p, y_p);
(2e) Encode the computed point cloud (x, y, z, r, x_c, y_c, z_c, x_p, y_p) with a simplified PointNet network and max pooling to generate the sub-pseudo-image p_i corresponding to each height slice, and obtain the height-dimension weight S_i and channel-dimension weight T_i corresponding to each sub-pseudo-image p_i:
S_i = W_2 δ(W_1 p_i)
T_i = W_4 δ(W_3 p_i)
where W_1, W_2, W_3 and W_4 denote four different fully connected layers and δ() denotes an activation function;
(2f) Multiply each of the four sub-pseudo-images p_i by its height-dimension weight S_i and channel-dimension weight T_i, then splice them in the channel dimension and apply a max pooling operation to obtain the final original pseudo image P.
3. The method of claim 1, wherein the feature extraction from the point cloud feature R_i with the set-abstraction sampling operation in step (3c) is implemented as follows:
(3c1) Obtain key points from the point cloud R_1 with the farthest point sampling method;
(3c2) Using the nearest-k-neighborhood method, group the points around each key point, with the key point as the center, to form a local point region;
(3c3) Encode the local point regions with a simplified PointNet network to obtain the features of the point cloud R_1.
4. The method of claim 1, wherein in step (4c) the spatial factor matrix M_s and the channel factor matrix M_c are generated from the map to be transformed PF with a spatial attention formula and a channel attention formula respectively, as follows:
M_s = σ(Conv(AvgPool(PF), MaxPool(PF)))
M_c = σ(W_5 ReLU(W_6(PF)))
where σ() denotes an activation function, Conv() a convolution operation, AvgPool() an average pooling operation and MaxPool() a max pooling operation, W_5 and W_6 denote two different fully connected layers, and ReLU() denotes the ReLU operation.
5. The method according to claim 1, wherein in step (7) the feature map to be detected F_U is sent into an SSD detector to generate the detection result for targets in front of the vehicle while it is driving, implemented as follows:
(7a) The SSD detector convolves the feature map to be detected F_U to generate classification prediction scores and regression prediction scores;
(7b) Default boxes are generated from the classification prediction scores and regression prediction scores;
(7c) The classification loss and regression loss are obtained from the default boxes;
(7d) The default boxes are decoded and non-maximum suppression is applied according to the classification loss and regression loss to obtain the final prediction result.
CN202310198312.7A 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar Pending CN116071603A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310198312.7A CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310198312.7A CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Publications (1)

Publication Number Publication Date
CN116071603A true CN116071603A (en) 2023-05-05

Family

ID=86180243

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310198312.7A Pending CN116071603A (en) 2023-03-03 2023-03-03 Multi-class target detection method based on camera and laser radar

Country Status (1)

Country Link
CN (1) CN116071603A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination