CN113361601A - Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data - Google Patents

Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data

Info

Publication number
CN113361601A
CN113361601A CN202110627186.3A CN202110627186A
Authority
CN
China
Prior art keywords
tensor
feature
perspective
sub
point cloud
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202110627186.3A
Other languages
Chinese (zh)
Inventor
张雨 (Zhang Yu)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Qingzhou Zhihang Technology Co ltd
Original Assignee
Beijing Qingzhou Zhihang Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Qingzhou Zhihang Technology Co ltd filed Critical Beijing Qingzhou Zhihang Technology Co ltd
Priority to CN202110627186.3A
Publication of CN113361601A
Current legal status: Withdrawn


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G06F18/253 Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Optical Radar Systems And Details Thereof (AREA)

Abstract

The embodiment of the invention relates to a method for fusing perspective and aerial view features based on unmanned vehicle laser radar data, which comprises the following steps: acquiring a first point cloud tensor generated by scanning a first target environment with the unmanned vehicle laser radar; performing two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first overhead view feature tensor; according to a preset perspective mode, performing corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first perspective feature tensor; performing feature fusion processing on the first overhead view feature tensor by using the first perspective feature tensor to generate a first fused feature tensor; and performing dimension reduction processing on the first fused feature tensor using a 1*1 convolutional network. The method can reduce the computational complexity of the voxelization algorithm, meet the low-latency requirement of the automatic driving field, and avoid the loss of height information about the ego vehicle's surroundings that occurs when only overhead view features are used.

Description

Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data
Technical Field
The invention relates to the technical field of data processing, in particular to a method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data.
Background
Point cloud data are data that record scanning information in the form of points; each point obtained by laser radar scanning includes three-dimensional coordinates (X, Y, Z) and laser reflection intensity information (Intensity). A voxel (volume element) is the smallest unit into which a digital three-dimensional space is segmented. In a digital three-dimensional space, a solid containing voxels can be represented in the form of polygons or isosurfaces through operations such as voxel feature extraction and solid rendering. Applied to the field of automatic driving, the voxel features extracted from point cloud data can be input into an artificial intelligence model for target detection or target classification, so as to obtain classification and recognition results for objects around the driving route.
A common method for extracting point cloud voxel features is a three-dimensional voxelization method based on a Cartesian coordinate system (e.g., the VoxelNet method). However, such three-dimensional voxelization has high computational complexity and a long computation time, and cannot satisfy the low-latency requirement of the automatic driving field.
Disclosure of Invention
The invention aims to provide a method for fusing perspective and aerial view features based on unmanned vehicle laser radar data, an electronic device and a computer-readable storage medium, in which aerial-view two-dimensional feature extraction based on a Cartesian coordinate system and perspective two-dimensional feature extraction based on a perspective three-dimensional coordinate system are respectively performed on the point cloud data, and the extracted two-dimensional aerial view features and perspective features are fused so as to obtain semantic information with three-dimensional characteristics. By using the method, the computational complexity of the voxelization algorithm can be reduced, the low-latency requirement of the automatic driving field can be met, and the problem of losing height information about the ego vehicle's surroundings caused by using only overhead view features can be solved.
In order to achieve the above object, a first aspect of the embodiments of the present invention provides a method for fusion of perspective and overhead characteristics based on unmanned vehicle lidar data, where the method includes:
acquiring a first point cloud tensor generated by scanning a first target environment by the unmanned vehicle laser radar;
performing two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first overhead view feature tensor;
according to a preset perspective mode, performing corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first perspective feature tensor;
performing feature fusion processing on the first overhead view feature tensor by using the first perspective feature tensor to generate a first fused feature tensor;
performing dimension reduction processing on the first fused feature tensor using a 1*1 convolutional network.
Preferably, the shape of the first point cloud tensor is X*Y*Z*I, where X is a cross-axis coordinate dimension parameter of the three-dimensional coordinate system, Y is a longitudinal-axis coordinate dimension parameter of the three-dimensional coordinate system, Z is a vertical-axis coordinate dimension parameter of the three-dimensional coordinate system, and I is a laser reflection intensity dimension parameter;
the first overhead view feature tensor has a shape of H1*W1*C1, where H1 is a height dimension parameter, W1 is a width dimension parameter, and C1 is a channel dimension parameter;
the first perspective feature tensor has a shape of H2*W2*C2, where H2 is a height dimension parameter, W2 is a width dimension parameter, and C2 is a channel dimension parameter, with C2=C1;
the first fused feature tensor has a shape of H3*W3*C3, where H3 is a height dimension parameter, W3 is a width dimension parameter, and C3 is a channel dimension parameter, with H3=H1, W3=W1, and C3=H2*C1.
Preferably, the performing two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first overhead view feature tensor specifically includes:
using the PointPillars algorithm, performing two-dimensional voxel feature extraction in a Cartesian coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first overhead view feature tensor [H1*W1*C1].
Preferably, the performing, according to a preset perspective mode, corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first perspective feature tensor specifically includes:
when the perspective mode is a spherical mode, performing two-dimensional voxel feature extraction processing in a spherical coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first perspective feature tensor [H2*W2*C2];
when the perspective mode is a cylindrical mode, performing two-dimensional voxel feature extraction processing in a cylindrical coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first perspective feature tensor [H2*W2*C2].
Preferably, the performing feature fusion processing on the first overhead view feature tensor by using the first perspective feature tensor to generate a first fused feature tensor specifically includes:
extracting, from the first perspective feature tensor, W2 first sub-tensors [H2*1*C2] of shape H2*1*C2;
extracting, from the first overhead view feature tensor, H1*W1 second sub-tensors [1*1*C1] of shape 1*1*C1;
for each second sub-tensor [1*1*C1], selecting a corresponding one of the first sub-tensors [H2*1*C2] as a first corresponding sub-tensor [H2*1*C2];
performing feature fusion between each second sub-tensor [1*1*C1] and its first corresponding sub-tensor [H2*1*C2] to obtain a corresponding third sub-tensor [1*1*C3] of shape 1*1*C3, where C3=H2*C1;
finally, composing the first fused feature tensor [H3*W3*C3] from the H1*W1 third sub-tensors [1*1*C3] obtained, where H3=H1 and W3=W1.
Further, the performing feature fusion between each second sub-tensor [1*1*C1] and its first corresponding sub-tensor [H2*1*C2] to obtain a corresponding third sub-tensor [1*1*C3] of shape 1*1*C3 specifically includes:
extracting, from the first corresponding sub-tensor [H2*1*C2], H2 fourth sub-tensors [1*1*C2] of shape 1*1*C2;
performing tensor cross multiplication between the second sub-tensor [1*1*C1] and each of the fourth sub-tensors [1*1*C2], respectively, to obtain H2 fifth sub-tensors [1*1*C1] of shape 1*1*C1;
performing channel merging on the H2 fifth sub-tensors [1*1*C1] to obtain the third sub-tensor [1*1*C3] of shape 1*1*C3, where C3=H2*C1.
A second aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, read and execute instructions in the memory, so as to implement the method steps of the first aspect;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
A third aspect of embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method of the first aspect.
The embodiment of the invention provides a method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data, electronic equipment and a computer readable storage medium. By using the method, the calculation complexity of the voxelization algorithm is reduced, the requirement of low time delay in the field of automatic driving is met, and the problem of height information loss of the surrounding environment of the vehicle caused by only using the overlooking characteristic is solved.
Drawings
Fig. 1 is a schematic diagram of a method for fusion of perspective and aerial view characteristics based on unmanned vehicle lidar data according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is apparent that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for fusing perspective and overhead view characteristics based on unmanned vehicle lidar data, as shown in fig. 1, which is a schematic diagram of the method for fusing perspective and overhead view characteristics based on unmanned vehicle lidar data according to the embodiment of the present invention, and the method mainly includes the following steps:
step 1, acquiring a first point cloud tensor generated by scanning a first target environment by an unmanned vehicle laser radar;
the shape of the first point cloud tensor is X*Y*Z*I, where X is a cross-axis coordinate dimension parameter of the three-dimensional coordinate system, Y is a longitudinal-axis coordinate dimension parameter of the three-dimensional coordinate system, Z is a vertical-axis coordinate dimension parameter of the three-dimensional coordinate system, and I is a laser reflection intensity dimension parameter.
Here, as described above, each point cloud datum includes three-dimensional coordinates and a laser reflection intensity. During target environment scanning, the unmanned vehicle radar scans the specified scanning range at a set scanning frequency to obtain a plurality of scan data; the scan data are converted into point cloud three-dimensional coordinates to obtain a plurality of point cloud data; and a calculation tensor is created for all the point cloud data in the form of three-dimensional coordinates plus scanning intensity, yielding the first point cloud tensor. If the laser reflection intensity is a scalar, the shape of the first point cloud tensor is X*Y*Z*1; if the laser reflection intensity is a vector, the shape of the first point cloud tensor is X*Y*Z*I.
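To make this concrete, the following is a minimal sketch of how such a first point cloud tensor could be assembled, assuming the raw scan arrives as an N x 4 array of (x, y, z, intensity) points and that the tensor is a dense X*Y*Z*1 grid of scalar intensities; the function name, coordinate ranges, cell size and last-write aggregation rule are illustrative assumptions, not values prescribed by the embodiment:

```python
import numpy as np

def build_point_cloud_tensor(points, x_range, y_range, z_range, cell=0.2):
    """Sketch: scatter raw lidar points (an N x 4 array of x, y, z, intensity)
    into a dense X*Y*Z*1 grid, keeping the last scalar intensity seen per cell.
    Ranges, cell size and the aggregation rule are illustrative assumptions."""
    nx = int((x_range[1] - x_range[0]) / cell)
    ny = int((y_range[1] - y_range[0]) / cell)
    nz = int((z_range[1] - z_range[0]) / cell)
    grid = np.zeros((nx, ny, nz, 1), dtype=np.float32)

    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    iz = np.floor((points[:, 2] - z_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny) & (iz >= 0) & (iz < nz)
    grid[ix[keep], iy[keep], iz[keep], 0] = points[keep, 3]  # scalar intensity channel
    return grid
```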
Step 2, performing two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first overhead view feature tensor;
wherein the first overhead view feature tensor has a shape of H1*W1*C1, H1 being a height dimension parameter, W1 a width dimension parameter, and C1 a channel dimension parameter;
The step specifically comprises: using the PointPillars algorithm, performing two-dimensional voxel feature extraction in a Cartesian coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first overhead view feature tensor [H1*W1*C1].
Here, the principle of the PointPillars algorithm can be found in the technical paper "PointPillars: Fast Encoders for Object Detection from Point Clouds". The embodiment of the invention uses a trained PointPillars model to carry out both the projection of the point cloud's Cartesian-coordinate-system three-dimensional coordinates onto two-dimensional coordinates and the extraction of two-dimensional features from the projected point cloud. Specifically, the trained PointPillars model first obtains the two-dimensional maxima (the X-direction maximum and the Y-direction maximum) of the input first point cloud tensor on the xy plane of the overhead view; then an H1*W1 grid graph is drawn according to the X-direction maximum, the Y-direction maximum and the set grid cell size, each grid cell being set to correspond to one pillar tensor; then, referring to the x/y two-dimensional coordinates of each point cloud datum in the first point cloud tensor, each point is assigned to its corresponding pillar so as to fill each pillar tensor, which is also the process by which the model projects the three-dimensional coordinates of the point cloud onto two-dimensional coordinates; then, according to a set sampling strategy, data preprocessing such as noise filtering and sampling is carried out on the point cloud data in each pillar tensor, the preprocessed pillar tensors are input into a convolutional network for feature extraction, and the final two-dimensional feature tensor, namely the first overhead view feature tensor [H1*W1*C1], is obtained.
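As a rough illustration of the gridding part of this step only (the PointNet-style pillar encoder and the convolutional feature network of PointPillars are not sketched), the following assumes points are (x, y, z, intensity) tuples; the function and parameter names are assumptions, not part of the PointPillars interface:

```python
import math

def pillarize_bev(points, x_range, y_range, grid_h, grid_w):
    """Sketch of the overhead-view gridding described above: assign each point
    to one of H1*W1 grid cells (pillars) by its x/y coordinates. The per-pillar
    feature encoder that produces the C1-channel map is omitted; all names and
    parameters are illustrative assumptions."""
    cell_x = (x_range[1] - x_range[0]) / grid_w
    cell_y = (y_range[1] - y_range[0]) / grid_h
    pillars = [[[] for _ in range(grid_w)] for _ in range(grid_h)]
    for x, y, z, intensity in points:
        col = math.floor((x - x_range[0]) / cell_x)
        row = math.floor((y - y_range[0]) / cell_y)
        if 0 <= row < grid_h and 0 <= col < grid_w:
            pillars[row][col].append((x, y, z, intensity))
    return pillars  # each non-empty cell would then be encoded into a C1-dim feature
```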
Step 3, according to a preset perspective mode, performing corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first perspective feature tensor;
wherein the first perspective feature tensor has a shape of H2*W2*C2, H2 being a height dimension parameter, W2 a width dimension parameter, and C2 a channel dimension parameter, with C2=C1.
Here, the perspective mode according to the embodiment of the present invention supports two options: a spherical mode, in which a spherical coordinate system is used as the perspective structure, and a cylindrical mode, in which a cylindrical coordinate system is used as the perspective structure;
the method specifically comprises the following steps: step 31, when the perspective mode is the spherical mode, the first point cloud tensor [ X Y Z I ] is processed]Two-dimensional voxel characteristic extraction processing of a spherical coordinate system is carried out to generate a first perspective characteristic tensor [ H2*W2*C2];
Here, the perspective mode is a spherical mode, which means that when two-dimensional feature extraction of spherical three-dimensional coordinates is performed on the input first point cloud tensor, spherical coordinate conversion is performed on the input first point cloud tensor; then the spherical surface is unfolded and marked into the shape of H2*W2Each grid cell is set to correspond to a cell tensor; then, referring to the spherical coordinates of each point cloud data in the first point cloud tensor, projecting each point cloud to a corresponding grid unit, and completing filling of each unit tensor; then according to a set sampling strategy, carrying out data preprocessing such as noise filtering and sampling on the point cloud data in each unit tensor, and inputting the preprocessed unit tensor into a convolution networkExtracting line features to obtain a final two-dimensional feature tensor, namely a first perspective feature tensor [ H ]2*W2*C2];
Step 32, when the perspective mode is the cylindrical mode, performing two-dimensional voxel feature extraction processing in a cylindrical coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first perspective feature tensor [H2*W2*C2].
Here, the cylindrical perspective mode means that, to perform two-dimensional feature extraction in cylindrical three-dimensional coordinates on the input first point cloud tensor, cylindrical coordinate conversion is first applied to it; the cylinder is then unrolled and divided into an H2*W2 grid, each grid cell being set to correspond to one cell tensor; then, referring to the cylindrical coordinates of each point cloud datum in the first point cloud tensor, each point is projected onto its corresponding grid cell so as to fill each cell tensor; then, according to a set sampling strategy, data preprocessing such as noise filtering and sampling is carried out on the point cloud data in each cell tensor, the preprocessed cell tensors are input into a convolutional network for feature extraction, and the final two-dimensional feature tensor, namely the first perspective feature tensor [H2*W2*C2], is obtained.
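The coordinate conversions of steps 31 and 32 can be sketched as a single routine that assigns every point a row and column of the H2*W2 perspective grid; the vertical field-of-view limits, height range and function name below are illustrative assumptions rather than values fixed by the embodiment:

```python
import numpy as np

def perspective_grid_indices(points, grid_h, grid_w, mode="spherical"):
    """Sketch of the coordinate conversion in steps 31/32: map each point's
    Cartesian (x, y, z) to a row/column of the H2*W2 perspective grid. Azimuth
    indexes the columns in both modes; elevation (spherical) or height z
    (cylindrical) indexes the rows. Binning ranges are illustrative assumptions."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    azimuth = np.arctan2(y, x)                                   # in [-pi, pi)
    col = ((azimuth + np.pi) / (2 * np.pi) * grid_w).astype(int) % grid_w
    if mode == "spherical":
        elevation = np.arctan2(z, np.sqrt(x ** 2 + y ** 2))
        lo, hi = np.radians(-25.0), np.radians(3.0)              # assumed vertical FoV
        row = ((elevation - lo) / (hi - lo) * grid_h).astype(int)
    else:  # cylindrical: the vertical axis is the z coordinate itself
        z_lo, z_hi = -3.0, 1.0                                   # assumed height range (m)
        row = ((z - z_lo) / (z_hi - z_lo) * grid_h).astype(int)
    row = np.clip(row, 0, grid_h - 1)
    return row, col  # points falling in the same (row, col) cell fill one cell tensor
```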
Step 4, performing feature fusion processing on the first overhead view feature tensor by using the first perspective feature tensor to generate a first fused feature tensor;
wherein the first fused feature tensor has a shape of H3*W3*C3, H3 being a height dimension parameter, W3 a width dimension parameter, and C3 a channel dimension parameter, with H3=H1, W3=W1, and C3=H2*C1.
Here, the two-dimensional perspective features, which carry the perspective relation, are used to perform feature fusion on the two-dimensional overhead view features, so that a vertical-dimension feature perpendicular to the horizontal/longitudinal dimensions is obtained on top of the two-dimensional overhead view features, thereby adding three-dimensional semantic information to the two-dimensional overhead view features;
the method specifically comprises the following steps: step 41, extracting W from the first perspective feature tensor2Is H in shape2*1*C2First sub-tensor [ H ] of2*1*C2];
Here, if a grid graph is drawn with H2 as the maximum of its vertical axis and W2 as the maximum of its horizontal axis, a first perspective grid map of H2 rows and W2 orthogonal columns is obtained; each first sub-tensor [H2*1*C2] then actually corresponds to one column of this first perspective grid map;
Step 42, extracting, from the first overhead view feature tensor, H1*W1 second sub-tensors [1*1*C1] of shape 1*1*C1;
Here, if a grid graph is drawn with H1 as the maximum of its vertical axis and W1 as the maximum of its horizontal axis, a first overhead grid map of H1 rows and W1 orthogonal columns is obtained; each second sub-tensor [1*1*C1] then actually corresponds to one cell of this first overhead grid map;
Step 43, for each second sub-tensor [1*1*C1], selecting a corresponding one of the first sub-tensors [H2*1*C2] as a first corresponding sub-tensor [H2*1*C2];
Step 44, performing feature fusion between each second sub-tensor [1*1*C1] and its first corresponding sub-tensor [H2*1*C2] to obtain a corresponding third sub-tensor [1*1*C3] of shape 1*1*C3, where C3=H2*C1;
Here, in effect, each one-dimensional overhead feature vector is cross-multiplied with a column of one-dimensional perspective feature vectors, and all the resulting feature tensors carrying third-dimension information are then combined into a single one-dimensional vector holding multiple third-dimension features, namely the third sub-tensor [1*1*C3];
This specifically comprises the following steps: Step 441, extracting, from the first corresponding sub-tensor [H2*1*C2], H2 fourth sub-tensors [1*1*C2] of shape 1*1*C2;
Step 442, performing tensor cross multiplication between the second sub-tensor [1*1*C1] and each of the fourth sub-tensors [1*1*C2], respectively, to obtain H2 fifth sub-tensors [1*1*C1] of shape 1*1*C1;
Here, this is the process of cross-multiplying the one-dimensional overhead feature vector with each vector in a column of one-dimensional perspective feature vectors;
Step 443, performing channel merging on the H2 fifth sub-tensors [1*1*C1] to obtain the third sub-tensor [1*1*C3] of shape 1*1*C3, where C3=H2*C1;
Here, this concatenates all the cross-multiplication results along the channel dimension;
Step 45, finally composing the first fused feature tensor [H3*W3*C3] from the H1*W1 third sub-tensors [1*1*C3] obtained, where H3=H1 and W3=W1.
Here, the process is to integrate all the one-dimensional overhead feature vectors that have completed the third-dimensional feature fusion.
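A minimal sketch of steps 41 through 45 follows. It assumes that the "tensor cross multiplication" of two 1*1*C vectors yielding another 1*1*C vector is an element-wise product, and that the mapping from an overhead cell to its corresponding perspective column is supplied externally; neither detail is fixed by the text:

```python
import numpy as np

def fuse_bev_with_perspective(bev, persp, column_of):
    """Sketch of steps 41-45: bev has shape (H1, W1, C1), persp has shape
    (H2, W2, C2) with C2 == C1, and column_of(i, j) returns the perspective
    column matched to overhead cell (i, j). The cell-to-column mapping and the
    reading of 'tensor cross multiplication' as an element-wise product are
    assumptions; channel merging is concatenation."""
    H1, W1, C1 = bev.shape
    H2, W2, C2 = persp.shape
    fused = np.zeros((H1, W1, H2 * C1), dtype=bev.dtype)
    for i in range(H1):
        for j in range(W1):
            second = bev[i, j]                    # second sub-tensor [1*1*C1]
            corresp = persp[:, column_of(i, j)]   # first corresponding sub-tensor [H2*1*C2]
            fifths = second[None, :] * corresp    # H2 fifth sub-tensors [1*1*C1]
            fused[i, j] = fifths.reshape(-1)      # channel merge -> [1*1*C3], C3 = H2*C1
    return fused
```

A plausible choice of column_of would map an overhead cell to the azimuth sector it occupies around the ego vehicle, mirroring how the columns of the perspective grid are laid out in steps 31 and 32.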
Step 5, performing dimension reduction processing on the first fused feature tensor [H3*W3*C3] using a 1*1 convolutional network.
Here, a 1*1 convolutional network is commonly used to perform dimension reduction on a high-dimensional tensor.
After that, the dimension-reduced first fused feature tensor can be input, as the three-dimensional voxel feature information of the point cloud, into an artificial intelligence model for target detection or target classification, so as to obtain classification and recognition results for objects around the driving route.
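As a minimal sketch of the dimension-reduction step in step 5 (written in PyTorch; the tensor sizes and output channel count are illustrative assumptions, not values from the patent):

```python
import torch
import torch.nn as nn

# Minimal sketch of step 5. The fused tensor is H3*W3*C3 in the text, whereas
# Conv2d expects (N, C, H, W), so the channel axis is moved first. H3, W3, H2,
# C1 and the output channel count are illustrative assumptions.
H3, W3, H2, C1 = 64, 64, 8, 32
fused = torch.randn(H3, W3, H2 * C1)                    # [H3*W3*C3] with C3 = H2*C1
x = fused.permute(2, 0, 1).unsqueeze(0)                 # -> (1, C3, H3, W3)
reduce_1x1 = nn.Conv2d(in_channels=H2 * C1, out_channels=C1, kernel_size=1)
reduced = reduce_1x1(x)                                 # -> (1, C1, H3, W3)
```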
Fig. 2 is a schematic structural diagram of an electronic device according to a second embodiment of the present invention. The electronic device may be the terminal device or the server, or may be a terminal device or a server connected to the terminal device or the server and implementing the method according to the embodiment of the present invention. As shown in fig. 2, the electronic device may include: a processor 301 (e.g., a CPU), a memory 302, a transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripherals.
The system bus 305 mentioned in fig. 2 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other devices (such as a client, a read-write library and a read-only library). The memory may include Random Access Memory (RAM) and may also include Non-Volatile Memory (NVM), such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method and the processing procedure provided in the above-mentioned embodiment.
The embodiment of the present invention further provides a chip for executing the instructions, where the chip is configured to execute the processing steps described in the foregoing method embodiment.
The embodiment of the invention provides a method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data, electronic equipment and a computer readable storage medium. By using the method, the calculation complexity of the voxelization algorithm is reduced, the requirement of low time delay in the field of automatic driving is met, and the problem of height information loss of the surrounding environment of the vehicle caused by only using the overlooking characteristic is solved.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for fusion of perspective and aerial view features based on unmanned vehicle lidar data, the method comprising:
acquiring a first point cloud tensor generated by scanning a first target environment by the unmanned vehicle laser radar;
performing two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first overhead view feature tensor;
according to a preset perspective mode, carrying out corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor to generate a first perspective feature tensor;
performing feature fusion processing on the first overhead view feature tensor by using the first perspective feature tensor to generate a first fused feature tensor;
performing dimension reduction processing on the first fused feature tensor using a 1*1 convolutional network.
2. The method for perspective and aerial view feature fusion based on unmanned vehicle lidar data of claim 1,
the shape of the first point cloud tensor is X*Y*Z*I, wherein X is a cross-axis coordinate dimension parameter of a three-dimensional coordinate system, Y is a longitudinal-axis coordinate dimension parameter of the three-dimensional coordinate system, Z is a vertical-axis coordinate dimension parameter of the three-dimensional coordinate system, and I is a laser reflection intensity dimension parameter;
the first overhead view feature tensor has a shape of H1*W1*C1, wherein H1 is a height dimension parameter, W1 is a width dimension parameter, and C1 is a channel dimension parameter;
the first perspective feature tensor has a shape of H2*W2*C2, wherein H2 is a height dimension parameter, W2 is a width dimension parameter, and C2 is a channel dimension parameter, with C2=C1;
the first fused feature tensor has a shape of H3*W3*C3, wherein H3 is a height dimension parameter, W3 is a width dimension parameter, and C3 is a channel dimension parameter, with H3=H1, W3=W1, and C3=H2*C1.
3. The method for fusion of perspective and overhead view features based on the unmanned vehicle lidar data according to claim 2, wherein the generating a first overhead view feature tensor by performing the two-dimensional voxel feature extraction processing on the first point cloud tensor specifically comprises:
using the PointPillars algorithm, performing two-dimensional voxel feature extraction in a Cartesian coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first overhead view feature tensor [H1*W1*C1].
4. The method according to claim 2, wherein the generating a first perspective feature tensor by performing corresponding two-dimensional voxel feature extraction processing on the first point cloud tensor according to a preset perspective mode specifically includes:
when the perspective mode is a spherical mode, performing two-dimensional voxel feature extraction processing in a spherical coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first perspective feature tensor [H2*W2*C2];
when the perspective mode is a cylindrical mode, performing two-dimensional voxel feature extraction processing in a cylindrical coordinate system on the first point cloud tensor [X*Y*Z*I] to generate the first perspective feature tensor [H2*W2*C2].
5. The method according to claim 2, wherein the generating a first fusion feature tensor by performing feature fusion processing on the first overhead feature tensor using the first perspective feature tensor includes:
extracting, from the first perspective feature tensor, W2 first sub-tensors [H2*1*C2] of shape H2*1*C2;
extracting, from the first overhead view feature tensor, H1*W1 second sub-tensors [1*1*C1] of shape 1*1*C1;
for each second sub-tensor [1*1*C1], selecting a corresponding one of the first sub-tensors [H2*1*C2] as a first corresponding sub-tensor [H2*1*C2];
performing feature fusion between each second sub-tensor [1*1*C1] and its first corresponding sub-tensor [H2*1*C2] to obtain a corresponding third sub-tensor [1*1*C3] of shape 1*1*C3, wherein C3=H2*C1;
finally, composing the first fused feature tensor [H3*W3*C3] from the H1*W1 third sub-tensors [1*1*C3] obtained, wherein H3=H1 and W3=W1.
6. The method for perspective and overhead view feature fusion based on unmanned vehicle lidar data of claim 5, wherein the performing feature fusion between each second sub-tensor [1*1*C1] and its first corresponding sub-tensor [H2*1*C2] to obtain a corresponding third sub-tensor [1*1*C3] of shape 1*1*C3 specifically comprises:
extracting, from the first corresponding sub-tensor [H2*1*C2], H2 fourth sub-tensors [1*1*C2] of shape 1*1*C2;
performing tensor cross multiplication between the second sub-tensor [1*1*C1] and each of the fourth sub-tensors [1*1*C2], respectively, to obtain H2 fifth sub-tensors [1*1*C1] of shape 1*1*C1;
performing channel merging on the H2 fifth sub-tensors [1*1*C1] to obtain the third sub-tensor [1*1*C3] of shape 1*1*C3, wherein C3=H2*C1.
7. An electronic device, comprising: a memory, a processor, and a transceiver;
the processor is used for being coupled with the memory, reading and executing the instructions in the memory to realize the method steps of any one of claims 1-6;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
8. A computer-readable storage medium having stored thereon computer instructions which, when executed by a computer, cause the computer to perform the method of any of claims 1-6.
CN202110627186.3A 2021-06-04 2021-06-04 Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data Withdrawn CN113361601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110627186.3A CN113361601A (en) 2021-06-04 2021-06-04 Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110627186.3A CN113361601A (en) 2021-06-04 2021-06-04 Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data

Publications (1)

Publication Number Publication Date
CN113361601A true CN113361601A (en) 2021-09-07

Family

ID=77532406

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110627186.3A Withdrawn CN113361601A (en) 2021-06-04 2021-06-04 Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data

Country Status (1)

Country Link
CN (1) CN113361601A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023165220A1 (en) * 2022-03-04 2023-09-07 京东鲲鹏(江苏)科技有限公司 Target object detection method and apparatus


Similar Documents

Publication Publication Date Title
CN109901567B (en) Method and apparatus for outputting obstacle information
US20210270609A1 (en) Method, apparatus, computing device and computer-readable storage medium for positioning
CN110264502B (en) Point cloud registration method and device
CN111709923B (en) Three-dimensional object detection method, three-dimensional object detection device, computer equipment and storage medium
US20230386076A1 (en) Target detection method, storage medium, electronic device, and vehicle
CN113420637A (en) Laser radar detection method under multi-scale aerial view angle in automatic driving
US10467359B2 (en) Special-purpose programmed computer for numerical simulation of a metal forming process having a predefined load path with corresponding mesh adjustment scheme
CN114091521B (en) Method, device and equipment for detecting vehicle course angle and storage medium
CN113361601A (en) Method for fusing perspective and aerial view characteristics based on unmanned vehicle laser radar data
CN115909269A (en) Three-dimensional target detection method and device and computer storage medium
CN111488783A (en) Method and device for detecting pseudo-3D bounding box based on CNN
EP4207072A1 (en) Three-dimensional data augmentation method, model training and detection method, device, and autonomous vehicle
CN117031491A (en) Map construction method and device, automatic navigation trolley and electronic equipment
CN116188931A (en) Processing method and device for detecting point cloud target based on fusion characteristics
CN115761425A (en) Target detection method, device, terminal equipment and computer readable storage medium
CN115856874A (en) Millimeter wave radar point cloud noise reduction method, device, equipment and storage medium
WO2022017129A1 (en) Target object detection method and apparatus, electronic device, and storage medium
CN114663478A (en) Method for estimating anchor point position according to multi-reference point prediction information
CN115527187A (en) Method and device for classifying obstacles
CN114966736A (en) Processing method for predicting target speed based on point cloud data
CN114549764A (en) Obstacle identification method, device, equipment and storage medium based on unmanned vehicle
CN114140660A (en) Vehicle detection method, device, equipment and medium
CN114820416A (en) Vehicle course angle calculation method, vehicle pose calculation method, device and equipment
CN113129437B (en) Method and device for determining space coordinates of markers
CN116740682B (en) Vehicle parking route information generation method, device, electronic equipment and readable medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20210907