CN114782786A - Feature fusion processing method and device for point cloud and image data

Feature fusion processing method and device for point cloud and image data

Info

Publication number
CN114782786A
CN114782786A (application CN202210536129.9A)
Authority
CN
China
Prior art keywords
feature
tensor
point cloud
image data
generate
Prior art date
Legal status
Pending
Application number
CN202210536129.9A
Other languages
Chinese (zh)
Inventor
张雨 (Zhang Yu)
Current Assignee
Suzhou Qingyu Technology Co Ltd
Original Assignee
Suzhou Qingyu Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Suzhou Qingyu Technology Co Ltd
Priority to CN202210536129.9A
Publication of CN114782786A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/25 - Fusion techniques
    • G06F 18/253 - Fusion techniques of extracted features
    • G06F 18/24 - Classification techniques
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks

Abstract

An embodiment of the invention relates to a feature fusion processing method and device for point cloud and image data, wherein the method comprises: acquiring first point cloud data and first image data; performing bird's-eye-view feature extraction on the first point cloud data to generate a first feature map tensor; performing bird's-eye-view feature extraction on the first image data to generate a second feature map tensor; concatenating the first and second feature map tensors to generate a third feature map tensor; calculating a first position encoding tensor corresponding to the third feature map tensor according to the position encoding rule of a Transformer model; inputting the third feature map tensor and the first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor. The invention achieves multi-sensor bird's-eye-view feature fusion without maintaining an additional fusion model, thereby reducing development and maintenance cost.

Description

Feature fusion processing method and device for point cloud and image data
Technical Field
The invention relates to the technical field of data processing, in particular to a method and a device for feature fusion processing of point cloud and image data.
Background
The perception module of an unmanned driving system performs multi-target tracking with Bird's Eye View (BEV) image features as reference, which can further improve tracking efficiency. Conventionally, the perception module obtains bird's-eye-view features either from image data captured by a camera or from point cloud data scanned by a lidar; bird's-eye-view features are rarely obtained by fusing the two sources, because image-based and point-cloud-based bird's-eye-view feature extraction models each carry a large computational and maintenance burden, and additionally building a model to fuse the two would inevitably incur further resource cost.
Disclosure of Invention
The invention aims to provide a feature fusion processing method and device for point cloud and image data, an electronic device, and a computer-readable storage medium, so as to overcome the defects of the prior art. With the bird's-eye-view feature fusion processing mechanism provided by the invention, multi-sensor bird's-eye-view feature fusion is achieved, and development and maintenance cost is reduced because no additional fusion model needs to be maintained.
In order to achieve the above object, a first aspect of the embodiments of the present invention provides a method for feature fusion processing of point cloud and image data, where the method includes:
acquiring first point cloud data and first image data;
performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor;
performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor;
concatenating the first and second feature map tensors to generate a corresponding third feature map tensor;
calculating a position encoding tensor corresponding to the third feature map tensor according to the position encoding rule of a Transformer model to obtain a corresponding first position encoding tensor; inputting the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor.
Preferably, the performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor specifically includes:
performing bird's-eye-view plane pseudo-image conversion processing on the first point cloud data based on a PointPillars model, and performing two-dimensional image feature extraction processing on the converted bird's-eye-view plane pseudo-image to generate the first feature map tensor.
Preferably, the performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor specifically includes:
inputting the first image data into a BevFormer model for two-dimensional image bird's-eye-view feature extraction to generate the second feature map tensor.
Preferably, the first feature map tensor has shape H1*W1*C1, where H1 is the feature map height, W1 is the feature map width, and C1 is the total number of data channels;
the second feature map tensor has shape H2*W2*C2, where H2 is the feature height with H2 = H1, W2 is the feature width with W2 = W1, and C2 is the total number of data channels;
the third feature map tensor has shape H3*W3*C3, where H3 is the feature height with H3 = H2 = H1, W3 is the feature width with W3 = W2 = W1, and C3 is the total number of data channels with C3 = (C1 + C2);
the fused feature tensor has shape H4*W4*C4, where H4 is the feature height with H4 = H3 = H2 = H1, W4 is the feature width with W4 = W3 = W2 = W1, and C4 is the total number of data channels with C4 = C3 = (C1 + C2).
A second aspect of the embodiments of the present invention provides an apparatus for implementing the feature fusion processing method for point cloud and image data according to the first aspect, the apparatus comprising: an acquisition module, a point cloud bird's-eye-view feature processing module, an image bird's-eye-view feature processing module, and a feature fusion processing module;
the acquisition module is used for acquiring first point cloud data and first image data;
the point cloud bird's-eye-view feature processing module is used for performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor;
the image bird's-eye-view feature processing module is used for performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor;
the feature fusion processing module is used for concatenating the first and second feature map tensors to generate a corresponding third feature map tensor; calculating a position encoding tensor corresponding to the third feature map tensor according to the position encoding rule of a Transformer model to obtain a corresponding first position encoding tensor; inputting the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor.
A third aspect of an embodiment of the present invention provides an electronic device, including: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, and to read and execute the instructions in the memory so as to implement the method steps of the first aspect;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing computer instructions that, when executed by a computer, cause the computer to perform the method according to the first aspect.
The embodiments of the invention provide a feature fusion processing method and device for point cloud and image data, an electronic device, and a computer-readable storage medium. The bird's-eye-view feature fusion processing mechanism provided by the invention realizes multi-sensor bird's-eye-view feature fusion without adding a fusion model, thereby reducing development and maintenance cost.
Drawings
Fig. 1 is a schematic diagram of a feature fusion processing method for point cloud and image data according to an embodiment of the present invention;
Fig. 2 is a block diagram of a feature fusion processing apparatus for point cloud and image data according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the present invention more apparent, the present invention is described in further detail below with reference to the accompanying drawings. It is to be understood that the described embodiments are merely a part, rather than all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort shall fall within the protection scope of the present invention.
An embodiment of the present invention provides a method for feature fusion processing of point cloud and image data; Fig. 1 is a schematic diagram of the method, which mainly includes the following steps:
step 1, first point cloud data and first image data are obtained.
The first point cloud data is point cloud data generated by a vehicle-mounted lidar, from which the perception module of the vehicle's unmanned driving system acquires it; the first image data is captured by a vehicle-mounted camera, from which the perception module acquires it. In the embodiment of the invention, the generation times of the first point cloud data and the first image data are assumed by default to match each other, and their corresponding spatial ranges also match.
Step 2, performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor;
the method specifically comprises the following steps: performing bird's-eye-view plane pseudo-image conversion processing on the first point cloud data based on a PointPillars model, and performing two-dimensional image feature extraction processing on the converted bird's-eye-view plane pseudo-image to generate the first feature map tensor;
wherein the first feature map tensor has shape H1*W1*C1, where H1 is the feature map height, W1 is the feature map width, and C1 is the total number of data channels.
Here, the embodiment of the present invention may extract the bird's-eye-view features of the first point cloud data based on any of several mature models capable of extracting bird's-eye-view features from point cloud data, thereby obtaining the corresponding bird's-eye-view feature tensor, that is, the first feature map tensor; the PointPillars model is used by default. For the implementation of the PointPillars model, refer to the paper "PointPillars: Fast Encoders for Object Detection from Point Clouds", which is not further described herein. As described in that paper, the PointPillars model consists of a point cloud pillar feature extraction network (Pillar Feature Net), a two-dimensional feature extraction backbone network (Backbone, a 2D CNN), and a target detection head (Detection Head, SSD). The pillar feature extraction network clusters the input point cloud into pillars, projects the pillars onto the bird's-eye-view plane, and outputs the projection result as a bird's-eye-view plane pseudo-image (Pseudo Image); the two-dimensional feature extraction backbone performs two-dimensional image feature extraction on the pseudo-image using a conventional multi-stage down-sampling convolutional network; and the detection head classifies the extracted bird's-eye-view features and maps the classification results back onto the original point cloud data, adding semantic features to each point. When the bird's-eye-view plane pseudo-image conversion processing is performed on the first point cloud data based on the PointPillars model, the pillar feature extraction network of the PointPillars model converts the first point cloud data into the corresponding bird's-eye-view plane pseudo-image tensor, and the two-dimensional feature extraction backbone then performs two-dimensional image feature extraction on that pseudo-image tensor to generate the corresponding first feature map tensor. From the output tensor structure of the backbone, the first feature map tensor is a three-dimensional tensor of shape H1*W1*C1, where H1 is the feature height, W1 the feature width, and C1 the total number of data channels; that is, the first feature map tensor can be understood as a two-dimensional image of H1*W1 pixels, each pixel consisting of C1 feature data.
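For illustration, a minimal Python/PyTorch sketch of this step follows. It is not the patent's or the paper's implementation: a per-pillar mean feature stands in for PointPillars' learned pillar encoder, and the grid ranges, resolution, and single-stage backbone are illustrative assumptions.

```python
# Minimal sketch of the BEV-plane pseudo-image step (illustrative only).
# Assumption: per-pillar mean features replace the learned Pillar Feature Net.
import torch
import torch.nn as nn

def points_to_pseudo_image(points: torch.Tensor,
                           x_range=(0.0, 69.12), y_range=(-39.68, 39.68),
                           resolution=0.16) -> torch.Tensor:
    """points: (N, 4) rows of (x, y, z, intensity) -> (4, H, W) pseudo-image."""
    num_feats = points.shape[1]
    W = int(round((x_range[1] - x_range[0]) / resolution))  # grid columns
    H = int(round((y_range[1] - y_range[0]) / resolution))  # grid rows
    # Keep only points that fall inside the BEV grid.
    keep = ((points[:, 0] >= x_range[0]) & (points[:, 0] < x_range[1]) &
            (points[:, 1] >= y_range[0]) & (points[:, 1] < y_range[1]))
    pts = points[keep]
    col = ((pts[:, 0] - x_range[0]) / resolution).long()
    row = ((pts[:, 1] - y_range[0]) / resolution).long()
    flat = row * W + col                          # pillar index of each point
    canvas = torch.zeros(num_feats, H * W)
    count = torch.zeros(H * W).index_add_(0, flat, torch.ones(pts.shape[0]))
    for c in range(num_feats):                    # mean feature per pillar
        canvas[c].index_add_(0, flat, pts[:, c])
    return (canvas / count.clamp(min=1)).view(num_feats, H, W)

# Stand-in for the 2D feature extraction backbone (one conv stage here; the
# real backbone is a multi-stage down-sampling CNN).
backbone = nn.Sequential(nn.Conv2d(4, 64, kernel_size=3, stride=2, padding=1),
                         nn.ReLU())
pts = torch.rand(1000, 4) * torch.tensor([69.12, 79.36, 3.0, 1.0]) \
      + torch.tensor([0.0, -39.68, 0.0, 0.0])
feat1 = backbone(points_to_pseudo_image(pts).unsqueeze(0))  # (1, C1, H1, W1)
```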
Step 3, performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor;
the method specifically comprises the following steps: inputting the first image data into a BevFormer model for two-dimensional image bird's-eye-view feature extraction to generate the second feature map tensor;
wherein the second feature map tensor has shape H2*W2*C2, where H2 is the feature height with H2 = H1, W2 is the feature width with W2 = W1, and C2 is the total number of data channels.
Here, the embodiment of the present invention may extract the bird's-eye-view features of the first image data based on any of several mature models capable of extracting bird's-eye-view features from image data, thereby obtaining the corresponding bird's-eye-view feature tensor, that is, the second feature map tensor; the BevFormer model is used by default. For the implementation of the BevFormer model, refer to the paper "BEVFormer: Learning Bird's-Eye-View Representation from Multi-Camera Images via Spatiotemporal Transformers", which is not further described herein. In the embodiment of the invention, after the first image data is input into the BevFormer model for two-dimensional image bird's-eye-view feature extraction, the model obtains the historical bird's-eye-view temporal features of the first image data through queries, obtains the real-time image features of the first image data through its feature extraction network, and then performs spatio-temporal feature aggregation on the two to obtain the corresponding second feature map tensor. The second feature map tensor is likewise a three-dimensional tensor of shape H2*W2*C2, where H2 is the feature map height, W2 the feature map width, and C2 the total number of data channels; that is, the second feature map tensor can be understood as a two-dimensional image of H2*W2 pixels, each pixel consisting of C2 feature data. To facilitate the subsequent feature concatenation, the embodiment of the present invention sets the sizes of the feature tensors output by the PointPillars model and the BevFormer model to be the same, that is, H2 = H1 and W2 = W1 are ensured by setting the model parameters.
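For a sense of the camera-to-BEV mapping, here is a drastically simplified stand-in in the same Python/PyTorch style: learnable BEV queries cross-attend to flattened image features. BevFormer itself uses deformable spatial cross-attention plus temporal self-attention over historical BEV queries; the class name, default sizes, and single attention layer below are assumptions, not the BevFormer architecture.

```python
# Drastically simplified camera-to-BEV stand-in (illustrative only): one
# standard cross-attention layer in place of BevFormer's deformable spatial
# cross-attention and temporal self-attention.
import torch
import torch.nn as nn

class TinyCameraToBEV(nn.Module):
    def __init__(self, h_bev: int = 64, w_bev: int = 64, c: int = 64):
        super().__init__()
        self.h, self.w = h_bev, w_bev
        self.bev_queries = nn.Parameter(torch.randn(h_bev * w_bev, c))
        self.attn = nn.MultiheadAttention(c, num_heads=4, batch_first=True)

    def forward(self, img_feats: torch.Tensor) -> torch.Tensor:
        """img_feats: (B, C, Hi, Wi) image features -> (B, C, H_bev, W_bev)."""
        b, c, _, _ = img_feats.shape
        kv = img_feats.flatten(2).transpose(1, 2)      # (B, Hi*Wi, C)
        q = self.bev_queries.unsqueeze(0).expand(b, -1, -1)
        bev, _ = self.attn(q, kv, kv)                  # BEV queries attend to image
        return bev.transpose(1, 2).reshape(b, c, self.h, self.w)

feat2 = TinyCameraToBEV()(torch.randn(1, 64, 32, 88))  # (1, C2, H2, W2) = (1, 64, 64, 64)
```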
Step 4, concatenating the first feature map tensor and the second feature map tensor to generate a corresponding third feature map tensor;
wherein the third feature map tensor has shape H3*W3*C3, where H3 is the feature height with H3 = H2 = H1, W3 is the feature width with W3 = W2 = W1, and C3 is the total number of data channels with C3 = (C1 + C2); that is, the two feature maps are concatenated along the channel dimension.
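In code, this step is a single channel-wise concatenation; a minimal sketch with assumed shapes (feat1 and feat2 here are placeholders for the two feature map tensors):

```python
# Channel-wise concatenation of the two BEV feature maps (step 4).
import torch

feat1 = torch.randn(1, 64, 64, 64)        # (B, C1, H1, W1), point cloud branch
feat2 = torch.randn(1, 64, 64, 64)        # (B, C2, H2, W2), image branch; H2=H1, W2=W1
feat3 = torch.cat([feat1, feat2], dim=1)  # (B, C1+C2, H, W), third feature map tensor
assert feat3.shape == (1, 128, 64, 64)
```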
Step 5, calculating the position encoding tensor corresponding to the third feature map tensor according to the position encoding rule of the Transformer model to obtain the corresponding first position encoding tensor; inputting the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor;
wherein the fused feature tensor has shape H4*W4*C4, where H4 is the feature height with H4 = H3 = H2 = H1, W4 is the feature width with W4 = W3 = W2 = W1, and C4 is the total number of data channels with C4 = C3 = (C1 + C2).
Here, for the implementation of the Transformer model, refer to the paper "Attention Is All You Need", which is not further described herein. As described in that paper, the input of the Transformer model includes two parts: the input feature tensor and its corresponding position encoding tensor. The calculation of the position encoding tensor is determined by the position encoding rule of the Transformer model, which includes a sine encoding rule and a cosine encoding rule; the embodiment of the invention adopts the sine encoding rule by default. The paper also shows that the Transformer model comprises an encoder and a decoder: the third feature map tensor and the corresponding first position encoding tensor are input into the encoder for stage-by-stage encoding, and the decoder then decodes stage by stage to obtain the final model operation output, namely the fused feature tensor. From the input/output structure of the Transformer model, the shape of the output fused feature tensor matches the shape of the input third feature map tensor, so the shape H4*W4*C4 of the fused feature tensor is in fact H3*W3*C3.
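The following Python/PyTorch sketch illustrates step 5 under stated assumptions: the H*W grid is flattened into a sequence, the sinusoidal position encoding of "Attention Is All You Need" is added as the first position encoding tensor, and an untrained, encoder-only Transformer performs the self-attention operation. The patent describes an encoder-decoder Transformer; the layer and head counts here are assumptions.

```python
# Sketch of step 5: sinusoidal position encoding + Transformer self-attention.
# Assumptions: encoder-only (the patent describes encoder + decoder), untrained
# weights, and illustrative layer/head counts.
import math
import torch
import torch.nn as nn

def sinusoidal_encoding(seq_len: int, d_model: int) -> torch.Tensor:
    """Sine/cosine position encoding; assumes d_model is even."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.exp(torch.arange(0, d_model, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / d_model))
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos * div)  # even channels: sine rule
    pe[:, 1::2] = torch.cos(pos * div)  # odd channels: cosine rule
    return pe

def fuse(feat3: torch.Tensor, num_layers: int = 2, num_heads: int = 4) -> torch.Tensor:
    """feat3: (B, C3, H, W) -> fused feature tensor of the same shape."""
    b, c, h, w = feat3.shape
    seq = feat3.flatten(2).transpose(1, 2)          # (B, H*W, C3)
    seq = seq + sinusoidal_encoding(h * w, c)       # first position encoding tensor
    layer = nn.TransformerEncoderLayer(d_model=c, nhead=num_heads, batch_first=True)
    out = nn.TransformerEncoder(layer, num_layers=num_layers)(seq)  # self-attention
    return out.transpose(1, 2).reshape(b, c, h, w)  # (B, C4, H4, W4) = input shape

fused = fuse(torch.randn(1, 128, 64, 64))
assert fused.shape == (1, 128, 64, 64)
```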
The fused feature tensor obtained through steps 1-5 contains both the bird's-eye-view features of the point cloud and the bird's-eye-view features of the image, and the perception module can subsequently perform multi-target tracking with this fused feature tensor as reference.
Fig. 2 is a block diagram of a feature fusion processing apparatus for point cloud and image data according to a second embodiment of the present invention. The apparatus may be the terminal device or server implementing the foregoing method embodiment, or an apparatus that enables such a terminal device or server to implement it, for example an apparatus or chip system of that terminal device or server. As shown in Fig. 2, the apparatus includes: an acquisition module 201, a point cloud bird's-eye-view feature processing module 202, an image bird's-eye-view feature processing module 203, and a feature fusion processing module 204.
The obtaining module 201 is configured to obtain first point cloud data and first image data.
The point cloud bird's-eye-view feature processing module 202 is configured to perform bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor.
The image bird's-eye-view feature processing module 203 is configured to perform bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor.
The feature fusion processing module 204 is configured to concatenate the first and second feature map tensors to generate a corresponding third feature map tensor; calculate the position encoding tensor corresponding to the third feature map tensor according to the position encoding rule of the Transformer model to obtain the corresponding first position encoding tensor; input the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and take the output result of the model operation as the corresponding fused feature tensor.
The feature fusion processing device for point cloud and image data provided by the embodiment of the invention can execute the method steps in the method embodiment, and the implementation principle and the technical effect are similar, and are not repeated herein.
It should be noted that the division of the apparatus into the above modules is only a logical division; in an actual implementation, all or part of them may be integrated into one physical entity or kept physically separate. These modules may all be implemented in the form of software invoked by a processing element, all in the form of hardware, or partly as software invoked by a processing element and partly as hardware. For example, the acquisition module may be a separately established processing element, may be integrated into a chip of the apparatus, or may be stored in the memory of the apparatus in the form of program code that a processing element of the apparatus calls to execute the module's function; the other modules are implemented similarly. In addition, the modules may be wholly or partly integrated together or implemented independently. The processing element described herein may be an integrated circuit with signal processing capability. In implementation, each step of the above method or each of the above modules may be completed by an integrated logic circuit of hardware in the processor element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above method, such as one or more Application Specific Integrated Circuits (ASICs), one or more Digital Signal Processors (DSPs), or one or more Field Programmable Gate Arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a Central Processing Unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a System-on-a-Chip (SoC).
In the above embodiments, the implementation may be realized wholly or partly by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the procedures or functions described in the foregoing method embodiments are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example from one website, computer, server, or data center to another via wired (e.g., coaxial cable, optical fiber, Digital Subscriber Line (DSL)) or wireless (e.g., infrared, radio, Bluetooth, microwave) means. The computer-readable storage medium may be any available medium accessible by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tape), optical media (e.g., DVD), or semiconductor media (e.g., Solid State Disk (SSD)).
Fig. 3 is a schematic structural diagram of an electronic device according to a third embodiment of the present invention. The electronic device may be the terminal device or the server, or may be a terminal device or a server connected to the terminal device or the server and implementing the method according to the embodiment of the present invention. As shown in Fig. 3, the electronic device may include: a processor 301 (e.g., CPU), memory 302, transceiver 303; the transceiver 303 is coupled to the processor 301, and the processor 301 controls the transceiving operation of the transceiver 303. Various instructions may be stored in memory 302 for performing various processing functions and implementing the processing steps described in the foregoing method embodiments. Preferably, the electronic device according to an embodiment of the present invention further includes: a power supply 304, a system bus 305, and a communication port 306. The system bus 305 is used to implement communication connections between the elements. The communication port 306 is used for connection communication between the electronic device and other peripheral devices.
The system bus 305 mentioned in Fig. 3 may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The system bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in Fig. 3, but this does not mean there is only one bus or one type of bus. The communication interface is used for realizing communication between the database access device and other equipment (such as a client, a read-write library, and a read-only library). The memory may include a Random Access Memory (RAM), and may further include a non-volatile memory, such as at least one disk memory.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), a Graphics Processing Unit (GPU), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
It should be noted that the embodiment of the present invention also provides a computer-readable storage medium, which stores instructions that, when executed on a computer, cause the computer to execute the method and the processing procedure provided in the above-mentioned embodiment.
The embodiment of the present invention further provides a chip for executing the instruction, where the chip is configured to execute the processing steps described in the foregoing method embodiment.
The embodiment of the invention provides a feature fusion processing method and device for point cloud and image data, an electronic device and a computer readable storage medium. The bird's-eye view feature fusion processing mechanism provided by the invention not only realizes the bird's-eye view feature fusion of multiple sensors, but also reduces the development and maintenance cost without additionally adding a fusion model.
Those of skill would further appreciate that the various illustrative components and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the components and steps of the various examples have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied in hardware, a software module executed by a processor, or a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (7)

1. A feature fusion processing method of point cloud and image data is characterized by comprising the following steps:
acquiring first point cloud data and first image data;
performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor;
performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor;
concatenating the first and second feature map tensors to generate a corresponding third feature map tensor;
calculating a position encoding tensor corresponding to the third feature map tensor according to a position encoding rule of a Transformer model to obtain a corresponding first position encoding tensor; inputting the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor.
2. The feature fusion processing method for point cloud and image data according to claim 1, wherein the performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor specifically comprises:
performing bird's-eye-view plane pseudo-image conversion processing on the first point cloud data based on a PointPillars model, and performing two-dimensional image feature extraction processing on the converted bird's-eye-view plane pseudo-image to generate the first feature map tensor.
3. The feature fusion processing method for point cloud and image data according to claim 1, wherein the performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor specifically comprises:
inputting the first image data into a BevFormer model for two-dimensional image bird's-eye-view feature extraction to generate the second feature map tensor.
4. The feature fusion processing method for point cloud and image data according to claim 1, wherein
the first feature map tensor has shape H1*W1*C1, where H1 is the feature height, W1 is the feature width, and C1 is the total number of data channels;
the second feature map tensor has shape H2*W2*C2, where H2 is the feature height with H2 = H1, W2 is the feature width with W2 = W1, and C2 is the total number of data channels;
the third feature map tensor has shape H3*W3*C3, where H3 is the feature height with H3 = H2 = H1, W3 is the feature width with W3 = W2 = W1, and C3 is the total number of data channels with C3 = (C1 + C2);
the fused feature tensor has shape H4*W4*C4, where H4 is the feature height with H4 = H3 = H2 = H1, W4 is the feature width with W4 = W3 = W2 = W1, and C4 is the total number of data channels with C4 = C3 = (C1 + C2).
5. An apparatus for implementing the feature fusion processing method for point cloud and image data according to any one of claims 1 to 4, the apparatus comprising: an acquisition module, a point cloud bird's-eye-view feature processing module, an image bird's-eye-view feature processing module, and a feature fusion processing module;
the acquisition module is used for acquiring first point cloud data and first image data;
the point cloud bird's-eye-view feature processing module is used for performing bird's-eye-view feature extraction processing on the first point cloud data to generate a corresponding first feature map tensor;
the image bird's-eye-view feature processing module is used for performing bird's-eye-view feature extraction processing on the first image data to generate a corresponding second feature map tensor;
the feature fusion processing module is used for concatenating the first and second feature map tensors to generate a corresponding third feature map tensor; calculating a position encoding tensor corresponding to the third feature map tensor according to a position encoding rule of a Transformer model to obtain a corresponding first position encoding tensor; inputting the third feature map tensor and the corresponding first position encoding tensor into the Transformer model for self-attention operation; and taking the output result of the model operation as the corresponding fused feature tensor.
6. An electronic device, comprising: a memory, a processor, and a transceiver;
the processor is configured to be coupled to the memory, and to read and execute the instructions in the memory so as to implement the method steps of any one of claims 1-4;
the transceiver is coupled to the processor, and the processor controls the transceiver to transmit and receive messages.
7. A computer-readable storage medium having computer instructions stored thereon which, when executed by a computer, cause the computer to perform the method of any of claims 1-4.
CN202210536129.9A (priority date 2022-05-17, filing date 2022-05-17): Feature fusion processing method and device for point cloud and image data. Status: Pending. Publication: CN114782786A.

Priority Applications (1)

CN202210536129.9A: Feature fusion processing method and device for point cloud and image data

Publications (1)

CN114782786A, published 2022-07-22

Family

ID=82437852

Family Applications (1)

CN202210536129.9A (pending): Feature fusion processing method and device for point cloud and image data

Country Status (1)

CN: CN114782786A

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination