CN112905239B - Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment - Google Patents


Info

Publication number
CN112905239B
CN112905239B
Authority
CN
China
Prior art keywords
point cloud
voxel
data
fpga
ddr
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110191083.7A
Other languages
Chinese (zh)
Other versions
CN112905239A
Inventor
王维杰
郭开元
张剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Chaoxing Future Technology Co ltd
Original Assignee
Beijing Chaoxing Future Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Chaoxing Future Technology Co ltd filed Critical Beijing Chaoxing Future Technology Co ltd
Priority to CN202110191083.7A
Publication of CN112905239A
Application granted
Publication of CN112905239B

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • G06F15/7817Specially adapted for signal processing, e.g. Harvard architectures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/06Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
    • G06N3/063Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Hardware Design (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Signal Processing (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Image Generation (AREA)

Abstract

The embodiments of the application provide an FPGA-based point cloud preprocessing acceleration method, an accelerator and an electronic device. The acceleration method comprises the following steps: receiving parameter configuration and a start command; dividing the input into voxels, each voxel containing a plurality of point clouds; maintaining a voxel table on the FPGA that records the number of point clouds in each voxel and related information; reading the point cloud data to be processed, performing calculation processing, and storing the calculation results in the DDR; and reading the calculation result data, performing maximum pooling, writing the pooled results into the DDR, and completing the scatter function at the same time. In this processing scheme, the data storage structure uses FPGA on-chip storage, minimizing on-chip storage resource consumption while meeting performance requirements; pipeline parallelism is fully exploited to process point cloud data in parallel, greatly improving data throughput. The speed of point cloud preprocessing is thus increased to meet the requirements of autonomous driving scenarios.

Description

Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment
Technical Field
The application relates to the technical field of point cloud data processing, in particular to a point cloud preprocessing acceleration method based on an FPGA, an accelerator and electronic equipment.
Background
Most existing machine learning systems run on CPUs or GPUs. They suffer from complex structures, enormous computation and data volumes, and high power consumption, and therefore cannot be deployed directly in embedded environments such as automobiles. This greatly limits the scenarios in which deep learning can be used.
In practice, excessive processing latency or heavy demands on compute and storage resources greatly limit the application fields of CNNs (Convolutional Neural Networks). For example, when processing images and lidar point clouds for autonomous driving, the data volume is large and the real-time requirements are strict, so research on optimizing CNNs for high throughput and low latency is urgent.
In general, a point cloud preprocessing algorithm runs on a CPU, which processes point cloud data serially and cannot exploit pipeline parallelism. When large-scale point cloud data is processed, this traditional CPU-based approach takes a long time, tends to become the system performance bottleneck, and cannot meet the real-time requirements of autonomous driving.
Disclosure of Invention
In view of this, the embodiments of the present application provide a point cloud preprocessing acceleration method, an accelerator and an electronic device based on FPGA, which at least partially solve the problems existing in the prior art.
In a first aspect, an embodiment of the present application provides a method for accelerating point cloud preprocessing based on FPGA, including the following steps:
receiving parameter configuration and starting commands;
dividing a top view to be processed into a plurality of voxels according to a grid, wherein each voxel comprises a plurality of point clouds;
maintaining a voxel table on an FPGA, wherein the voxel table records the quantity and related information of point clouds in each voxel;
reading point cloud data to be processed from the DDR in a pipelined manner;
calculating the point cloud data, and storing a calculation result in the DDR;
reading the calculation result data in the DDR according to the point cloud related information in the voxel table;
and respectively performing maximum pooling on the point cloud data in each voxel according to the read calculation result data, writing the pooled results into the DDR, and completing the scatter function at the same time.
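As a software sketch, the steps above can be modeled as follows. This is a minimal NumPy illustration, not the hardware implementation; the grid size, voxel size, and the random projection standing in for the learned convolution weights are all assumptions for illustration:

```python
import numpy as np

def preprocess(points, grid=(4, 4), voxel_size=1.0, channels=8):
    """Software sketch of the accelerator's flow:
    voxelize -> per-point features -> per-voxel max pooling -> scatter."""
    rng = np.random.default_rng(0)
    # Hypothetical stand-in for the learned conv weights: a fixed random projection.
    weight = rng.standard_normal((points.shape[1], channels))
    H, W = grid
    # 1. Assign each point to a voxel on the top-view grid.
    ix = np.clip((points[:, 0] / voxel_size).astype(int), 0, H - 1)
    iy = np.clip((points[:, 1] / voxel_size).astype(int), 0, W - 1)
    # 2. Per-point features (stand-in for conv + activation; ReLU shown).
    feats = np.maximum(points @ weight, 0.0)
    # 3. Max-pool within each voxel and scatter into an H x W x C map.
    out = np.zeros((H, W, channels))
    for (vx, vy), f in zip(zip(ix, iy), feats):
        out[vx, vy] = np.maximum(out[vx, vy], f)  # running max = max pooling
    return out

pts = np.array([[0.2, 0.3, 1.0], [0.4, 0.1, 2.0], [3.5, 3.5, 0.5]])
bev = preprocess(pts)
print(bev.shape)  # (4, 4, 8)
```

The hardware pipelines these stages and keeps the voxel bookkeeping on-chip; this model only shows the data flow.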
According to a specific implementation of the embodiment of the application, the point cloud computing module performs calculation processing on the point cloud data, which includes computing the coordinates of the voxel center, computing the offset of the point cloud from the voxel center, convolution, activation function and normalization.
According to a specific implementation of the embodiment of the application, the coordinates of the voxel center and the offsets of the point clouds from the voxel center are calculated in a serial pipeline, and each set of point cloud data generates one set of coordinates.
According to a specific implementation manner of the embodiment of the application, when the convolution operation is performed, a group of point cloud data generates a plurality of channels and adopts multi-channel convolution operation, and the multi-channel convolution operation adopts parallel processing.
According to a specific implementation manner of the embodiment of the application, the parameter configuration includes the data volume to be processed, the source data address, the destination address and parameters used in the point cloud computing.
According to a specific implementation manner of the embodiment of the application, the relevant information recorded in the voxel table includes the number of point clouds processed in each voxel and the storage address of the calculation result in the DDR.
In a second aspect, an embodiment of the present application further provides a point cloud preprocessing accelerator based on an FPGA, where the point cloud preprocessing accelerator includes: the system comprises a voxel table, a top layer control module, a point cloud computing module and a maximum value pooling module;
the voxel table records the quantity and related information of point clouds in each voxel by using storage resources on an FPGA (field programmable gate array) chip;
one end of the top layer control module is connected with the APU and is used for receiving parameter configuration and starting commands, and the other end of the top layer control module is connected with the point cloud computing module and controls the pipeline flow between the top layer control module and the point cloud computing module;
the point cloud computing module is connected with the voxel table and is used for computing point cloud data, and computing results are stored in the DDR, wherein the computing processing of the point cloud data comprises the steps of computing voxel coordinates, computing coordinates of a point cloud offset voxel center, carrying out convolution operation, activating functions and normalizing;
the maximum pooling module is connected with the voxel table and is used for performing maximum pooling on the point cloud data in each voxel, writing the pooled results into the DDR and completing the scatter function at the same time.
According to a specific implementation of the embodiment of the application, the top layer control module is further connected with the Datamover and controls it to read and write the data in the DDR.
In a third aspect, embodiments of the present application further provide an electronic device, including:
at least one processor; the method comprises the steps of,
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the FPGA-based point cloud preprocessing acceleration method of any one of the preceding first aspects.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the FPGA-based point cloud preprocessing acceleration method of any one of the preceding first aspects.
In the FPGA-based point cloud preprocessing acceleration method provided by the embodiments of the application, the top layer control module receives the parameter configuration and a start command; the top view is divided into a plurality of voxels according to a grid; a voxel table maintained on the FPGA records the relevant information of the point clouds in each voxel; the point cloud data is input into the point cloud computing module for calculation, and the calculation results are stored in the DDR; the maximum pooling module then performs maximum pooling on the point cloud data in each voxel according to the calculation result data, writes the pooled results into the DDR, and completes the scatter function at the same time. In this processing scheme, an efficient hardware architecture is designed for the point cloud preprocessing algorithm based on the FPGA. The architecture fully exploits pipelining to process the point cloud data in parallel, greatly improving data throughput; it optimizes the data storage structure, keeping the key data in on-chip storage while consuming as few on-chip storage resources as possible under the condition that performance meets the requirements; and through parameterized configuration it can meet the requirements of different application scenarios, improving practicality.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a flowchart of an acceleration method for preprocessing point cloud based on FPGA according to an embodiment of the present application;
fig. 2 is a system frame diagram of an FPGA-based point cloud preprocessing accelerator according to an embodiment of the present application.
Detailed Description
Embodiments of the present application are described in detail below with reference to the accompanying drawings.
Other advantages and effects of the present application will become apparent to those skilled in the art from the present disclosure, when the following description of the embodiments is taken in conjunction with the accompanying drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. The present application may be embodied or carried out in other specific embodiments, and the details of the present application may be modified or changed from various points of view and applications without departing from the spirit of the present application. It should be noted that the following embodiments and features in the embodiments may be combined with each other without conflict. All other embodiments, which can be made by one of ordinary skill in the art based on the embodiments herein without making any inventive effort, are intended to be within the scope of the present application.
It is noted that various aspects of the embodiments are described below within the scope of the following claims. It should be apparent that the aspects described herein may be embodied in a wide variety of forms and that any specific structure and/or function described herein is merely illustrative. Based on the present application, one skilled in the art will appreciate that one aspect described herein may be implemented independently of any other aspect, and that two or more of these aspects may be combined in various ways. For example, an apparatus may be implemented and/or a method practiced using any number of the aspects set forth herein. In addition, such apparatus may be implemented and/or such methods practiced using other structure and/or functionality in addition to one or more of the aspects set forth herein.
It should also be noted that the illustrations provided in the following embodiments merely illustrate the basic concepts of the application by way of illustration, and only the components related to the application are shown in the drawings and are not drawn according to the number, shape and size of the components in actual implementation, and the form, number and proportion of the components in actual implementation may be arbitrarily changed, and the layout of the components may be more complicated.
In addition, in the following description, specific details are provided in order to provide a thorough understanding of the examples. However, it will be understood by those skilled in the art that the aspects may be practiced without these specific details.
In a first aspect, an embodiment of the present application provides an FPGA (Field Programmable Gate Array)-based point cloud preprocessing acceleration method, whose steps are shown in fig. 1. The specific steps are as follows:
First step: the top level control module (Top Controller) receives the parameter configuration and the start command.
In the embodiment of the application, the top layer control module is connected with the APU through the AXI_Lite interface to receive the parameter configuration, which mainly includes the amount of data to be processed, the source data address, the destination address, and the parameters used in calculation. Parameterized configuration allows the requirements of different application scenarios to be met.
Second step: the top view is divided into H×W voxels according to a grid, wherein each voxel comprises a plurality of point clouds, H represents the number of voxels in the height direction, and W represents the number of voxels in the width direction.
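The mapping from a point's top-view position to its voxel can be sketched as below; the detection range and voxel size are illustrative assumptions, as the patent does not specify them:

```python
def voxel_index(x, y, x_min, y_min, voxel_dx, voxel_dy):
    """Map a point's (x, y) top-view position to its voxel (row, col)
    on the H x W grid. Range and voxel sizes are illustrative."""
    return int((x - x_min) / voxel_dx), int((y - y_min) / voxel_dy)

# Example: a 0..40 m x 0..40 m area with 0.25 m voxels gives H = W = 160.
h, w = voxel_index(10.0, 5.0, 0.0, 0.0, 0.25, 0.25)
print(h, w)  # 40 20
```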
And a third step of: a Voxel Table (Voxel Table) is maintained on the FPGA, and the Voxel Table records the number of point clouds and related information in each Voxel.
In this embodiment, the voxel table is implemented with storage resources on the FPGA chip. On-chip storage has a high energy-efficiency ratio and a small footprint; using it reduces accesses to external memory, and its access latency is far lower than that of DDR (double data rate synchronous dynamic random access memory). Since the point cloud top view is divided into H×W voxels and each entry in the voxel table corresponds to one voxel, the voxel table has H×W entries. The information recorded in the voxel table mainly comprises the number of processed point clouds in each voxel and the storage address of the calculation results in the DDR.
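A software model of one voxel-table entry might look like this; the field names are illustrative, since the patent only states that each of the H×W entries records the point count and the DDR result address:

```python
from dataclasses import dataclass

@dataclass
class VoxelEntry:
    """One of the H*W entries of the on-chip voxel table
    (field names are illustrative)."""
    point_count: int = 0   # number of point clouds processed in this voxel
    ddr_addr: int = 0      # DDR address of this voxel's calculation results

# Illustrative grid size; the table is indexed by h * W + w.
H, W = 160, 160
voxel_table = [VoxelEntry() for _ in range(H * W)]
print(len(voxel_table))  # 25600
```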
Fourth step: point cloud data to be processed is read from the DDR in a pipelined manner and input to the point cloud computing module (Calculation).
Fifth step: and the point cloud computing module performs computing processing on the point cloud data, and a computing result is stored in the DDR.
The specific process of the calculation processing of the point cloud data comprises the following steps: voxel coordinates are calculated (Calculate Voxel Coordinate), coordinates of the point cloud offset voxel center are calculated (Calculate Coordinate Offset), convolution operation (Conv), activation function (Relu) and normalization.
When the coordinates are calculated, a serial pipeline mode is used for calculation, a group of point cloud data generates a group of coordinates, the coordinates of the voxels are calculated, namely, the coordinates of the voxels on the top view, and then the Table entries (Lookup tables) in the voxel Table corresponding to the voxels can be quickly accessed through the coordinates.
When computing the convolution, a set of point cloud data (x, y, z) produces several channels, so the multi-channel convolution can be processed in parallel to increase the computation speed. The parallelism is chosen as follows: in this embodiment, a set of point cloud data (x, y, z) generates 32 channels, i.e. 32 data values with a data bit width of 8 bits. The parallelism depends on the available DDR bandwidth; for example, with a 128-bit data bus (clocked at 200 MHz), the parallelism is data bus width / data width = 128 / 8 = 16. It should be understood that 32 channels and an 8-bit data width are choices of this embodiment; practical applications are not limited to these values and may be determined according to the actual situation.
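The parallelism calculation described above can be expressed directly:

```python
def conv_parallelism(bus_bits=128, data_bits=8):
    """Parallelism of the multi-channel convolution, chosen to match the
    available DDR bandwidth: data bus width divided by data width."""
    return bus_bits // data_bits

print(conv_parallelism())  # 16, as in the embodiment (128-bit bus, 8-bit data)
```

A wider bus or narrower data type would raise the lane count accordingly (e.g. a 256-bit bus with 8-bit data gives 32 lanes).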
An activation function operation is then performed on the result of the convolution, with the same parallelism as the convolution.
The data is normalized after the activation function and then written into the DDR.
The calculation results are stored in the DDR. Each voxel holds at most 32 sets of point cloud data, and space for 32 is reserved even when a voxel is not full. Each point cloud corresponds to 32 channels with a data bit width of 8 bits, and the grid size is H×W, so the DDR capacity occupied is: H×W×32×32×8 bits.
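The reserved DDR capacity follows directly from these parameters (H = W = 160 is an illustrative grid size, not from the patent):

```python
def result_bytes(h, w, max_points=32, channels=32, data_bits=8):
    """DDR capacity reserved for convolution results: H x W voxels,
    space for 32 points per voxel, 32 channels per point, 8 bits each."""
    return h * w * max_points * channels * data_bits // 8

print(result_bytes(160, 160))  # 26214400 bytes (25 MiB) for a 160 x 160 grid
```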
Sixth step: the calculation result data in the DDR is read according to the point cloud related information in the voxel table. The specific process is as follows: the voxel table (Lookup Table) is traversed in order. The voxel coordinate computed in the fifth step, i.e. the coordinate of the voxel on the top view, gives quick access to the corresponding entry. For each non-empty entry, the number of point clouds in the voxel and the storage address of the calculation results in the DDR are obtained, and the results of the point cloud computing module are read through the Datamover (a form of direct memory access), so the calculation result data in the DDR can be read quickly (Read DDR).
Seventh step: according to the read calculation result data, the maximum pooling module (Maxpooling) performs maximum pooling between the point clouds in the same voxel (only the maximum value is retained), writes the pooled result into the DDR (Compare & Write DDR), and completes the scatter function at the same time: according to the coordinates of the point clouds, the pooled data is stored at the corresponding DDR addresses. After pooling, the data stored in the DDR occupies: H×W×32×8 bits.
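The pooling-plus-scatter step can be sketched as follows (a NumPy model of one voxel; the feature dimensions are illustrative):

```python
import numpy as np

def maxpool_and_scatter(voxel_feats, voxel_coord, bev_map):
    """Max-pool the per-point feature vectors of one voxel, then 'scatter'
    the pooled vector to the voxel's (h, w) position in the output map."""
    pooled = voxel_feats.max(axis=0)   # elementwise max over the points
    h, w = voxel_coord
    bev_map[h, w, :] = pooled          # scatter by voxel coordinate
    return pooled

bev = np.zeros((4, 4, 3))
feats = np.array([[1.0, 5.0, 2.0], [4.0, 0.0, 3.0]])  # two points, 3 channels
maxpool_and_scatter(feats, (1, 2), bev)
print(bev[1, 2])  # [4. 5. 3.]
```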
According to a specific implementation manner of the embodiment of the application, the coordinates of the voxel center and the coordinates of the offset voxel center of the point cloud are calculated in a serial pipeline mode, and a set of point cloud data generates a set of coordinates.
According to a specific implementation manner of the embodiment of the application, when the convolution operation is performed, a group of point cloud data generates a plurality of channels and adopts multi-channel convolution operation, and the multi-channel convolution operation adopts parallel processing.
According to the embodiment of the application, the parallel processing of the point cloud data is realized by fully utilizing the pipeline technology, the data throughput rate is greatly improved, and the point cloud computing performance is improved.
In a second aspect, an embodiment of the present application further provides a point cloud preprocessing accelerator based on FPGA, where a specific frame diagram of the point cloud preprocessing accelerator is shown in fig. 2, and the point cloud preprocessing accelerator includes: voxel table (Voxel table), top control module (Top Controller), point cloud computing module (calculation), and maximum pooling module (Maxpooling).
The voxel table stores point cloud related data by utilizing storage resources on an FPGA chip;
one end of the top layer control module is connected with the APU through an AXI_lite interface and is used for receiving parameter configuration, starting commands and initializing a voxel table, and the other end of the top layer control module is connected with the point cloud computing module and controls the pipeline flow between the top layer control module and the point cloud computing module;
the point cloud computing module is connected with the voxel table and is used for computing the point cloud data in the voxel table, and the computing result is stored in the DDR, wherein the computing process of the point cloud data comprises the steps of computing the coordinates of the voxels, computing the coordinates of the offset voxel centers of the point cloud, carrying out convolution operation, activating functions and normalizing.
The maximum pooling module is connected with the voxel table, performs maximum pooling on the calculation results of the point cloud computing module, completes the scatter function, and stores the pooled results in the DDR.
According to a specific implementation manner of the embodiment of the present application, the top layer control module is further connected to and controls the Datamover to read and write data in the DDR.
For understanding, the embodiment specifically describes the workflow of the point cloud preprocessing accelerator, including the following steps:
s1, a top layer control module configures parameters of an accelerator through an AXI_Lite interface, and the method mainly comprises the following steps: the data volume to be processed, the source data address, the destination address and the like, and parameters used in calculation;
s2, the APU sends a start command to start the accelerator to work through an AXI_Lite interface;
s3, the accelerator firstly initializes all list items of the Voxel Table to be 0;
s4, sending a command for reading the weight from the DDR to the Datamver through an axis_mm2s_cmd interface, receiving the weight (used for convolution operation) through the axis_mm2s interface and caching the weight in an on-chip memory;
s5, sending a command for reading the point cloud from the DDR to the Datamver through an axis_mm2s_cmd interface, and receiving the point cloud data through the axis_mm2s interface for calculation of the next step;
s6, judging whether the point cloud data is in the selected boundary range, and calculating the center coordinates of the Voxel where the point cloud data is located and the offset coordinates of the point cloud data in the range;
s7, looking up a Table, and searching the number of existing point clouds of the Voxel where the point clouds are located in the Voxel Table;
s8, when the Voxel of the point cloud is not built in the Voxel Table, if the total number of the Voxels is smaller than 20000, creating a new Table entry; the Voxel entry already exists in the table, which contains less than 32 point clouds. The next step is carried out in both cases, otherwise, the point cloud data are discarded;
s9, calculating convolution and activation functions in parallel, and writing the result into the DDR through an axis_s2mm interface;
s10, maxpooling: reading the table in sequence, obtaining the storage address of each point cloud data volume and calculation result in the DDR, and sending a reading command through an axis_mm2s_cmd interface;
s11, maxpooling: carrying out maximum value pooling operation on the data comparison, and then storing the pooled result in the DDR for the DPU to use;
and S12, the accelerator reports completion of its work by sending a done command through the AXI_Lite interface.
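The admission logic of step S8 can be sketched as follows (a software model; the dictionary stands in for the on-chip Voxel Table):

```python
MAX_VOXELS = 20000   # cap on total voxel entries (step S8)
MAX_POINTS = 32      # cap on point clouds per voxel

def admit_point(table, voxel_key):
    """Decide whether a point cloud is processed or discarded, per step S8:
    create a new entry if room remains; append if the entry is not full."""
    if voxel_key not in table:
        if len(table) >= MAX_VOXELS:
            return False          # table full: discard the point
        table[voxel_key] = 0      # create a new entry
    elif table[voxel_key] >= MAX_POINTS:
        return False              # voxel full: discard the point
    table[voxel_key] += 1
    return True

table = {}
print(admit_point(table, (3, 7)), table[(3, 7)])  # True 1
```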
In a third aspect, embodiments of the present application further provide an electronic device, including:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the FPGA-based point cloud preprocessing acceleration method of any one of the preceding first aspects.
In a fourth aspect, embodiments of the present application further provide a non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the steps of the FPGA-based point cloud preprocessing acceleration method of any one of the preceding first aspects:
receiving parameter configuration and starting commands;
dividing a top view to be processed into a plurality of voxels according to a grid, wherein each voxel comprises a plurality of point clouds;
maintaining a voxel table on an FPGA, wherein the voxel table records the quantity and related information of point clouds in each voxel;
reading point cloud data to be processed from the DDR in a pipelined manner;
calculating the point cloud data, and storing a calculation result in the DDR;
reading the calculation result data in the DDR according to the point cloud related information in the voxel table;
and respectively performing maximum pooling on the point cloud data in each voxel according to the read calculation result data, writing the pooled results into the DDR, and completing the scatter function at the same time.
Aiming at the problem that the real-time performance and throughput rate cannot be met by using a CPU to perform a point cloud preprocessing algorithm, the embodiment provided by the application provides a point cloud preprocessing acceleration method, an accelerator and electronic equipment based on an FPGA, and designs an efficient hardware architecture, wherein a data storage structure adopts FPGA on-chip storage, so that the consumption of on-chip storage resources is reduced as much as possible under the condition that the performance meets the requirement; the parallel processing of the point cloud data is realized by fully utilizing the pipeline parallel technology, and the data throughput rate is greatly improved, so that the speed of the point cloud preprocessing is improved, and the requirements of an automatic driving scene are met.
The foregoing is merely a description of specific embodiments of the present application, but the scope of protection of the present application is not limited thereto; any change or substitution readily conceivable by those skilled in the art within the technical scope disclosed by the present application shall be covered by the scope of protection of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (8)

1. The point cloud preprocessing acceleration method based on the FPGA is characterized by comprising the following steps of:
receiving parameter configuration and starting commands;
dividing a point cloud frame to be processed into a plurality of voxels according to a grid, wherein each voxel comprises a plurality of points;
maintaining a voxel table on the FPGA, wherein the voxel table records the number of points in each voxel and related information;
reading the point cloud data to be processed from the DDR in a pipelined manner;
performing calculation on the point cloud data and storing the calculation results in the DDR, wherein the calculation process comprises calculating the coordinates of the voxel center, calculating the coordinates of the point cloud offset from the voxel center, performing a convolution operation, applying an activation function, and performing normalization;
reading the calculation result data in the DDR according to the point cloud related information in the voxel table;
respectively performing maximum pooling processing on the point cloud data in each voxel according to the read calculation result data, and writing the pooled results into the DDR while simultaneously completing the scatter function;
the related information recorded in the voxel table comprises the number of processed points in each voxel and the storage addresses of the calculation results in the DDR;
and the calculation processing and the maximum value pooling processing of the point cloud data are performed on an FPGA.
2. The FPGA-based point cloud preprocessing acceleration method of claim 1, wherein the calculation of the voxel center coordinates and the calculation of the point cloud offset from the voxel center are executed in a serial pipeline, and each set of point cloud data generates a set of coordinates.
3. The FPGA-based point cloud preprocessing acceleration method of claim 1, wherein, in the convolution operation, each set of point cloud data generates a plurality of channels through a multi-channel convolution operation, and the multi-channel convolution operations are processed in parallel.
4. The FPGA-based point cloud preprocessing acceleration method of claim 1, wherein the parameter configuration includes an amount of data to be processed, a source data address, a destination address, and parameters used in point cloud computing.
5. A point cloud preprocessing accelerator based on an FPGA, the point cloud preprocessing accelerator comprising: a voxel table, a top layer control module, a point cloud computing module and a maximum value pooling module,
the voxel table records the number of points in each voxel and related information, using on-chip storage resources of the FPGA (field programmable gate array);
one end of the top layer control module is connected with the APU and is used for receiving parameter configuration and starting commands, and the other end of the top layer control module is connected with the point cloud computing module and controls the pipeline flow between the top layer control module and the point cloud computing module;
the point cloud computing module is connected with the voxel table and is used for performing calculation on the point cloud data and storing the calculation results in the DDR, wherein the calculation process of the point cloud data comprises calculating the coordinates of the voxel center, calculating the coordinates of the point cloud offset from the voxel center, performing a convolution operation, applying an activation function, and performing normalization;
the maximum pooling module is connected with the voxel table and is used for performing maximum pooling processing on the point cloud data in each voxel, writing the pooled results into the DDR and simultaneously completing the scatter function;
the related information recorded in the voxel table comprises the number of processed points in each voxel and the storage addresses of the calculation results in the DDR;
and the calculation processing and the maximum value pooling processing of the point cloud data are performed on an FPGA.
6. The FPGA-based point cloud preprocessing accelerator of claim 5, wherein said top level control module is further coupled to a data handler and controls said data handler to read and write data in said DDR.
7. An electronic device, the electronic device comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the FPGA-based point cloud preprocessing acceleration method of any of the preceding claims 1-4.
8. A non-transitory computer readable storage medium storing computer instructions for causing the computer to perform the FPGA-based point cloud preprocessing acceleration method of any of the preceding claims 1-4.
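The per-point calculation steps enumerated in claim 1 (voxel-center coordinates, offsets of each point from its voxel center, convolution, activation and normalization) can be sketched in software as follows. This is an illustrative NumPy sketch only: the array shapes, the field order (x, y, z, reflectance), and the folding of normalization parameters into the weights and bias are assumptions, not part of the claims.

```python
import numpy as np

def point_feature_pipeline(points, voxel_size, range_min, weights, bias):
    """Per-point pipeline: compute voxel-center coordinates, the offsets
    of each point from its voxel center, then a 1x1 convolution (a matrix
    multiply over the feature dimension) followed by a ReLU activation.
    Normalization is assumed folded into weights/bias, as is common when
    deploying a trained network on an FPGA."""
    vx, vy = voxel_size
    ix = np.floor((points[:, 0] - range_min[0]) / vx)
    iy = np.floor((points[:, 1] - range_min[1]) / vy)
    cx = range_min[0] + (ix + 0.5) * vx    # voxel center x
    cy = range_min[1] + (iy + 0.5) * vy    # voxel center y
    offsets = np.stack([points[:, 0] - cx, points[:, 1] - cy], axis=1)
    augmented = np.concatenate([points, offsets], axis=1)
    return np.maximum(augmented @ weights + bias, 0.0)   # conv + ReLU
```

Each point is independent of the others in this stage, which is what makes the serial pipeline of claim 2 and the parallel multi-channel convolution of claim 3 possible in hardware.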
CN202110191083.7A 2021-02-19 2021-02-19 Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment Active CN112905239B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110191083.7A CN112905239B (en) 2021-02-19 2021-02-19 Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment


Publications (2)

Publication Number Publication Date
CN112905239A CN112905239A (en) 2021-06-04
CN112905239B true CN112905239B (en) 2024-01-12

Family

ID=76123904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110191083.7A Active CN112905239B (en) 2021-02-19 2021-02-19 Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment

Country Status (1)

Country Link
CN (1) CN112905239B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110060288A (en) * 2019-03-15 2019-07-26 华为技术有限公司 Generation method, device and the storage medium of point cloud characteristic pattern
CN111814679A (en) * 2020-07-08 2020-10-23 上海雪湖科技有限公司 FPGA-based realization algorithm for voxel-encoder and VFE of voxel 3D network
CN111860340A (en) * 2020-07-22 2020-10-30 上海科技大学 Efficient K-nearest neighbor search algorithm for three-dimensional laser radar point cloud in unmanned driving
WO2020258529A1 (en) * 2019-06-28 2020-12-30 东南大学 Bnrp-based configurable parallel general convolutional neural network accelerator

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10783698B2 (en) * 2018-07-31 2020-09-22 Intel Corporation Point cloud operations
US11164363B2 (en) * 2019-07-08 2021-11-02 Waymo Llc Processing point clouds using dynamic voxelization


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
High-density memory logic structure design for massive laser point cloud data in a cloud computing environment; Liu Hui; Laser Journal (No. 09); 91-95 *
Liu Zhongyu et al., 《深入浅出图神经网络 GNN原理解析》 (Graph Neural Networks in a Nutshell: GNN Principles Explained), China Machine Press, 2020, p. 173. *


Similar Documents

Publication Publication Date Title
CN110097174B (en) Method, system and device for realizing convolutional neural network based on FPGA and row output priority
CN107437110B (en) Block convolution optimization method and device of convolutional neural network
CN109214504B (en) FPGA-based YOLO network forward reasoning accelerator design method
US20210103818A1 (en) Neural network computing method, system and device therefor
US10684946B2 (en) Method and device for on-chip repetitive addressing
US11030095B2 (en) Virtual space memory bandwidth reduction
CN110390382B (en) Convolutional neural network hardware accelerator with novel feature map caching module
CN113792621B (en) FPGA-based target detection accelerator design method
US20210295607A1 (en) Data reading/writing method and system in 3d image processing, storage medium and terminal
CN114779209B (en) Laser radar point cloud voxelization method and device
WO2021147276A1 (en) Data processing method and apparatus, and chip, electronic device and storage medium
CN116227599A (en) Inference model optimization method and device, electronic equipment and storage medium
WO2020103883A1 (en) Method for executing matrix multiplication, circuit and soc
CN115033185A (en) Memory access processing method and device, storage device, chip, board card and electronic equipment
CN114265696A (en) Pooling device and pooling accelerating circuit for maximum pooling layer of convolutional neural network
CN115249057A (en) System and computer-implemented method for graph node sampling
CN112905239B (en) Point cloud preprocessing acceleration method based on FPGA, accelerator and electronic equipment
CN110490312A (en) A kind of pond calculation method and circuit
CN115577747A (en) High-parallelism heterogeneous convolutional neural network accelerator and acceleration method
CN112035056B (en) Parallel RAM access equipment and access method based on multiple computing units
CN113111013B (en) Flash memory data block binding method, device and medium
US20210224632A1 (en) Methods, devices, chips, electronic apparatuses, and storage media for processing data
CN110390392B (en) Convolution parameter accelerating device based on FPGA and data reading and writing method
CN112101538A (en) Graph neural network hardware computing system and method based on memory computing
CN111488970A (en) Execution optimization method and device of neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant