CN112182042A

CN112182042A - Point cloud feature matching method and system based on FPGA and path planning system

Info

Publication number: CN112182042A
Application number: CN202011086902.3A
Authority: CN
Inventors: 李江辉; 骆兵; 陈诚知; 张磊; 卜凯
Original assignee: Shanghai Yangling Energy Technology Co ltd
Current assignee: Wuhan Zhihui Innovation Technology Co.,Ltd.
Priority date: 2020-10-12
Filing date: 2020-10-12
Publication date: 2021-01-05

Abstract

The invention provides an FPGA (field programmable gate array)-based point cloud feature matching method, a point cloud feature matching system and a path planning system. The system comprises a processing system portion and a programmable logic portion. The processing system portion periodically acquires point cloud data, writes the acquired data into a designated address space of a memory, and communicates with the programmable logic portion through a register channel; the programmable logic portion fetches the point cloud data from the designated address space frame by frame and performs logic processing. This avoids and simplifies the KNN process that feature point cloud matching requires on a conventional CPU.

Description

Point cloud feature matching method and system based on FPGA and path planning system

Technical Field

The invention relates to the field of data processing, in particular to a point cloud feature matching method and system based on an FPGA and a path planning system.

Background

With the rapid development of new technologies such as artificial intelligence and autonomous driving, lidar is widely used for perception, navigation and positioning in intelligent robots, and it performs well in the special operating scenarios of medium- and low-speed mobile robots. At the same time, the requirements on the performance, volume, power consumption and cost of application systems keep rising. In the field of mobile robots, lidar offers great advantages in environmental perception, positioning and obstacle avoidance; however, its data processing flow is run almost entirely on a conventional industrial personal computer with a CPU or GPU. That flow includes point cloud data acquisition, feature extraction, feature matching, pose estimation, distance optimization and map feature matching.

For intelligent robots and mobile terminal products, functions are diversifying and user-experience requirements are rising, so the requirements for low latency and high frame rate become stricter, while data volumes and algorithm complexity keep growing. Processing massive data and complex service logic on a conventional CPU or similar hardware platform is time-consuming and inefficient, and cannot meet the real-time requirements of the service.

Patent document CN103345382A discloses a CPU + GPU group kernel supercomputing system and a SIFT feature matching parallel computing method, wherein through parallelism analysis, many computations are divided and respectively computed between a CPU and a GPU, and respective computing advantages are exerted, so that the speed of the SIFT feature matching GPU parallel algorithm is increased by about 30 times compared with the speed of the CPU serial algorithm, the data processing time is greatly shortened, the real-time performance is improved, and the extraction and matching of remote sensing image feature points are realized.

In data processing on an industrial personal computer, the CPU executes serially. Even a multi-core, multi-threaded CPU is slow and inefficient when processing massive data and service logic, and cannot meet the real-time requirements of the service. As Moore's law approaches its limits, CPUs struggle to keep delivering the energy efficiency that complex data processing requires, and the industrial personal computer is bulky, power-hungry and expensive. The SIMD architecture of a GPU, for its part, cannot deliver its full acceleration when processing irregular data structures.

Disclosure of Invention

Aiming at the defects in the prior art, the invention aims to provide a point cloud feature matching method and system based on an FPGA and a path planning system.

The invention provides a point cloud feature matching system based on an FPGA, which comprises: a processing system portion and a programmable logic portion;

the processing system part carries out periodic point cloud data acquisition, the acquired point cloud data are written into a designated address space of a memory and are communicated with the programmable logic part through a register channel, and the programmable logic part acquires the point cloud data from the designated address space according to frames and carries out logic processing.

Preferably, the programmable logic portion includes:

a first functional module: loading a frame of point cloud data from the memory, performing feature extraction processing to obtain feature points, and writing the feature points into the memory;

a second functional module: loading the point cloud data of the previous frame from the memory for cache management, performing one-by-one matching calculation and result output with the feature points obtained by the first functional module, wherein the output result comprises a plurality of nearest feature points of the feature points, caching the result, and simultaneously starting a burst to write the result back to the memory;

a third functional module: and loading the feature points from the memory for cache management, performing one-by-one matching calculation and result output with the feature points obtained by the first functional module, wherein the output result comprises a plurality of nearest feature points of the feature points, caching the result, and simultaneously starting a burst to write the result back to the memory.

Preferably, the processing system portion comprises:

acquiring the feature points of the current frame and the nearest-feature-point data of 2 frames, constructing lines and planes, solving the distances from the points to the lines and planes, and optimizing the distance term.

Preferably, the memory comprises DDR.

The invention provides a point cloud feature matching method based on an FPGA, which comprises the following steps: a processing system portion and a programmable logic portion;

Preferably, the programmable logic portion includes:

loading a frame of point cloud data from the memory, performing feature extraction processing to obtain feature points, and writing the feature points into the memory;

loading the point cloud data of the previous frame from the memory for cache management, performing one-by-one matching calculation and result output with the feature points obtained by the first functional module, wherein the output result comprises a plurality of nearest feature points of the feature points, caching the result, and simultaneously starting a burst to write the result back to the memory;

and loading the feature points from the memory for cache management, performing one-by-one matching calculation and result output with the feature points obtained by the first functional module, wherein the output result comprises a plurality of nearest feature points of the feature points, caching the result, and simultaneously starting a burst to write the result back to the memory.

Preferably, the processing system portion comprises:

Preferably, the memory comprises DDR.

The path planning system provided by the invention comprises the point cloud feature matching system based on the FPGA.

Preferably, the point cloud data comprises a static hmap map file.

Compared with the prior art, the invention has the following beneficial effects:

1. The FPGA-based point cloud feature matching method avoids and simplifies the KNN process that map feature point cloud matching requires on a conventional CPU. This includes the variance calculation, sorting, partitioning and repeated storage of massive data involved in building a KD-tree or graph nodes, as well as the distance calculation, multi-level index search and multi-level storage involved in searching them.

2. A task allocation scheme is provided for the complex stages of point cloud feature computation, such as matching calculation, iterative distance solving and pose optimization. By combining the characteristics of the CPU (central processing unit) and the FPGA (field programmable gate array), each functional step or data stream is deployed on the processor best suited to it, which improves overall task execution efficiency.

3. The design is parameterized: the number N of feature points in the local map, the dimension M of each feature point, the number K of adjacent feature points, the data bit width B of a feature point and so on can be configured flexibly and extended easily, so the design can be adjusted and deployed on chips with different hardware resources.

4. The parallel matching calculation method consumes few hardware resources and achieves high parallel acceleration efficiency. For the local map parameters above, with N = 32768, M = 3, K = 3 and B = 48 bit, 48 BRAMs of 36 Kb are consumed and the computation parallelism is 3 × 8 = 24, that is, 8 feature points can be processed simultaneously, each in 3 dimensions. The main processing clock on the PL logic side is designed to be 250 MHz. For each feature point extracted from the key frame, the time to search the feature point set of the local map frame for its K = 3 adjacent feature points is t = (N/8 + 8 + K)/250 MHz = (32768/8 + 8 + 3)/250 MHz = 4107/250 MHz ≈ 0.0164 ms; the formula shows that the size of K has little influence on the time consumed. For 5000 feature points extracted from the current key frame, the total time to find the K = 3 adjacent feature points of every point in the feature point set of the local map frame is therefore T = 5000 × t = 5000 × 0.0164 ms = 82 ms. Compared with a conventional CPU, processing time improves by more than 10 times. If hardware resources are abundant, the column-direction parallelism can be deployed as 16 or more, with the row direction expanded according to the dimension of the feature points, so that the overall parallel computing efficiency is doubled; the real-time requirement can be met for a lidar with a scanning frequency of 10 Hz.
Therefore, it is no longer necessary to take one key frame every 10 frames for motion estimation: map matching and accurate positioning can be performed in real time on every frame while the mobile robot moves. The pose accumulation between key frames can be removed, which further reduces hardware resources; the initial pose obtained on the PS side is more accurate, so fewer iterations are needed to solve the distance optimization and the pose solving time drops sharply. Overall, map matching and pose solving become much more efficient, the controllable speed of the mobile robot or trolley can be raised considerably, and the user experience is better.

5. Compared with CPU processing on a conventional industrial personal computer, the power consumption is low, the cost is low and the energy efficiency is high, so the method is better suited to deployment on mobile terminal products.
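
The timing estimate in item 4 can be reproduced with a few lines of arithmetic. The sketch below only checks the stated formula t = (N/8 + 8 + K)/250 MHz; the function name, the parameter names and the reading of the middle term as a fixed 8-cycle pipeline latency are assumptions made for illustration, not statements from the source.

```python
def knn_search_time(n_points, k, columns=8, fill_cycles=8, clock_hz=250e6):
    """Return (cycles, seconds) for one query point, per the formula in item 4.

    columns     -- column parallelism (8 map points compared per clock)
    fill_cycles -- assumed meaning of the "+8" term: pipeline latency
    """
    cycles = n_points // columns + fill_cycles + k
    return cycles, cycles / clock_hz


cycles, t_seconds = knn_search_time(n_points=32768, k=3)
t_ms = t_seconds * 1e3          # time per query feature point, in ms
total_ms = 5000 * t_ms          # 5000 feature points per key frame
frame_period_ms = 1e3 / 10      # a 10 Hz lidar delivers a frame every 100 ms

print(cycles, round(t_ms, 4), round(total_ms, 2), total_ms < frame_period_ms)
```

With the values from the text this gives 4107 cycles per query and a total of about 82 ms, which is below the 100 ms frame period of a 10 Hz lidar.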

Drawings

Other features, objects and advantages of the invention will become more apparent upon reading of the detailed description of non-limiting embodiments with reference to the following drawings:

FIG. 1 is a block diagram of the present invention;

FIG. 2 is a schematic diagram of pose change between frames;

FIG. 3 is a schematic diagram of pose changes between key frames;

FIG. 4 is a schematic diagram of a positioning and mapping process;

FIG. 5 is a schematic view of a motion path;

FIG. 6 is a schematic diagram of a feature point fixing process;

FIG. 7 is a schematic diagram of a map storage module;

FIG. 8 is a schematic diagram of a distance calculation module;

FIG. 9 is a schematic view of the Cal_PE calculation process;

FIG. 10 is a schematic diagram of the K nearest neighbor comparison module;

FIG. 11 is a diagram illustrating the write back of K-neighbor feature results.

Detailed Description

The present invention will be described in detail with reference to specific examples. The following examples will assist those skilled in the art in further understanding the invention, but are not intended to limit the invention in any way. It should be noted that those skilled in the art can make various changes and modifications without departing from the spirit of the invention, all of which fall within the scope of protection of the present invention.

FIG. 1 is a hardware implementation block diagram of the system. The processing platform is an FPGA chip with an embedded ARM core, and the overall design is divided into a PS system and a PL subsystem; the PS system contains the ARM processor core and a memory controller. According to the business logic, the data processing of the laser point cloud is decomposed as follows. As the block diagram shows, when the hardware is powered on, the static hmap map file stored on the SD card or eMMC chip is imported to a designated DDR address. The laser point cloud data arrives over a network interface, is collected periodically at the PS end, and is written into a designated address space of the DDR; the PS communicates with the PL through a register channel, and the PL fetches a point cloud frame from the corresponding address space and performs logic processing.

The PL logic processing is divided into 3 functional modules:

In the 1st functional module, the PL side loads a frame of point cloud from the DDR and performs feature extraction, including ground point segmentation, corner point extraction and the like, and writes the final classified feature points into the DDR.

In the 2nd functional module, the PL side loads the feature points of the previous frame from the DDR for cache management, matches them one by one against the classified feature points output by the 1st functional module, and outputs for each feature point the information of its K nearest feature points. The K-nearest-point results are cached, and a write burst is started at the same time to write the result data back to the DDR.

In the 3rd functional module, the PL side loads the map frame feature points from the DDR for cache management, matches them one by one against the feature points output by the 1st functional module, and outputs for each feature point the information of its K nearest feature points. The K-nearest-point results are cached, and a write burst is started at the same time to write the result data back to the DDR.

PS processing:

The PS obtains the classified feature point data of the current frame and the K-nearest-feature-point result data of the 2 frames, constructs lines and planes, solves the distances from the points to the lines and planes, and performs the optimization of the distance terms, pose estimation, path planning and the like.

The invention takes a 16-line lidar as an example: the vertical angular resolution is 2 degrees, the vertical field of view is ±15 degrees, the horizontal azimuth resolution is 0.2 degrees, the horizontal field of view is 360 degrees, and the frame rate is 10 Hz, so each frame outputs at most 16 × 360°/0.2° = 28800 raw points. The lidar senses a three-dimensional environment: each point carries x, y and z information, and the value in each dimension is the distance along that axis between the target point and the center of the lidar. The UDP packets of point cloud data output by the lidar are received and unpacked at the PS end through a network interface, and the raw point cloud data is converted from polar coordinates to xyz coordinates in a Cartesian coordinate system. The lidar ranges up to about 200 meters, so each xyz coordinate fits in an 8-bit unsigned range; however, to improve measurement precision and to support the floating-point to fixed-point conversion, each coordinate value is amplified by 256 times during coordinate conversion and the range is set to 16 bits. The maximum data volume of the raw point cloud is therefore 28800 × 3 × 16 bit; when the lidar detects no target or no point is returned, the data volume is smaller. The converted point cloud of each frame is periodically written into the DDR, and the PL end reads the laser point cloud data of each frame over an AXI4 bus for feature extraction, which includes ground point segmentation, plane point extraction, corner point extraction and the like.

As can be seen from FIG. 1, after the laser point cloud data passes through the corner feature extraction module, N corner feature points are generated for each frame. The N feature points of the current frame are written into the DDR and read out in the next frame to be matched against that frame's N feature points. Matching means finding, for each feature point in the current frame, K adjacent feature points among the feature points of the previous frame. At the PS end, the distance from the current-frame feature point to the line or plane formed by those adjacent feature points is constructed, the distance term is optimized, and the pose estimate between two adjacent lidar frames is obtained. As shown in FIG. 2, each frame yields a pose R, T relative to the previous frame, but this frame-to-frame estimate is only a local pose and carries accumulated error. As shown in FIG. 1, the pose estimate R, T between two adjacent frames is used as the initial pose input to the map matching process, where errors are eliminated and accurate positioning is performed.
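
The frame size and the fixed-point coordinate conversion described above can be illustrated as follows. This is a minimal sketch: the spherical-to-Cartesian axis convention, the rounding mode and all names are assumptions for illustration; only the 16 channels, the 0.2 degree azimuth step, the gain of 256 and the 16-bit width come from the text.

```python
import math

CHANNELS = 16          # 16-line lidar
AZIMUTH_STEP_DEG = 0.2 # horizontal resolution
SCALE = 256            # coordinates are amplified by 256 before storage
BITS_PER_COORD = 16


def points_per_frame():
    """Maximum number of raw points in one 360-degree sweep."""
    return CHANNELS * round(360 / AZIMUTH_STEP_DEG)


def to_fixed_xyz(range_m, azimuth_deg, elevation_deg):
    """Convert one polar return to scaled integer x, y, z.

    Assumed convention: x forward, y left, z up; azimuth measured in the
    x-y plane and elevation measured from that plane.
    """
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    x = range_m * math.cos(el) * math.cos(az)
    y = range_m * math.cos(el) * math.sin(az)
    z = range_m * math.sin(el)
    return tuple(round(v * SCALE) for v in (x, y, z))


max_frame_bits = points_per_frame() * 3 * BITS_PER_COORD
print(points_per_frame(), max_frame_bits, to_fixed_xyz(100.0, 0.0, 0.0))
```

A return at 100 m straight ahead maps to the integer triple (25600, 0, 0), which shows how the gain of 256 keeps sub-meter resolution without floating-point arithmetic on the PL side.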

The scanning frequency of the lidar is generally 10 Hz, so the pose R, T solved by frame-to-frame matching is also produced at 10 Hz. Map matching, however, involves a large amount of map data; on a CPU the matching calculation is time-consuming, usually on the order of seconds, so it can only be run at intervals of T frames, typically extracting one frame for map matching every T = 10 frames. The frame taken every T frames is defined here as a key frame. As shown in FIG. 3, the total pose change between two key frames is the accumulated effect of the small pose changes between the frames in between; accumulating the rotation products and translations gives the pose change P of the current key frame relative to the previous key frame. The feature point cloud of the current key frame is multiplied by P to convert it into the world coordinate system and is then matched against the map. FIG. 4 shows the positioning and mapping process: the feature point cloud output in each key frame period of the lidar is QK+1, the pose between two adjacent key frames obtained by accumulation is TK, and mapping converts QK+1, multiplied by TK, into a feature point cloud in the world coordinate system, matches it against the global or local map, and obtains the accurate pose TK+1 of the current key frame relative to the world coordinate system. The global planning module is then entered. The motion path of the mobile robot is planned and set in advance; it is the set of 3-dimensional coordinate points along the path the robot is expected to travel in a given application scenario. The calculated pose TK+1 is multiplied by the coordinate of the current point on the motion path to predict the coordinate of the next target point.
A simple motion path is shown in FIG. 5: it consists of the 4 points A, B, C and D, and the motion track is A -> B -> C -> D. If the prediction deviates during driving, for example if the initial position is point A and the next target point predicted by multiplying A by TK+1 is B' instead of B, position correction is carried out in the local planning module and the pose of the mobile robot is adjusted. Finally, the current feature points that satisfy the matching are filled into the map. Map feature matching predicts the motion track of the mobile robot, and the matching speed affects the precision and efficiency of the prediction. Matching the target feature point cloud extracted from the environment in the current key frame against the local map point cloud means finding, on the PL side, the K points in the local map point cloud nearest to each point of the target feature point cloud. Once found, the data is written into the DDR; the PS side reads the current target points and their K nearest points from the DDR and solves the pose by iterative distance optimization.
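
The accumulation of frame-to-frame poses into a key-frame pose change can be sketched in a few lines. The text only says that rotation products and translations are accumulated; the composition order below assumes that each (R, T) maps a point from the newer frame into the older one, p_old = R p_new + T. That convention, and every name in the code, is an assumption for illustration.

```python
import math


def mat_mul(a, b):
    """3x3 matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def mat_vec(a, v):
    """3x3 matrix times 3-vector."""
    return [sum(a[i][k] * v[k] for k in range(3)) for i in range(3)]


def rot_z(theta):
    """Rotation about the z axis by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]


def accumulate(increments):
    """Compose per-frame (R, T) increments into one pose change (R_acc, T_acc)."""
    r_acc = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    t_acc = [0.0, 0.0, 0.0]
    for r, t in increments:
        # p_key = r_acc (r p + t) + t_acc = (r_acc r) p + (r_acc t + t_acc)
        t_acc = [a + b for a, b in zip(mat_vec(r_acc, t), t_acc)]
        r_acc = mat_mul(r_acc, r)
    return r_acc, t_acc


# Ten small yaw steps of 9 degrees add up to a 90 degree turn.
R, T = accumulate([(rot_z(math.pi / 20), [0.1, 0.0, 0.0])] * 10)
```

Removing the key-frame interval, as the invention proposes, would make this accumulation unnecessary because every frame is matched against the map directly.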

In this calculation, when a CPU matches the current key frame against a global map or a large local map, the matching is time-consuming, so one frame can only be matched against the map feature point cloud every T = 10 frames. A local map covering too large a range has too much data, which further increases the calculation time. Searching massive data on a serial processor is therefore inefficient, pose calculation in the positioning and mapping process is slow, and the navigation speed of the mobile trolley cannot be increased.

The global map is the feature information describing the environment around the motion track of the mobile robot in the current application scenario; the larger the area of the scenario, the larger the feature data volume of the global map and the larger the map file. If the feature points of the whole global map were read and matched, there would be too many of them and much computation would be wasted on irrelevant feature points. Only the feature points of the local map around the specific point on the motion track where the lidar currently is therefore need to be obtained. The size of the local map region can be set, for example, to a cubic neighborhood (1 m × 1 m × 1 m) around that point. However, the number of feature points in such a cube is not fixed: a scene with dense targets has many feature points in a cube of a given size, and a scene with sparse targets has few. An FPGA parallel acceleration structure needs storage space and computing units of fixed size designed in advance, so that a fixed number of feature points can be stored and computed in parallel; a cube of fixed size with a varying number of feature points does not fit this parallel computing task, and the number of feature points must be fixed beforehand. This is done by binning within the cube, as shown in the schematic diagram of FIG. 6. Assume the maximum BRAM space designed for storing a local map is 32768 points. The length, width and height directions are binned equally: since 32768 = 32 × 32 × 32, each of the three directions is divided into 32 grid cells. Whether a feature point lies in a given cell is judged from its spatial coordinates, and at most one feature point is kept per cell: if a cell contains several feature points, the one closest to the center coordinate of the cell is kept and the rest are discarded. After binning, at most 32768 feature points remain in the whole cube, each with 3 dimensions (X, Y, Z) of 16 bits each. FIG. 7 is a schematic diagram of the map storage module for a column parallelism of 8 and a row parallelism of 3: the map point cloud data in the DDR must be loaded in advance into the 8 on-chip cache spaces BRAM1, BRAM2, BRAM3 … BRAM8 of each of the X, Y and Z dimensions. This operation is performed for every key frame: the local map of the current key frame is fixed to at most 32768 points and then written into the 8 on-chip cache spaces BRAM1, BRAM2, BRAM3 … BRAM8 of each of the 3 dimensions.
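
The binning step can be modelled in software as a 32 × 32 × 32 grid that keeps, per cell, the point nearest the cell center. This is a minimal sketch, not the patented hardware logic: the function and parameter names are hypothetical, and squared Euclidean distance is assumed as the measure of closeness to the cell center, which the text does not specify.

```python
GRID = 32  # cells per axis, so at most 32 * 32 * 32 = 32768 points survive


def bin_points(points, origin, cube_size):
    """Keep at most one point per grid cell inside the cube.

    points    -- iterable of (x, y, z) tuples
    origin    -- minimum corner of the cube
    cube_size -- edge length of the cube
    """
    cell = cube_size / GRID
    kept = {}  # cell index -> (squared distance to cell center, point)
    for p in points:
        idx = tuple(int((p[d] - origin[d]) // cell) for d in range(3))
        if any(i < 0 or i >= GRID for i in idx):
            continue  # point lies outside the local-map cube
        center = [origin[d] + (idx[d] + 0.5) * cell for d in range(3)]
        dist = sum((p[d] - center[d]) ** 2 for d in range(3))
        if idx not in kept or dist < kept[idx][0]:
            kept[idx] = (dist, p)
    return [p for _, p in kept.values()]


cloud = [(0.5, 0.5, 0.5), (0.9, 0.9, 0.9), (5.2, 0.1, 0.1), (40.0, 0.0, 0.0)]
local_map = bin_points(cloud, origin=(0.0, 0.0, 0.0), cube_size=32.0)
```

In this example the first two points share a cell and only the one at the cell center is kept, while the last point falls outside the cube and is dropped, so the output size never exceeds the fixed BRAM capacity.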

For each key frame, K adjacent feature points are searched for in the local map. FIG. 8 is a schematic diagram of the distance calculation module. The 8 BRAMs in each of the 3 dimensions correspond to the 8 BRAM storage spaces in the storage management module. Each BRAM is followed by a Cal_PE distance calculation sub-process which, together with a corresponding one-beat delay (drawn as a small triangle), forms a systolic array that calculates Manhattan distances in the row and column directions simultaneously, with a column parallelism of 8 and a row parallelism of 3. After a feature point is extracted from the key frame, it is input to the systolic array and the BRAM read operations are started in sequence in each row and column direction; the x, y and z coordinate values are read out from each row and column in turn, the absolute values of their differences from the corresponding x, y and z coordinates of the input feature point are computed, and the absolute values are accumulated. Each column lags the previous column by one clock beat, which forms a pipeline and also avoids timing violations caused by an overly long data path or excessive fan-out of the feature point. Reading along the row direction, the 1st beat reads X1 from BRAM1; the 2nd beat reads X9 from BRAM1 and X2 from BRAM2; the 3rd beat reads X17 from BRAM1, X10 from BRAM2 and X3 from BRAM3; and so on, cycling in sequence until all data in BRAM1, BRAM2, BRAM3 … BRAM8 has been read. All data in BRAM1 … BRAM8 of the Y dimension is delayed by one beat relative to the X dimension, and all data in BRAM1 … BRAM8 of the Z dimension is delayed by one beat relative to the Y dimension.
Similarly, reading along the column direction, the 1st beat reads X1, the 2nd beat reads Y1 and X2, and the 3rd beat reads Z1, Y2 and X3, and the reads traverse every BRAM space in the 3 dimensions in sequence. The data read from each BRAM space in the 3 rows and 8 columns is passed, together with the corresponding coordinate of the extracted feature point, to a Cal_PE calculation module to compute the distance. FIG. 9 is a schematic diagram of the Cal_PE calculation process across the 3 dimensions, where XYZ are the coordinate values extracted by the current-frame corner extraction module and X'Y'Z' are the coordinate values of a previous-frame feature point read from the BRAM space. The absolute difference of the 2 coordinate values in the X dimension is computed and output; the absolute difference in the Y dimension is computed, accumulated and output; and the absolute difference in the Z dimension is computed, accumulated and output, giving the Manhattan distance between each feature point extracted by the key frame feature point extraction module and a map frame feature point. Compared with the Euclidean distance, this avoids computing squares and square roots in the FPGA chip and consumes fewer hardware resources.
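
The staggered read order and the distance metric can be checked with a small software model. The sketch below reproduces the X-dimension schedule given in the text (X1; then X9 and X2; then X17, X10 and X3) under the assumption that map point i (1-based) is stored in BRAM ((i - 1) mod 8) + 1; the function names and the dictionary representation are illustrative only, and the model says nothing about the actual RTL.

```python
COLUMNS = 8  # column parallelism: BRAM1 .. BRAM8


def read_schedule(beat, n_points=32768):
    """Map BRAM number -> 1-based point index read on a given 1-based beat.

    Column c starts c - 1 beats after column 1, which is what forms the
    pipeline across the systolic array.
    """
    reads = {}
    for col in range(1, COLUMNS + 1):
        row = beat - col              # address inside this BRAM
        index = row * COLUMNS + col   # interleaved storage of map points
        if row >= 0 and index <= n_points:
            reads[col] = index
    return reads


def manhattan(p, q):
    """Sum of absolute coordinate differences, as accumulated by Cal_PE."""
    return sum(abs(a - b) for a, b in zip(p, q))


for beat in (1, 2, 3):
    print(beat, read_schedule(beat))
print(manhattan((1, 2, 3), (4, 0, 3)))
```

The Manhattan distance needs only subtractors, absolute-value logic and adders, which is why the text prefers it to the Euclidean distance on the FPGA.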

FIG. 10 is a schematic diagram of the K nearest neighbor comparison module. After each corner feature point is extracted from the key frame, the set of K feature points closest to it must be found in the feature point set of the local map frame. The final Manhattan distance, obtained by accumulating the per-dimension distances for each feature point, is therefore passed to the K nearest neighbor comparison module, which determines the coordinates of the K nearest points. In the invention K is 3, so a 3-stage pipeline is needed to find the 3 nearest points; each pipeline stage finds only the minimum distance within that stage and the coordinates of the corresponding nearest point. In the figure, finding the 1st adjacent feature point corresponds to the 1st pipeline stage, in which 8 com_PE comparison modules are connected end to end. They receive in order the results result1, result2, result3 … result8 output by the calculation module and compare them in sequence. The smaller result of each comparison, together with the corresponding BRAM address addr1 and the BRAM serial number num1, is passed to the right in the data flow, and the 1st adjacent feature point is obtained once the cyclic comparison at the rightmost com_PE module completes. The larger result of each com_PE comparison module, together with the corresponding BRAM address addr2 and column serial number num2, is passed downward in the data flow. Note in particular that the first com_PE comparison module in each pipeline stage must distinguish its first comparison from later ones; a flag signal can be designed for this purpose.
Because result1 is the only input during beats 1 through 8, a large constant can be passed in and compared against it; the constant is designed to be larger than every possible distance so that no false result is propagated. From the 9th beat onward, the input result1 must be compared with the feedback result output by the 8th com_PE module. As in the 1st stage, each com_PE module in the 2nd pipeline stage takes as input the larger result from the com_PE comparison of the previous stage and outputs the smaller of the two larger results, together with the corresponding BRAM address addr1 and column serial number num1, passing it to the right in the data flow until the 2nd adjacent feature point is obtained once the cyclic comparison at the rightmost com_PE module completes. The larger result of each com_PE comparison module, together with the corresponding BRAM address addr2 and column serial number num2, is again passed downward. In the same way, the 3rd pipeline stage finds the 3rd adjacent feature point and the corresponding BRAM address addr1 and column serial number num1. Because there is no 4th pipeline stage, the larger result of each com_PE comparison in the 3rd stage is ignored and not passed on. In the 3-stage com_PE array, the results passed to the right along the column direction become smaller and smaller, while the results passed downward along the row direction become larger and larger.
After each feature corner point of the current frame is extracted, the K = 3 nearest points can be obtained simply by reading the feature point set of the previous frame and comparing its points one by one. All com_PE modules in the row and column directions compare simultaneously and are fully synchronized with the data rate of the distance calculation module, so every point is calculated and compared in step, and this efficient parallel computation accelerates the KNN implementation. During the comparison, each module takes 2 sets of variables as input and produces 2 sets of variables as output: the output ports {addr1, com_result1, num1} carry the smaller value of the comparison, and the output ports {addr2, com_result2, num2} carry the larger value.
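
The data flow of the 3-stage comparison array described above can be modeled in software. The following is only a behavioral sketch in Python, not the RTL of the invention: it ignores beat-level timing and the first-comparison flag signal, and the names `com_pe`, `knn3`, `BIG` and the `(distance, addr, num)` tuple layout are hypothetical choices made for illustration. Each stage keeps a feedback value that starts at a large constant, passes the smaller result to the right, and hands every larger result down to the next stage.

```python
from typing import List, Tuple

Point = Tuple[int, int, int]
# Hypothetical candidate layout: (Manhattan distance, BRAM address, BRAM column number)
Candidate = Tuple[int, int, int]

# Large constant assumed to exceed every real distance (initial feedback value).
BIG: Candidate = (1 << 31, -1, -1)


def manhattan(p: Point, q: Point) -> int:
    """Accumulate the per-dimension absolute differences."""
    return sum(abs(a - b) for a, b in zip(p, q))


def com_pe(a: Candidate, b: Candidate) -> Tuple[Candidate, Candidate]:
    """One compare element: return (smaller, larger) ordered by distance."""
    return (a, b) if a[0] <= b[0] else (b, a)


def knn3(query: Point, brams: List[List[Point]], k: int = 3) -> List[Candidate]:
    """Find the k nearest stored points with a k-stage compare chain.

    brams is a list of BRAM columns of equal depth; one address is read
    from every column per iteration, mirroring result1 ... result8.
    """
    feedback = [BIG] * k  # running minimum held by each pipeline stage
    for addr in range(len(brams[0])):
        # Distances produced for this address, one per BRAM column.
        incoming = [(manhattan(query, col[addr]), addr, num)
                    for num, col in enumerate(brams)]
        for stage in range(k):
            running = feedback[stage]
            passed_down = []
            for cand in incoming:
                # Smaller value moves right, larger value moves down.
                running, larger = com_pe(running, cand)
                passed_down.append(larger)
            feedback[stage] = running
            incoming = passed_down  # last stage's larger values are dropped
    return feedback
```

Calling `knn3(query, brams)` with 8 equally deep columns returns three `(distance, addr, num)` tuples in ascending order of distance; the address and column number can then be used to look up the coordinates, as the write-back step does.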

After the K-nearest comparison module completes one round of comparison, it outputs the distances of the 3 nearest feature points, the addresses of the BRAM space where these 3 feature points are stored, and their column serial numbers. Using the BRAM address and column serial number of each feature point in turn, the 3-dimensional coordinates stored at the corresponding BRAM address are read out and concatenated bitwise as {x, y, z}: 16 bit + 16 bit + 16 bit = 48 bit. The output for each feature point, 3 x 48 bit of data, must then be written back to the DDR so that the back-end PS service logic can process it. FIG. 11 shows the K-nearest-neighbor feature result write-back module. To match the burst behavior of the DDR, the write-back module contains a FIFO that temporarily stores the 3-dimensional coordinate values of the K-nearest-neighbor results. The FIFO is 64 bits wide and 32 entries deep, and the burst length is 128 bytes. The amount of data in the FIFO is monitored in real time, and as soon as the FIFO holds enough data for one burst, a DDR burst write is started, giving a data transfer flow similar to DMA. For every feature point input to the KNN process, the coordinates of the K nearest feature points found for it are obtained in turn and written back to the DDR. 
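
The coordinate packing and burst-triggered write-back can be illustrated with a small software model. This is a sketch under stated assumptions rather than the module of FIG. 11: it assumes x occupies the most significant 16 bits of the 48-bit word, that coordinates are signed two's-complement values, and that each 48-bit result occupies one zero-padded 64-bit FIFO word. The names `pack_xyz`, `unpack_xyz` and `WriteBackFifo` are hypothetical. With a 64-bit word, a 128-byte burst corresponds to 16 FIFO words.

```python
from collections import deque

COORD_BITS = 16
WORD_BITS = 64        # FIFO bit width from the description
FIFO_DEPTH = 32       # FIFO depth from the description
BURST_BYTES = 128     # burst length from the description
BURST_WORDS = BURST_BYTES * 8 // WORD_BITS  # 16 words per burst


def pack_xyz(x: int, y: int, z: int) -> int:
    """Concatenate {x, y, z} into 48 bits (assumption: x is most significant)."""
    mask = (1 << COORD_BITS) - 1
    return ((x & mask) << 32) | ((y & mask) << 16) | (z & mask)


def unpack_xyz(word: int):
    """Recover signed 16-bit coordinates (assumption: two's complement)."""
    def signed(v: int) -> int:
        return v - (1 << COORD_BITS) if v & (1 << (COORD_BITS - 1)) else v
    mask = (1 << COORD_BITS) - 1
    return (signed((word >> 32) & mask),
            signed((word >> 16) & mask),
            signed(word & mask))


class WriteBackFifo:
    """Buffers packed results and emits one burst when enough data is queued."""

    def __init__(self) -> None:
        self.fifo = deque()
        self.ddr_bursts = []  # stands in for DDR; each entry is one burst

    def push(self, word48: int) -> None:
        if len(self.fifo) >= FIFO_DEPTH:
            raise OverflowError("FIFO full")
        self.fifo.append(word48)  # stored in a 64-bit slot, upper bits zero
        if len(self.fifo) >= BURST_WORDS:
            # Enough data for one burst: drain it in a single write.
            burst = [self.fifo.popleft() for _ in range(BURST_WORDS)]
            self.ddr_bursts.append(burst)
```

In this model, pushing the three packed neighbor coordinates for each query point triggers a burst after every 16 words, so the FIFO never fills as long as each burst is drained immediately.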
Then, once the PS side has periodically obtained the feature point information of the current frame and the coordinate result information of the K nearest feature points between the 2 frames, it constructs lines and surfaces, calculates the point-to-line and point-to-surface distances, solves the optimization over the distance terms, estimates the pose, and so on.
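
The description does not spell out how the lines and surfaces are constructed from the neighbor coordinates. As an illustration only, the sketch below assumes a line through two neighbor points and a plane through three neighbor points, and computes the usual Euclidean point-to-line and point-to-plane distances; the function names are hypothetical, and the subsequent optimization and pose estimation are not modeled.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def norm(a: Vec) -> float:
    return math.sqrt(dot(a, a))


def point_to_line(p: Vec, a: Vec, b: Vec) -> float:
    """Distance from p to the line through a and b (assumed construction)."""
    ab = sub(b, a)
    length = norm(ab)
    if length == 0.0:
        raise ValueError("line points coincide")
    # Parallelogram area divided by base length gives the height.
    return norm(cross(sub(p, a), ab)) / length


def point_to_plane(p: Vec, a: Vec, b: Vec, c: Vec) -> float:
    """Distance from p to the plane through a, b and c (assumed construction)."""
    n = cross(sub(b, a), sub(c, a))
    n_len = norm(n)
    if n_len == 0.0:
        raise ValueError("plane points are collinear")
    # Projection of (p - a) onto the unit normal.
    return abs(dot(sub(p, a), n)) / n_len
```

Distances of this kind would form the residual terms that the PS side then minimizes when estimating the pose.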

Those skilled in the art will appreciate that, in addition to implementing the system and its various devices, modules, units provided by the present invention as pure computer readable program code, the system and its various devices, modules, units provided by the present invention can be fully implemented by logically programming method steps in the form of logic gates, switches, application specific integrated circuits, programmable logic controllers, embedded microcontrollers and the like. Therefore, the system and various devices, modules and units thereof provided by the invention can be regarded as a hardware component, and the devices, modules and units included in the system for realizing various functions can also be regarded as structures in the hardware component; means, modules, units for performing the various functions may also be regarded as structures within both software modules and hardware components for performing the method.

In the description of the present application, it is to be understood that the terms "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", and the like indicate orientations or positional relationships based on those shown in the drawings. They are used only for convenience in describing the present application and simplifying the description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation; they should therefore not be construed as limiting the present application.

The foregoing description of specific embodiments of the present invention has been presented. It is to be understood that the present invention is not limited to the specific embodiments described above, and that various changes or modifications may be made by one skilled in the art within the scope of the appended claims without departing from the spirit of the invention. The embodiments and features of the embodiments of the present application may be combined with each other arbitrarily without conflict.

Claims

1. A point cloud feature matching system based on FPGA is characterized by comprising: a processing system portion and a programmable logic portion;

2. The FPGA-based point cloud feature matching system of claim 1, wherein the programmable logic portion comprises:

3. The FPGA-based point cloud feature matching system of claim 1, wherein the processing system portion comprises:

4. The FPGA-based point cloud feature matching system of claim 1, wherein the memory comprises a DDR.

5. A point cloud feature matching method based on FPGA is characterized by comprising the following steps: a processing system portion and a programmable logic portion;

6. The FPGA-based point cloud feature matching method of claim 5, wherein the programmable logic portion comprises:

7. The FPGA-based point cloud feature matching method of claim 5, wherein the processing system portion comprises:

8. The FPGA-based point cloud feature matching method of claim 5, wherein the memory comprises a DDR.

9. A path planning system comprising the FPGA-based point cloud feature matching system of any one of claims 1 to 4.

10. The path planning system of claim 9, wherein the point cloud data comprises a static hmap map file.