CN113536024A - ORB-SLAM relocation feature point retrieval acceleration method based on FPGA - Google Patents
- Publication number: CN113536024A
- Application number: CN202110918561.XA
- Authority
- CN
- China
- Prior art keywords
- slam
- output
- fpga
- orb
- feature point
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F7/00—Methods or arrangements for processing data by operating upon the order or content of the data handled
- G06F7/38—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
- G06F7/48—Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
- G06F7/50—Adding; Subtracting
- G06F7/501—Half or full adders, i.e. basic adder cells for one denomination
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06M—COUNTING MECHANISMS; COUNTING OF OBJECTS NOT OTHERWISE PROVIDED FOR
- G06M1/00—Design features of general application
- G06M1/27—Design features of general application for representing the result of count in the form of electric signals, e.g. by sensing markings on the counter drum
- G06M1/272—Design features of general application for representing the result of count in the form of electric signals, e.g. by sensing markings on the counter drum using photoelectric means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which comprises the following steps: S1, buffering the input picture and extracting descriptors; S2, entering the working space (Workspace), where computing circuits calculate the distances to the nodes; S3, the results of all computing circuits flow into a parallel comparison circuit, which finds the node with the minimum value; S4, finally, judging whether that node is in the bottom layer; if so, the search finishes and the final node is obtained; S5, each node carries an offset value used to find the address of its child nodes and obtain the key frames, after which relocation is performed on the key frame set. To reduce circuit resource consumption, the invention places an approximation unit (AU) in front of the counter, forming the circuit structure of an accumulating parallel counter (APC). Based on the principle of approximate computing, this reduces hardware resource consumption and raises circuit computation speed when the bit stream is long and many identical structures must be replicated.
Description
Technical Field
The invention relates to the field of image processing, in particular to an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method.
Background
When SLAM navigation and positioning fail, the system starts relocation: based on the frame currently captured by the system, similar key frames are found in the frame library according to the frame's feature points, the map point matches of the current frame are updated, and relocation is performed. In the existing ORB relocation algorithm, the frame library is organized as a k-means tree, as shown in fig. 1. The feature points of all training pictures are extracted and computed into descriptors, and k-means clustering is applied d times to obtain a retrieval tree of degree k and depth d. The center of each cluster is represented by its mean descriptor, and the associated frames are recorded in the bottom-layer classes. Each relocation therefore finds, through the retrieval tree, the class to which every feature point (a 256-bit descriptor) extracted from the current frame belongs: at each layer the Hamming distance to all child nodes is computed, and the minimum determines the direction of the next layer of the search path, down to the last layer. Similarity scores are then computed from the associated frames found, and key frames are screened out to complete the relocation computation.
However, when the number of frames in the library becomes huge, the depth and breadth of the retrieval tree grow and feature point retrieval slows accordingly. Although the ORB algorithm maintains a forward index for fast retrieval, real-time performance is difficult to achieve with a huge frame library.
Disclosure of Invention
The invention aims to solve at least the above technical problems of the prior art, and in particular provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method.
In order to achieve the above object, the present invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which comprises:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space (Workspace), where computing circuits calculate the distances to the nodes;
S3, the results of all computing circuits flow into a parallel comparison circuit, which finds the node with the minimum value;
S4, finally, judging whether that node is in the bottom layer; if so, the search finishes and the final node is obtained;
S5, each node carries an offset value used to find the address of its child nodes and obtain the key frames, after which relocation is performed on the key frame set.
In a preferred embodiment of the present invention, the calculation circuit includes:
first, the data passes through exclusive-OR gates, then through either an accumulating parallel counter (APC) or a parallel counter (PC);
the accumulating parallel counter APC adds X levels of approximation units (AU) in front of the counter PC;
the approximation unit AU comprises: a first-level AU approximation unit, which is a column of AND gates and OR gates;
the parallel counter PC comprises a plurality of full adders.
In a preferred embodiment of the present invention, the parallel counter PC includes:
every three bits of the same weight are sent as a group to a full adder; each full adder has a weight, its three inputs share that weight, its sum output Sum sends its value to a full adder of the same weight, and its carry output Cout sends its value to the next higher weight. The intermediate results are registered for one clock ('beaten'), the full adders whose inputs are all present are then computed, and the intermediate results are registered again, until the highest bit of the output result is computed; each final output wire carries one bit.
In a preferred embodiment of the present invention, the parallel counter PC further includes:
let v denote the output bit width, and let the input bit width be N = 2^v; for each added output bit, the number of full adders consumed becomes 2 times the previous count plus v − 1, i.e.:
f(v) = 2*f(v-1) + v - 1
where f(v) denotes the number of full adders required for output bit width v, and f(v-1) the number required for output bit width v-1.
In a preferred embodiment of the invention, the gate level resources consumed by the parallel counter PC are:
g(v) = (2^v − v − 1)*5 = (N − log2 N − 1)*5
where N represents the input bit width and v represents the output bit width.
In a preferred embodiment of the present invention, the accumulating parallel counter APC adds X levels of approximation units AU in front of the full adders, and the consumed gate-level resources are:
where N represents the input bit width and X represents the number of levels of the approximation unit.
In a preferred embodiment of the present invention, the full adder includes:
two exclusive-OR gates, two AND gates and one OR gate, with the following logic expressions:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B and Cin are the inputs of the full adder, namely the two addends and the carry from the adjacent lower bit; Sum and Cout are the outputs, namely the local sum and the carry to the adjacent higher bit; ^ denotes the exclusive-OR operation, & the AND operation, and | the OR operation.
In a preferred embodiment of the present invention, step S5 includes the following steps:
S51, randomly selecting I feature points from the key frame set, where I is a positive integer greater than or equal to 1, and calculating the pose (α, γ) of the current frame, where α denotes the rotation angle and γ the translation amount;
S52, calculating the reprojection errors of the remaining key frames from the pose (α, γ) of step S51; if a calculated reprojection error is less than or equal to the set error threshold, the point is a key point;
S53, counting the number of key points and the corresponding poses (α, γ);
S54, locally optimizing the pose of the current frame with the pose (α, γ) of step S53 as the initial pose value, the optimization objective function being:
where e_x is the x-th reprojection error observed by the camera, ‖·‖ denotes the norm, h_x is the number of observations, and O denotes the number of reprojections observed by the camera;
S55, if the number of key points after optimization exceeds the set number, relocation is considered successful.
In a preferred embodiment of the present invention, the method for calculating the pose (α, γ) of the current frame in step S51 is:
where I denotes the total number of feature points in the reference frame;
(X_i, Y_i) denotes the position coordinates of the i-th feature point in the current frame;
(X_j, Y_j) denotes the position coordinates of the j-th feature point in the current frame, j ≠ i;
(X_i′, Y_i′) denotes the position coordinates, in the reference frame, of the point corresponding to the i-th feature point of the current frame;
(X_j′, Y_j′) denotes the position coordinates, in the reference frame, of the point corresponding to the j-th feature point of the current frame;
(x_0, y_0) denotes the reference starting point;
[X_i − x_0, Y_i − y_0] denotes the vector of the i-th feature point in the current frame;
[X_j − x_0, Y_j − y_0] denotes the vector of the j-th feature point in the current frame;
|X_i − x_0, Y_i − y_0| denotes the distance value of the i-th feature point in the current frame;
|X_j − x_0, Y_j − y_0| denotes the distance value of the j-th feature point in the current frame.
In a preferred embodiment of the present invention, the reprojection errors of the remaining key frames in step S52 are calculated as:
where ε denotes the balance coefficient;
S_(α,γ) denotes the degree of shift of the pose (α, γ) over the remaining key frames;
K_k denotes the reprojection error of the k-th remaining key frame;
when K_k ≤ τ, where τ denotes the set error threshold, the selected feature point is a key point;
when K_k > τ, the selected feature point is not a key point.
In summary, owing to the adoption of the above technical scheme, the invention has the following beneficial effects:
1) For computing the Hamming distance of two 256-bit feature points, the invention adopts the circuit structure of a parallel counter PC, forming pipelined accelerated computation.
2) To reduce circuit resource consumption, the invention places the approximation unit AU in front of the counter, forming the circuit structure of the accumulating parallel counter APC. Based on the principle of approximate computing, this reduces hardware resource consumption and raises circuit computation speed when the bit stream is long and many identical structures must be replicated.
3) Exploiting the coarse clustering granularity of the upper layers of the k-means tree and the fine granularity of the lower layers, the upper layers adopt multi-level approximation units while the lower layers adopt one level or none.
4) Because the distance calculation of each child node is independent, the design uses k Workers, where k is the number of clusters; each Worker holds a copy of the acceleration circuit and controls, in parallel, the reading of its own input data and its own calculation.
5) The invention stores the feature points of the frame library in a dynamic random access memory (DRAM); exploiting the read/write parallelism of the memory banks in the DRAM, the child nodes are placed in different banks to raise the data reading speed, and each Worker manages one bank.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram of a prior art feature point search tree;
FIG. 2 is an overall block diagram of the present invention;
FIG. 3 is a schematic diagram of a Hamming distance parallel counter circuit according to the present invention (16-bit input is an example);
FIG. 4 is a schematic diagram of a gate level circuit for a full adder;
FIG. 5 is a schematic diagram of a parallel counting circuit employing an approximation unit;
FIG. 6 is a diagram illustrating a storage form of search tree data in the DRAM.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in fig. 1, a retrieval tree is trained offline from all feature points of the frame library; each feature point is represented by a 256-bit descriptor, and each tree node by the mean of all descriptors belonging to its class. During retrieval, the class a point belongs to is found layer by layer, i.e., the minimum Hamming distance is found, down to the lowest layer. The feature points are feature points of a specific picture, while the nodes are nodes of the tree structure and generally represent the mean of their class.
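The layer-by-layer descent described above can be sketched in software (a minimal illustration, not the patent's hardware design; the `Node` layout and the 4-bit toy descriptors are assumptions chosen for brevity):

```python
def hamming(a: int, b: int) -> int:
    """Hamming distance between two descriptors held as integers."""
    return bin(a ^ b).count("1")

class Node:
    def __init__(self, center, children=None, frames=None):
        self.center = center            # cluster mean descriptor
        self.children = children or []  # empty at the bottom layer
        self.frames = frames or []      # associated key frames (leaves only)

def lookup(root: Node, query: int):
    """Descend the tree, always taking the child at minimum Hamming distance."""
    node = root
    while node.children:
        node = min(node.children, key=lambda c: hamming(c.center, query))
    return node.frames

# Toy 2-layer tree with 4-bit "descriptors" (the real tree uses 256 bits).
leaf_a = Node(0b0001, frames=["frame1"])
leaf_b = Node(0b1110, frames=["frame2"])
root = Node(0b0000, children=[leaf_a, leaf_b])
```

A query descriptor close to `0b0001` ends up in `leaf_a` and returns its associated frames.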
The invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which comprises the following steps:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space (Workspace), where computing circuits calculate the distances to the nodes;
S3, the results of all computing circuits flow into a parallel comparison circuit, which finds the node with the minimum value;
S4, finally, judging whether that node is in the bottom layer; if so, the search finishes and the final node is obtained;
S5, each node carries an offset value used to find the address of its child nodes and obtain the key frames, after which relocation is performed on the key frame set.
In a preferred embodiment of the present invention, the calculation circuit includes:
firstly, the data passes through an exclusive-OR gate and then passes through an accumulation parallel counter APC or a parallel counter PC;
the accumulated parallel counter APC is added with an X-level approximate unit AU in front of the counter PC;
the approximation unit AU comprises: the first-level AU approximate unit is an AND gate and an OR gate of a column;
the parallel counter PC comprises a plurality of full adders.
In a preferred embodiment of the present invention, the parallel counter PC includes:
every three bits of the same weight are sent as a group to a full adder; each full adder has a weight, its three inputs share that weight, its sum output Sum sends its value to a full adder of the same weight, and its carry output Cout sends its value to the next higher weight. The intermediate results are registered for one clock ('beaten'), the full adders whose inputs are all present are then computed, and the intermediate results are registered again, until the highest bit of the output result is computed; each final output wire carries one bit.
In a preferred embodiment of the present invention, the parallel counter PC further includes:
let v denote the output bit width, and let the input bit width be N = 2^v; for each added output bit, the number of full adders consumed becomes 2 times the previous count plus v − 1, i.e.:
f(v) = 2*f(v-1) + v - 1
where f(v) denotes the number of full adders required for output bit width v, and f(v-1) the number required for output bit width v-1.
In a preferred embodiment of the invention, the gate level resources consumed by the parallel counter PC are:
g(v) = (2^v − v − 1)*5 = (N − log2 N − 1)*5
where N represents the input bit width and v represents the output bit width.
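The adder-count recurrence and the closed form in g(v) can be cross-checked numerically (a sketch; f(2) = 1 and the factor of 5 gates per full adder are taken from the text above):

```python
def f(v: int) -> int:
    """Full adders needed for output bit width v: f(v) = 2*f(v-1) + (v-1), f(2) = 1."""
    return 1 if v == 2 else 2 * f(v - 1) + (v - 1)

def g(v: int) -> int:
    """Gate-level resources: 5 gates per full adder (2 XOR, 2 AND, 1 OR)."""
    return (2 ** v - v - 1) * 5

# The recurrence matches the closed form 2^v - v - 1 used in g(v).
for v in range(2, 10):
    assert f(v) == 2 ** v - v - 1
```

For a 16-bit input (v = 4) this gives 11 full adders and 55 gates; for a 256-bit input (v = 8), 247 full adders.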
In a preferred embodiment of the present invention, the accumulating parallel counter APC adds X levels of approximation units AU in front of the full adders, and the consumed gate-level resources are:
where N represents the input bit width and X represents the number of levels of the approximation unit.
In a preferred embodiment of the present invention, the full adder includes:
two exclusive-OR gates, two AND gates and one OR gate, with the following logic expressions:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B and Cin are the inputs of the full adder, namely the two addends and the carry from the adjacent lower bit; Sum and Cout are the outputs, namely the local sum and the carry to the adjacent higher bit; ^ denotes the exclusive-OR operation, & the AND operation, and | the OR operation.
FIG. 2 is the overall acceleration structure diagram of the invention. The input query is a picture carrying many feature points for search and query; since the inputs are many and cannot all be sent at once, a queue with FIFO behavior is used for storage. The input module buffers the input pictures and the descriptors extracted from them;
Workspace is the main working space, in which each Worker controls, in parallel, the distance computation of one node. Each Worker also has a cache and a computing circuit (PC or APC); the cache buffers child nodes that have already been visited, because the distances between adjacent feature points are relatively small and some search paths are likely to repeat. When the data is not in the cache, the DRAM is accessed. The results of all computing circuits flow into a parallel comparison circuit, which finds the node with the minimum value; finally it is judged whether this node is in the bottom layer, and if so the search finishes. Each node carries an offset value used to find the address of its child nodes, after which relocation is performed.
DRAM is a storage structure on the FPGA and can be understood as the main memory of a computer, while the cache is smaller but faster to access than DRAM. The cache holds part of the DRAM data: every access checks the cache first, and if the corresponding data is absent, the DRAM is accessed and the data is buffered into the cache (because of its small size, some old data may be overwritten). This mechanism exploits the locality of data accesses: recently accessed data is likely to be accessed again, so the data access speed is improved.
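The cache-before-DRAM access pattern described above can be sketched as follows (a minimal software analogy; the `NodeCache` name, the dict-based store, and the oldest-entry eviction are illustrative assumptions, not the patent's replacement policy):

```python
class NodeCache:
    """Check the cache first; on a miss, fetch from (slow) DRAM and buffer it."""

    def __init__(self, dram_read, capacity=4):
        self.dram_read = dram_read   # function standing in for a DRAM access
        self.capacity = capacity
        self.store = {}              # address -> node data (insertion-ordered)
        self.hits = self.misses = 0

    def read(self, addr):
        if addr in self.store:       # cache hit: no DRAM access needed
            self.hits += 1
            return self.store[addr]
        self.misses += 1             # miss: go to DRAM and buffer the data
        if len(self.store) >= self.capacity:
            self.store.pop(next(iter(self.store)))  # evict the oldest entry
        self.store[addr] = self.dram_read(addr)
        return self.store[addr]
```

Repeated reads of the same node address hit the cache, which is exactly why repeating search paths speed up.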
FIG. 3 shows the core computing circuit of the invention, which adopts a parallel counting structure, taking 16-bit input as an example. Each unit is a full adder built from combinational logic; the logic expressions are Sum = (A ^ B) ^ Cin and Cout = (A & B) | ((A ^ B) & Cin), consuming in total two exclusive-OR gates, two AND gates and one OR gate, as shown in fig. 4.
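The full adder logic above can be checked exhaustively in a few lines (a software model of the combinational logic, not HDL):

```python
def full_adder(a: int, b: int, cin: int):
    """One-bit full adder: Sum = (A ^ B) ^ Cin, Cout = (A & B) | ((A ^ B) & Cin)."""
    s = (a ^ b) ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

# Exhaustive check against integer addition: a + b + cin == 2*cout + s.
for a in (0, 1):
    for b in (0, 1):
        for cin in (0, 1):
            s, cout = full_adder(a, b, cin)
            assert a + b + cin == 2 * cout + s
```

This identity (the carry carries weight 2) is what lets the parallel counter treat Cout as a bit of the next higher weight.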
Let v denote the number of bits needed to represent the output, i.e., the output bit width, and let the input bit width be N = 2^v. From fig. 3 it can be seen that each added output bit doubles the input, and the number of full adders consumed becomes 2 times the previous count plus v − 1, i.e.:
f(v) = 2*f(v-1) + v - 1
where f(2) = 1.
Then the gate level resources consumed are:
g(v) = (2^v − v − 1)*5 = (N − log2 N − 1)*5
where N denotes the input bit width and v the number of bits of the result, i.e., the output bit width. Based on the principle of approximate computing, the APC adds a first-level AU approximation unit, i.e., a column of AND gates and OR gates, before the full adders, as shown in fig. 5; the calculation result is then obtained with a small error (experimentally verified to be within 5) while resource consumption is reduced, the consumed gate-level resources becoming:
Similarly, when the APC adds an X-level AU approximation unit before the full adders, the consumed gate-level resources become:
for the characteristic that the upper layer of the k-means tree has coarse clustering granularity and the lower layer has fine clustering granularity, the upper layer adopts a multi-level approximate unit, and the lower layer adopts one layer or does not use an approximate unit. The nodes of the same hierarchy adopt the calculation circuits with the same structure, and the nodes are parallel.
For 256-bit input, the one-stage (gate-stage) approximation unit can reduce resource consumption by about 50%, and meanwhile, the calculation speed is improved.
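One plausible reading of a single AU stage (an assumption: the exact AND/OR wiring is not detailed in this text) is that each adjacent bit pair is compressed into a single weight-2 bit, alternating OR and AND so over- and under-estimates tend to cancel. A sketch of that interpretation:

```python
def approx_popcount(bits):
    """One hypothetical AU stage: compress adjacent bit pairs to single
    weight-2 bits, alternating OR (over-estimates by at most 1 per pair)
    and AND (under-estimates by at most 1 per pair)."""
    assert len(bits) % 2 == 0
    compressed = []
    for i in range(0, len(bits), 2):
        a, b = bits[i], bits[i + 1]
        compressed.append((a | b) if (i // 2) % 2 == 0 else (a & b))
    # Half as many bits survive, each carrying weight 2^1, as in the text.
    return 2 * sum(compressed)
```

For an 8-bit input (2 OR pairs, 2 AND pairs) the error is bounded by ±2, illustrating the "small error, halved input" trade-off.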
The circuit includes exclusive-OR gates before the computing circuit PC or APC, entered after the two 256-bit descriptors whose distance is sought arrive. First the exclusive-OR is computed with combinational logic; the number of 1s in the resulting 256 bits is the distance between the two descriptors. The 256 bits after the XOR first enter the approximation unit, which halves the input: after one AU stage the input is equivalent to 128 bits, but the weight of each bit becomes 2^1. The intermediate result is then registered on the clock rising edge with non-blocking assignments, after which every three bits of equal weight are fed into a full adder (starting with the weight-2^0 full adders). Each full adder has a weight: its three inputs share that weight, its Sum output sends its value to a full adder of the same weight, and its carry Cout sends its value to the next higher weight. The intermediate results are registered ('beaten') by the same operation and the full adders whose inputs are all present are computed; this repeats until the highest bit of the output is computed. Each final output wire carries one bit (0 or 1); taking 16 bits as an example, as shown in fig. 5, the output can be represented by 4 bits, the power of 2 indicating which of the 4 bits the wire points to. The circuit is directly wired: in the actual computation the 0 or 1 is a coefficient, the power of 2 is a weight, and the weighted bits are accumulated. This forms a pipeline inside the computing circuit: the first descriptor yields its result in the 11th cycle, and one descriptor result is obtained every cycle thereafter. Since the distance calculations with the child nodes run in parallel, once the results are obtained they are all sent to a comparison tree, and the minimum value determines where the next layer of the path goes.
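The staged full-adder reduction can be modeled in software as a carry-save column reduction (a behavioral sketch of the combinational tree, ignoring the pipeline registers; `parallel_count` is an illustrative name):

```python
def full_adder(a, b, cin):
    s = (a ^ b) ^ cin
    return s, (a & b) | ((a ^ b) & cin)

def parallel_count(bits):
    """Reduce bits with full adders, weight column by weight column, as the
    PC does in hardware: sums stay in the column, carries move one column up."""
    cols = {0: list(bits)}               # weight -> pending bits of that weight
    w = 0
    while w in cols:
        col = cols[w]
        while len(col) >= 3:             # groups of three bits per full adder
            s, c = full_adder(col.pop(), col.pop(), col.pop())
            col.append(s)
            cols.setdefault(w + 1, []).append(c)
        if len(col) == 2:                # two leftovers: add with cin = 0
            s, c = full_adder(col.pop(), col.pop(), 0)
            col.append(s)
            cols.setdefault(w + 1, []).append(c)
        w += 1
    # At most one bit remains per weight column: read off the binary result.
    return sum((1 << w) * col[0] for w, col in cols.items() if col)
```

Because each full adder preserves the sum (a + b + cin = Sum + 2·Cout), the reduction yields the exact population count, i.e., the Hamming weight of the XOR result.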
The distances between clusters are larger at the upper layers of the retrieval tree and become smaller going down. The number of AU stages adopted at each layer is controlled according to the actual data set: the upper layers, where the differences are large, may use two to three stages, while at the bottom layer whether to use an approximation unit at all is decided by the data set.
Figure 6 shows the tree storage inside the DRAM. The banks of the DRAM share one I/O control port, but reads and writes inside each bank can proceed in parallel. The child nodes are stored in corresponding banks, and when the needed data is not in the cache, each Worker can read its own bank in parallel.
In a preferred embodiment of the present invention, step S5 includes the following steps:
S51, randomly selecting I feature points from the key frame set, where I is a positive integer greater than or equal to 1, and calculating the pose (α, γ) of the current frame, where α denotes the rotation angle and γ the translation amount;
S52, calculating the reprojection errors of the remaining key frames from the pose (α, γ) of step S51; if a calculated reprojection error is less than or equal to the set error threshold, the point is a key point;
S53, counting the number of key points and the corresponding poses (α, γ);
S54, locally optimizing the pose of the current frame with the pose (α, γ) of step S53 as the initial pose value, the optimization objective function being:
where e_x is the x-th reprojection error observed by the camera, ‖·‖ denotes the norm, h_x is the number of observations, and O denotes the number of reprojections observed by the camera;
S55, if the number of key points after optimization exceeds the set number, relocation is considered successful.
In a preferred embodiment of the present invention, the method for calculating the pose (α, γ) of the current frame in step S51 is:
where I denotes the total number of feature points in the reference frame;
(X_i, Y_i) denotes the position coordinates of the i-th feature point in the current frame;
(X_j, Y_j) denotes the position coordinates of the j-th feature point in the current frame, j ≠ i;
(X_i′, Y_i′) denotes the position coordinates, in the reference frame, of the point corresponding to the i-th feature point of the current frame;
(X_j′, Y_j′) denotes the position coordinates, in the reference frame, of the point corresponding to the j-th feature point of the current frame;
(x_0, y_0) denotes the reference starting point;
[X_i − x_0, Y_i − y_0] denotes the vector of the i-th feature point in the current frame;
[X_j − x_0, Y_j − y_0] denotes the vector of the j-th feature point in the current frame;
|X_i − x_0, Y_i − y_0| denotes the distance value of the i-th feature point in the current frame;
|X_j − x_0, Y_j − y_0| denotes the distance value of the j-th feature point in the current frame.
In a preferred embodiment of the present invention, the reprojection errors of the remaining key frames in step S52 are calculated as:
where ε denotes the balance coefficient;
S_(α,γ) denotes the degree of shift of the pose (α, γ) over the remaining key frames;
K_k denotes the reprojection error of the k-th remaining key frame;
when K_k ≤ τ, where τ denotes the set error threshold, the selected feature point is a key point;
when K_k > τ, the selected feature point is not a key point.
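Steps S52/S53/S55 amount to counting inliers under the threshold τ. A hedged sketch, assuming a 2-D rigid pose model (rotation α, translation γ) since the patent's exact pose and error formulas are not reproduced in this text; `reproject` and `count_keypoints` are illustrative names:

```python
import math

def reproject(p, alpha, gamma):
    """Apply the assumed 2-D rigid pose: rotate by alpha, translate by gamma."""
    x, y = p
    gx, gy = gamma
    return (x * math.cos(alpha) - y * math.sin(alpha) + gx,
            x * math.sin(alpha) + y * math.cos(alpha) + gy)

def count_keypoints(matches, alpha, gamma, tau):
    """matches: list of (reference_point, current_point) pairs.
    Counts points whose reprojection error K_k is within the threshold tau."""
    n = 0
    for ref, cur in matches:
        px, py = reproject(ref, alpha, gamma)
        err = math.hypot(px - cur[0], py - cur[1])  # stands in for K_k
        if err <= tau:                              # K_k <= tau: key point
            n += 1
    return n
```

Relocation succeeds (S55) when this count exceeds the set number of key points.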
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.
Claims (8)
1. An FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, characterized by comprising the following steps:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space (Workspace), where computing circuits calculate the distances to the nodes;
S3, the results of all computing circuits flow into a parallel comparison circuit, which finds the node with the minimum value;
S4, finally, judging whether that node is in the bottom layer; if so, the search finishes and the final node is obtained;
S5, each node carries an offset value used to find the address of its child nodes and obtain the key frames, after which relocation is performed on the key frame set.
2. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 1, wherein the computing circuit comprises:
first, the data passes through exclusive-OR gates, then through either an accumulating parallel counter (APC) or a parallel counter (PC);
the accumulating parallel counter APC adds X levels of approximation units (AU) in front of the counter PC;
the approximation unit AU comprises: a first-level AU approximation unit, which is a column of AND gates and OR gates;
the parallel counter PC comprises a plurality of full adders.
3. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the parallel counter PC comprises:
every three bits of the same weight are sent as a group to a full adder; each full adder has a weight, its three inputs share that weight, its sum output Sum sends its value to a full adder of the same weight, and its carry output Cout sends its value to the next higher weight; the intermediate results are registered for one clock ('beaten'), the full adders whose inputs are all present are then computed, and the intermediate results are registered again, until the highest bit of the output result is computed; each final output wire carries one bit.
4. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the parallel counter PC further satisfies:
with v denoting the output bit width, the input bit width is N = 2^v, and for each added bit of output the number of full adders consumed doubles and increases by v − 1, i.e.:
f(v) = 2·f(v−1) + v − 1
wherein f(v) represents the number of full adders required for output bit width v, and f(v−1) represents the number of full adders required for output bit width v − 1.
5. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the gate-level resources consumed by the parallel counter PC are:
g(v) = (2^v − v − 1) · 5 = (N − log2(N) − 1) · 5
where N represents the input bit width and v represents the output bit width.
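The recurrence of claim 4 and the closed form inside claim 5 agree, taking f(1) = 0 as the base case (an assumption consistent with the closed form: a 2-bit count needs no full adder). A quick check, using the 5 gates per full adder stated in claim 7:

```python
def f(v: int) -> int:
    # Claim 4 recurrence; base case f(1) = 0 is assumed
    return 0 if v == 1 else 2 * f(v - 1) + v - 1

def f_closed(v: int) -> int:
    return 2**v - v - 1           # full-adder count implied by claim 5

def g(v: int) -> int:
    return (2**v - v - 1) * 5     # claim 5: gate-level resources, 5 gates per FA

for v in range(1, 11):
    assert f(v) == f_closed(v)
    assert g(v) == 5 * f(v)
```

For example, v = 8 (a 256-bit input, the width of an ORB descriptor distance) gives f(8) = 247 full adders, i.e. g(8) = 1235 gates.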
7. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the full adder comprises:
two XOR gates, two AND gates and one OR gate, with the following logic expressions:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B and Cin are the inputs of the full adder, namely the two addends and the carry from the adjacent lower bit; Sum and Cout are the outputs of the full adder, namely the local sum and the carry to the adjacent higher bit; ^ represents the XOR operation, & represents the AND operation, and | represents the OR operation.
8. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 1, wherein the step S5 comprises the following steps:
S51, randomly selecting I feature points from the keyframe set, where I is a positive integer greater than or equal to 1, and calculating the pose (α, γ) of the current frame, where α represents a rotation angle;
S52, calculating the reprojection errors of the remaining feature points according to the pose (α, γ) of step S51; if a calculated reprojection error is less than or equal to the set error threshold, the corresponding point is a key point;
S53, counting the number of key points and the corresponding poses (α, γ);
S54, using the pose (α, γ) of step S53 as the initial pose value to locally optimize the pose of the current frame, where the optimized objective function is:
(α, γ) = argmin Σ_{x=1}^{o} h_x · ||e_x||²
where e_x is the x-th reprojection error observed by the camera, ||·|| represents the norm, h_x is the number of observations, and o represents the number of reprojections observed by the camera;
S55, if the number of key points after optimization exceeds the set number, the relocation is considered successful.
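Steps S51–S53 follow the usual RANSAC pattern: hypothesize a pose from a minimal sample, count the points it explains, keep the best-supported hypothesis. A toy 1-D sketch, where the scalar "pose", the helper callables, and the all-inlier test data are illustrative assumptions, not the patented (α, γ) solver:

```python
import random

def ransac_pose(points, solve_pose, reproj_err, err_thresh, iters=50, I=1):
    best_pose, best_count = None, 0
    for _ in range(iters):
        sample = random.sample(points, I)       # S51: pick I feature points
        pose = solve_pose(sample)               # hypothesize a current-frame pose
        # S52: a point is a key point if its reprojection error is within threshold
        inliers = [p for p in points if reproj_err(pose, p) <= err_thresh]
        if len(inliers) > best_count:           # S53: keep the best-supported pose
            best_pose, best_count = pose, len(inliers)
    return best_pose, best_count                # S54 would refine best_pose locally

# Toy model: the "pose" is a scalar shift t, with observation y = x + t
points = [(x, x + 5) for x in range(10)]
pose, count = ransac_pose(
    points,
    solve_pose=lambda s: s[0][1] - s[0][0],
    reproj_err=lambda t, p: abs(p[0] + t - p[1]),
    err_thresh=0.5,
)
```

Since every toy point is consistent with the shift t = 5, any sample recovers the same pose with all 10 points as key points; S55's success test then compares that count against the set number.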
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110918561.XA CN113536024B (en) | 2021-08-11 | 2021-08-11 | ORB-SLAM relocation feature point retrieval acceleration method based on FPGA |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113536024A true CN113536024A (en) | 2021-10-22 |
CN113536024B CN113536024B (en) | 2022-09-09 |
Family
ID=78091542
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110918561.XA Active CN113536024B (en) | 2021-08-11 | 2021-08-11 | ORB-SLAM relocation feature point retrieval acceleration method based on FPGA |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113536024B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2019057179A1 (en) * | 2017-09-22 | 2019-03-28 | 华为技术有限公司 | Visual slam method and apparatus based on point and line characteristic |
CN109919825A (en) * | 2019-01-29 | 2019-06-21 | 北京航空航天大学 | A kind of ORB-SLAM hardware accelerator |
CN110070580A (en) * | 2019-03-29 | 2019-07-30 | 南京华捷艾米软件科技有限公司 | Based on the matched SLAM fast relocation method of local key frame and image processing apparatus |
CN110782494A (en) * | 2019-10-16 | 2020-02-11 | 北京工业大学 | Visual SLAM method based on point-line fusion |
CN111583093A (en) * | 2020-04-27 | 2020-08-25 | 西安交通大学 | Hardware implementation method for ORB feature point extraction with good real-time performance |
CN111795704A (en) * | 2020-06-30 | 2020-10-20 | 杭州海康机器人技术有限公司 | Method and device for constructing visual point cloud map |
CN112381890A (en) * | 2020-11-27 | 2021-02-19 | 上海工程技术大学 | RGB-D vision SLAM method based on dotted line characteristics |
CN112991447A (en) * | 2021-03-16 | 2021-06-18 | 华东理工大学 | Visual positioning and static map construction method and system in dynamic environment |
CN113160130A (en) * | 2021-03-09 | 2021-07-23 | 北京航空航天大学 | Loop detection method and device and computer equipment |
Non-Patent Citations (4)
Title |
---|
AYOUB MAMRI et al.: "ORB-SLAM accelerate on heterogeneous parallel architectures", E3S Web of Conferences 229, 01055 (2021) *
WEIKANG FANG et al.: "FPGA-based ORB feature extraction for real-time visual SLAM", 2017 International Conference on Field Programmable Technology (ICFPT) *
唐醅林: "Research on Feature Matching and Mapping Methods Based on ORB-SLAM", China Masters' Theses Full-text Database, Information Science and Technology Series *
张超凡: "Research on SLAM Methods Based on Fusion of Multi-camera Vision and Inertial Navigation", China Doctoral Dissertations Full-text Database, Information Science and Technology Series *
Also Published As
Publication number | Publication date |
---|---|
CN113536024B (en) | 2022-09-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Wang et al. | Suppressing model overfitting in mining concept-drifting data streams | |
US9208374B2 (en) | Information processing apparatus, control method therefor, and electronic device | |
He et al. | Queryprop: Object query propagation for high-performance video object detection | |
Mayer et al. | Hype: Massive hypergraph partitioning with neighborhood expansion | |
Zhu | Dynamic feature pyramid networks for object detection | |
CN111160461A (en) | Fuzzy clustering-based weighted online extreme learning machine big data classification method | |
US20230161811A1 (en) | Image search system, method, and apparatus | |
CN111597230A (en) | Parallel density clustering mining method based on MapReduce | |
CN107426315B (en) | Distributed cache system Memcached improvement method based on BP neural network | |
CN112906865A (en) | Neural network architecture searching method and device, electronic equipment and storage medium | |
Zhang et al. | COLIN: a cache-conscious dynamic learned index with high read/write performance | |
CN115795065A (en) | Multimedia data cross-modal retrieval method and system based on weighted hash code | |
CN109818971B (en) | Network data anomaly detection method and system based on high-order association mining | |
US9135984B2 (en) | Apparatuses and methods for writing masked data to a buffer | |
CN113536024B (en) | ORB-SLAM relocation feature point retrieval acceleration method based on FPGA | |
Li et al. | Multi-scale global context feature pyramid network for object detector | |
Sun | Personalized music recommendation algorithm based on spark platform | |
CN108897847A (en) | Multi-GPU Density Peak Clustering Method Based on Locality Sensitive Hashing | |
US10997497B2 (en) | Calculation device for and calculation method of performing convolution | |
Ding et al. | An Error-Bounded Space-Efficient Hybrid Learned Index with High Lookup Performance | |
Beutel et al. | A machine learning approach to databases indexes | |
Huang et al. | Unsupervised fusion feature matching for data bias in uncertainty active learning | |
Kargar et al. | E2-NVM: A Memory-Aware Write Scheme to Improve Energy Efficiency and Write Endurance of NVMs using Variational Autoencoders. | |
Lovagnini et al. | CIRCE: Real-time caching for instance recognition on cloud environments and multi-core architectures | |
Lee et al. | StaleLearn: Learning acceleration with asynchronous synchronization between model replicas on PIM |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||