CN113536024A - ORB-SLAM relocation feature point retrieval acceleration method based on FPGA - Google Patents

ORB-SLAM relocation feature point retrieval acceleration method based on FPGA

Info

Publication number
CN113536024A
CN113536024A (application CN202110918561.XA)
Authority
CN
China
Prior art keywords
slam
output
fpga
orb
feature point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110918561.XA
Other languages
Chinese (zh)
Other versions
CN113536024B (en)
Inventor
张磊
汪成亮
张寻
任骜
陈咸彰
刘铎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chongqing University
Original Assignee
Chongqing University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chongqing University filed Critical Chongqing University
Priority to CN202110918561.XA priority Critical patent/CN113536024B/en
Publication of CN113536024A publication Critical patent/CN113536024A/en
Application granted granted Critical
Publication of CN113536024B publication Critical patent/CN113536024B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval; Database structures therefor; File system structures therefor of still image data
    • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/583 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/38 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation
    • G06F7/48 Methods or arrangements for performing computations using exclusively denominational number representation, e.g. using binary, ternary, decimal representation using non-contact-making devices, e.g. tube, solid state device; using unspecified devices
    • G06F7/50 Adding; Subtracting
    • G06F7/501 Half or full adders, i.e. basic adder cells for one denomination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06M COUNTING MECHANISMS; COUNTING OF OBJECTS NOT OTHERWISE PROVIDED FOR
    • G06M1/00 Design features of general application
    • G06M1/27 Design features of general application for representing the result of count in the form of electric signals, e.g. by sensing markings on the counter drum
    • G06M1/272 Design features of general application for representing the result of count in the form of electric signals, e.g. by sensing markings on the counter drum using photoelectric means
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which comprises the following steps: S1, buffering the input picture and extracting descriptors; S2, entering the working space Workspace and computing the node distances with the calculation circuits; S3, feeding the results of all calculation circuits into a parallel comparison circuit to find the node with the minimum distance; S4, finally, judging whether the node belongs to the bottom layer; if so, the search ends and the final node is obtained; S5, using the offset value stored in each node to find the addresses of its child nodes, obtaining the key frames, and then relocating according to the key frame set. To reduce circuit resource consumption, the invention adds an approximation unit AU in front of the counter to form the circuit structure of an accumulating parallel counter APC; based on the principle of approximate computing, this reduces hardware resource consumption and increases the circuit computation speed when the bit stream is long and many identical structures must be replicated.

Description

ORB-SLAM relocation feature point retrieval acceleration method based on FPGA
Technical Field
The invention relates to the field of image processing, in particular to an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method.
Background
When SLAM navigation and positioning fail, the system starts relocation: based on the frame currently captured by the system, similar key frames are found in the frame library according to the feature points of that frame, the map point matches of the current frame are updated, and the relocation operation is carried out. In the existing ORB relocation algorithm, the frame library is built as a k-means tree, as shown in FIG. 1. The feature points of all training pictures are extracted and converted into descriptors, and d rounds of k-means clustering yield a retrieval tree of degree k and depth d, where the center of each cluster is represented by a mean descriptor and the associated frames are recorded in the classes of the bottom layer. Each relocation therefore finds, through the retrieval tree, the class to which every feature point (a 256-bit descriptor) extracted from the current frame finally belongs: at each layer the distance, namely the Hamming distance, to all child nodes of that layer is computed, and the minimum value gives the direction of the search path into the next layer, down to the last layer. Similarity scores are then computed from the associated frames finally found, and some key frames are screened out to complete the relocation computation.
However, when the number of frames in the library becomes huge, the depth and breadth of the retrieval tree grow large and feature point retrieval slows down accordingly; although the ORB algorithm maintains a forward index for fast search, real-time performance is difficult to achieve with a very large frame library.
Disclosure of Invention
The invention aims to solve at least the above technical problems in the prior art, and in particular creatively provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method.
In order to achieve the above object, the present invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which includes:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space Workspace and computing the node distances with the calculation circuits;
S3, feeding the results of all calculation circuits into a parallel comparison circuit to find the node with the minimum distance;
S4, finally, judging whether the node belongs to the bottom layer; if so, the search ends and the final node is obtained;
S5, using the offset value stored in each node to find the addresses of its child nodes, obtaining the key frames, and then relocating according to the key frame set.
In a preferred embodiment of the present invention, the calculation circuit includes:
the data first passes through an XOR gate stage and then through either an accumulating parallel counter APC or a parallel counter PC;
the accumulating parallel counter APC is formed by adding an X-level approximation unit AU in front of the parallel counter PC;
as for the approximation unit AU, a one-level AU approximation unit is a column of AND gates and OR gates;
the parallel counter PC comprises a plurality of full adders.
In a preferred embodiment of the present invention, the parallel counter PC includes:
every three bits of the same weight are grouped and sent to one full adder; each full adder has a weight, its three inputs share that weight, the local sum output Sum sends its value to a full adder of the same weight, and the carry output Cout sends its value to the adjacent higher weight; the intermediate results are registered ("beaten") once, then the full adders whose inputs are all available are evaluated and the intermediate results are registered again, and this repeats until the most significant bit of the output result is computed; each final output line carries one bit.
In a preferred embodiment of the present invention, the parallel counter PC further includes:
v denotes the output bit width and the input bit width is N = 2^v; the number of full adders consumed for each additional output bit becomes 2 times the previous count plus v-1, i.e.:
f(v)=2*f(v-1)+v-1
wherein f (v) represents the number of full adders required for the output bit width v, and f (v-1) represents the number of full adders required for the output bit width v-1.
In a preferred embodiment of the invention, the gate level resources consumed by the parallel counter PC are:
g(v) = (2^v - v - 1) * 5 = (N - log2(N) - 1) * 5
where N represents the input bit width and v represents the output bit width.
In a preferred embodiment of the present invention, the accumulating parallel counter APC has an X-level approximation unit AU added before the full adders, and the consumed gate-level resources are:
[formula presented as an image in the original publication]
where N represents the input bit width and X represents the number of levels of the approximation unit.
In a preferred embodiment of the present invention, the full adder includes:
two XOR gates, two AND gates and one OR gate, with the specific logic expressions as follows:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B, and Cin are the inputs of the full adder (the two addends and the carry from the adjacent lower bit), Sum and Cout are the outputs of the full adder (the local sum and the carry to the adjacent higher bit), ^ denotes the XOR operation, & denotes the AND operation, and | denotes the OR operation.
In a preferred embodiment of the present invention, step S5 includes the following steps:
s51, randomly selecting I feature points from the key frame set, wherein I is a positive integer greater than or equal to 1, and calculating the pose (alpha, gamma) of the current frame, wherein alpha represents a rotation angle and gamma a translation amount;
s52, calculating the reprojection errors of the remaining key frames according to the pose (alpha, gamma) of step S51; if the calculated reprojection error is less than or equal to the set error threshold, the point is a key point;
s53, counting the number of key points and the corresponding poses (alpha, gamma);
s54, using the pose (alpha, gamma) of step S53 as the initial pose value to locally optimize the pose of the current frame, wherein the objective function of the optimization is as follows:
[formula presented as an image in the original publication]
where e_x is the x-th reprojection error observed by the camera, ||·|| denotes the norm, h_x is the number of observations, and O denotes the number of reprojections observed by the camera;
and S55, if the number of key points after optimization exceeds the set number of key points, the relocation is considered successful.
In a preferred embodiment of the present invention, the method for calculating the pose (α, γ) of the current frame in step S51 is:
[formula presented as an image in the original publication]
where I denotes the total number of feature points in the reference frame;
(X_i, Y_i) denotes the position coordinates of the i-th feature point in the current frame;
(X_j, Y_j) denotes the position coordinates of the j-th feature point in the current frame, j ≠ i;
(X_i', Y_i') denotes the position coordinates, in the reference frame, of the point corresponding to the i-th feature point in the current frame;
(X_j', Y_j') denotes the position coordinates, in the reference frame, of the point corresponding to the j-th feature point in the current frame;
(x_0, y_0) denotes the reference starting point;
[X_i - x_0, Y_i - y_0] denotes the vector of the i-th feature point in the current frame;
[X_j - x_0, Y_j - y_0] denotes the vector of the j-th feature point in the current frame;
|X_i - x_0, Y_i - y_0| denotes the distance value of the i-th feature point in the current frame;
|X_j - x_0, Y_j - y_0| denotes the distance value of the j-th feature point in the current frame;
[symbol presented as an image in the original publication] denotes the rotation error value;
[formula presented as an image in the original publication]
in a preferred embodiment of the present invention, the calculation method of the reprojection error of the remaining keyframes in step S52 is:
[formula presented as an image in the original publication]
where ε denotes the balance coefficient;
[symbol presented as an image in the original publication] denotes the degree of deviation of the pose (α, γ) on the reference frame;
S_(α,γ) denotes the degree of shift of the pose (α, γ) over the remaining key frames;
K_k denotes the reprojection error of the k-th remaining key frame;
when K_k ≤ τ, where τ denotes the set error threshold, the selected feature point is a key point;
when K_k > τ, the selected feature point is not a key point.
In summary, by adopting the above technical scheme, the invention achieves the following beneficial effects:
1) For computing the Hamming distance between two 256-bit feature points, the invention adopts a parallel counter PC circuit structure, which accelerates the computation in a pipelined form.
2) To reduce circuit resource consumption, the invention adds an approximation unit AU in front of the counter to form the accumulating parallel counter APC circuit structure; based on the principle of approximate computing, this reduces hardware resource consumption and increases the circuit computation speed when the bit stream is long and many identical structures must be replicated.
3) Exploiting the fact that the upper layers of the k-means tree have coarse clustering granularity while the lower layers have fine granularity, the upper layers use multi-level approximation units and the lower layers use a single level or no approximation unit.
4) Because the distance calculation for each child node is independent, the design uses k Workers, where k is the number of clusters; each Worker has its own copy of the acceleration circuit and controls, in parallel, the reading of its own input data and its own computation.
5) The invention uses a dynamic random access memory DRAM to store the feature points of the frame library; exploiting the read/write parallelism of the memory Banks in the DRAM, the child nodes are placed in different Banks to increase the data reading speed, and each Worker is responsible for managing one Bank.
Additional aspects and advantages of the invention will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a block diagram of a prior art feature point search tree;
FIG. 2 is an overall block diagram of the present invention;
FIG. 3 is a schematic diagram of the Hamming distance parallel counter circuit according to the present invention (16-bit input as an example);
FIG. 4 is a schematic diagram of a gate level circuit for a full adder;
FIG. 5 is a schematic diagram of a parallel counting circuit employing an approximation unit;
FIG. 6 is a diagram illustrating a storage form of search tree data in the DRAM.
Detailed Description
Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the accompanying drawings are illustrative only for the purpose of explaining the present invention, and are not to be construed as limiting the present invention.
As shown in FIG. 1, a retrieval tree is trained offline with all feature points of the frame library; each feature point is represented by a 256-bit descriptor, and each tree node is represented by the mean of all descriptors belonging to its class. During retrieval, the class a descriptor belongs to is found layer by layer, i.e. by following the minimum Hamming distance, down to the lowest layer. Here, the feature points are the feature points of a specific picture, while the nodes are nodes of the tree structure and represent the mean of the class they stand for.
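As an illustration of this layer-by-layer search, the following minimal software sketch (in Python) descends such a tree by always following the child with the smallest Hamming distance; the Node class and its fields are illustrative stand-ins, not the data layout of the invention.

from dataclasses import dataclass, field
from typing import List

@dataclass
class Node:
    descriptor: int                                    # 256-bit cluster-mean descriptor, stored as an int
    children: List["Node"] = field(default_factory=list)
    frames: List[int] = field(default_factory=list)    # associated key frames (bottom-layer nodes only)

def hamming(a: int, b: int) -> int:
    """Hamming distance between two 256-bit descriptors."""
    return bin(a ^ b).count("1")

def descend(root: Node, query: int) -> Node:
    """Walk the tree, at each layer following the child with the minimum Hamming distance."""
    node = root
    while node.children:                               # stop at the bottom layer
        node = min(node.children, key=lambda c: hamming(c.descriptor, query))
    return node                                        # its 'frames' list gives the candidate key frames

The hardware described below parallelizes exactly the two operations inside this loop: the per-child distance computations and the search for their minimum.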
The invention provides an FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, which comprises the following steps:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space Workspace and computing the node distances with the calculation circuits;
S3, feeding the results of all calculation circuits into a parallel comparison circuit to find the node with the minimum distance;
S4, finally, judging whether the node belongs to the bottom layer; if so, the search ends and the final node is obtained;
S5, using the offset value stored in each node to find the addresses of its child nodes, obtaining the key frames, and then relocating according to the key frame set.
In a preferred embodiment of the present invention, the calculation circuit includes:
the data first passes through an XOR gate stage and then through either an accumulating parallel counter APC or a parallel counter PC;
the accumulating parallel counter APC is formed by adding an X-level approximation unit AU in front of the parallel counter PC;
as for the approximation unit AU, a one-level AU approximation unit is a column of AND gates and OR gates;
the parallel counter PC comprises a plurality of full adders.
In a preferred embodiment of the present invention, the parallel counter PC includes:
every three bits of the same weight are grouped and sent to one full adder; each full adder has a weight, its three inputs share that weight, the local sum output Sum sends its value to a full adder of the same weight, and the carry output Cout sends its value to the adjacent higher weight; the intermediate results are registered ("beaten") once, then the full adders whose inputs are all available are evaluated and the intermediate results are registered again, and this repeats until the most significant bit of the output result is computed; each final output line carries one bit.
In a preferred embodiment of the present invention, the parallel counter PC further includes:
v denotes the output bit width and the input bit width is N = 2^v; the number of full adders consumed for each additional output bit becomes 2 times the previous count plus v-1, i.e.:
f(v)=2*f(v-1)+v-1
wherein f (v) represents the number of full adders required for the output bit width v, and f (v-1) represents the number of full adders required for the output bit width v-1.
In a preferred embodiment of the invention, the gate level resources consumed by the parallel counter PC are:
g(v) = (2^v - v - 1) * 5 = (N - log2(N) - 1) * 5
where N represents the input bit width and v represents the output bit width.
In a preferred embodiment of the present invention, the accumulating parallel counter APC has an X-level approximation unit AU added before the full adders, and the consumed gate-level resources are:
[formula presented as an image in the original publication]
where N represents the input bit width and X represents the number of levels of the approximation unit.
In a preferred embodiment of the present invention, the full adder includes:
two XOR gates, two AND gates and one OR gate, with the specific logic expressions as follows:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B, and Cin are the inputs of the full adder (the two addends and the carry from the adjacent lower bit), Sum and Cout are the outputs of the full adder (the local sum and the carry to the adjacent higher bit), ^ denotes the XOR operation, & denotes the AND operation, and | denotes the OR operation.
FIG. 2 is the overall acceleration structure diagram of the present invention. An input query is a picture containing many feature points to be searched; since there are many inputs and a query cannot be sent in all at once, a data structure with first-in-first-out (FIFO) behavior, i.e. a queue, is used for storage. The input module buffers the input pictures and extracts their descriptors.
Workspace is the main working space. Each Worker is responsible for controlling, in parallel, the distance computation for one node; each Worker also has a cache and a calculation circuit (PC or APC). The cache buffers child nodes that have been accessed before, because the distances between adjacent feature points are relatively small and their access paths may partly coincide; when the data is not in the cache, the DRAM is accessed. The results of all calculation circuits flow together into a parallel comparison circuit, which finds the node with the minimum value. Finally, it is judged whether the bottom layer has been reached; if so, the search ends. Each node carries an offset value used to compute the addresses of its child nodes, and relocation is then performed.
DRAM is a storage structure on the FPGA board that can be understood as the main memory of a computer, while the cache is smaller but faster to access than the DRAM. The cache stores part of the DRAM data; it is small, but its access speed is higher than that of the DRAM, so data read from the DRAM is also buffered in the cache (because of the small size, some old data may be overwritten). On every access the cache is checked first; if the corresponding data is not there, the DRAM is accessed and the data is buffered. This mechanism relies on locality of reference: data accessed recently is likely to be accessed again, so the data access speed is improved.
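The cache-before-DRAM access pattern described in the preceding paragraph can be summarized by the following sketch; the dictionary-backed store, the capacity value and the least-recently-used eviction are illustrative assumptions rather than the cache organization of the invention.

from collections import OrderedDict

class NodeCache:
    """Tiny cache in front of the DRAM: on a hit return the buffered data,
    on a miss fetch from the (slower) store, buffer it and possibly evict old data."""
    def __init__(self, dram: dict, capacity: int = 64):
        self.dram = dram                   # stands in for the DRAM Bank of this Worker
        self.capacity = capacity
        self.lines = OrderedDict()         # node address -> node data

    def fetch(self, addr: int):
        if addr in self.lines:             # cache hit
            self.lines.move_to_end(addr)   # refresh recency
            return self.lines[addr]
        data = self.dram[addr]             # cache miss: access the DRAM
        self.lines[addr] = data            # buffer it (old data may be overwritten)
        if len(self.lines) > self.capacity:
            self.lines.popitem(last=False) # evict the least recently used entry
        return data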
FIG. 3 shows the core computing circuit of the invention, which uses a parallel counting structure, taking a 16-bit input as an example. Each unit is a full adder implemented in combinational logic with the expressions Sum = (A ^ B) ^ Cin and Cout = (A & B) | ((A ^ B) & Cin), consuming in total two XOR gates, two AND gates and one OR gate, as shown in FIG. 4.
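To make the behaviour of the counting structure concrete, the sketch below models the full adder of FIG. 4 bit by bit and then counts the ones of an XOR result the way a parallel counter does, by repeatedly compressing three bits of equal weight into a sum bit of the same weight and a carry of the next higher weight. It is a functional software model of the counting principle only, not of the pipelined, registered circuit of FIG. 3.

def full_adder(a: int, b: int, cin: int):
    """Gate-level full adder: Sum = (A ^ B) ^ Cin, Cout = (A & B) | ((A ^ B) & Cin)."""
    s = (a ^ b) ^ cin
    cout = (a & b) | ((a ^ b) & cin)
    return s, cout

def parallel_count(bits):
    """Count the ones in 'bits' using only full adders, in the spirit of the PC structure:
    three bits of the same weight feed one full adder, the sum stays at that weight and
    the carry moves to the next higher weight, until one bit per weight remains."""
    columns = {0: list(bits)}                        # weight index -> bits of that weight
    w = 0
    while w in columns:
        col = columns[w]
        while len(col) >= 3:
            s, c = full_adder(col.pop(), col.pop(), col.pop())
            col.append(s)                            # sum stays in this column
            columns.setdefault(w + 1, []).append(c)  # carry goes to the next column
        while len(col) == 2:                         # leftover pair: add with a constant 0
            s, c = full_adder(col.pop(), col.pop(), 0)
            col.append(s)
            if c:
                columns.setdefault(w + 1, []).append(c)
        w += 1
    return sum(col[0] << k for k, col in columns.items() if col)

# The Hamming distance of two descriptors is the popcount of their XOR
a, b = 0xF0F0F0F0, 0x0F0FFF0F
xor_bits = [(a ^ b) >> i & 1 for i in range(32)]
assert parallel_count(xor_bits) == bin(a ^ b).count("1")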
Let v denote the number of bits needed to represent the output, i.e. the output bit width, so that the input bit width is N = 2^v. From FIG. 3 it can be seen that for each additional output bit the input doubles, and the number of full adders consumed becomes 2 times the previous count plus v - 1, i.e.:
f(v)=2*f(v-1)+v-1
where f(2) = 1.
Then the gate level resources consumed are:
g(v) = (2^v - v - 1) * 5 = (N - log2(N) - 1) * 5
where N denotes the input bit width and v the number of bits of the result, i.e. the output bit width. Based on the principle of approximate computing, the APC adds a first-level AU approximation unit, i.e. a column of AND gates and OR gates, before the full adders, as shown in FIG. 5; in this way the computation result is obtained with only a small error (experimentally verified to be within 5) while reducing resource consumption, and the consumed gate-level resources become:
[formula presented as an image in the original publication]
similarly, APC adds an X-level AU approximation unit before the full adder, consuming gate-level resources becoming:
[formula presented as an image in the original publication]
for the characteristic that the upper layer of the k-means tree has coarse clustering granularity and the lower layer has fine clustering granularity, the upper layer adopts a multi-level approximate unit, and the lower layer adopts one layer or does not use an approximate unit. The nodes of the same hierarchy adopt the calculation circuits with the same structure, and the nodes are parallel.
For a 256-bit input, a one-level (single-stage) approximation unit can reduce resource consumption by about 50% while also increasing the computation speed.
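As a quick numerical check of the resource formulas above (a sketch only; the cost of 5 gates per full adder follows from the two XOR, two AND and one OR gates of FIG. 4), the recurrence f(v) = 2*f(v-1) + v - 1 with f(2) = 1 can be compared against the closed form 2^v - v - 1 implied by g(v); for the 256-bit case (v = 8) this gives 247 full adders and 1235 gates for the plain parallel counter.

import math

def f(v: int) -> int:
    """Number of full adders for output bit width v, from the recurrence f(v) = 2*f(v-1) + v - 1."""
    return 1 if v == 2 else 2 * f(v - 1) + v - 1

def g(v: int) -> int:
    """Gate-level resources of the plain parallel counter: 5 gates per full adder."""
    n = 2 ** v
    return (n - int(math.log2(n)) - 1) * 5

for v in range(2, 9):
    assert f(v) == 2 ** v - v - 1          # the recurrence matches the closed form
print(f(8), g(8))                          # 256-bit input: 247 full adders, 1235 gates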
Before the calculation circuit PC or APC there is an XOR gate stage, which the two 256-bit descriptors whose distance is to be computed enter first. The XOR is evaluated with combinational logic, and the number of ones in the resulting 256 bits is the distance between the two descriptors. The 256 bits after the XOR first enter the approximation unit, which halves the input: after one AU level the input is equivalent to 128 bits, but the weight of each bit becomes 2^1. A register with non-blocking assignment then "beats" the intermediate result on the rising clock edge, after which every three bits of the same weight are fed to a full adder (starting with the weight-2^0 full adders). Each full adder has a weight; its three inputs share that weight, the sum output Sum sends its value to a full adder of the same weight, and the carry output Cout sends its value to the next higher weight. The intermediate result is "beaten" again in the same way, then the full adders whose inputs are all available are evaluated, and this repeats until the most significant bit of the output result is computed. Each arrow of the final output is one bit (0 or 1); taking 16 bits as an example, as shown in FIG. 5, the output can be represented with 4 bits, and the power of 2 indicates which of the 4 bits the arrow points to. The circuit is wired directly; in effect the 0 or 1 is a coefficient, the power of 2 is a weight, and the weighted bits are accumulated. This forms a pipeline inside the calculation circuit: the first descriptor produces its result in the 11th cycle, and afterwards one descriptor result is obtained in every cycle. Since the distance calculations against the child nodes are parallel, once the results are obtained they are all sent to a comparison tree, and the minimum value determines where the next-layer path goes.
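The exact gate arrangement of the AU column is only shown in FIG. 5, so the following is merely a structural sketch under an assumed pairing: adjacent bits are merged two at a time, alternating OR and AND gates so that over- and under-counts tend to offset, which halves the number of bits while doubling their weight. The real AU of the patent may pair and combine the bits differently, and the error bound of 5 reported above refers to that circuit, not to this stand-in.

def au_level(bits):
    """One assumed approximation-unit level: merge adjacent bits two at a time,
    alternating OR and AND gates, so each surviving bit represents a doubled weight."""
    out = []
    for i in range(0, len(bits) - 1, 2):
        a, b = bits[i], bits[i + 1]
        out.append(a | b if (i // 2) % 2 == 0 else a & b)
    return out  # half as many bits, each now carrying weight 2

# Example: a 256-bit XOR result is reduced to 128 bits of weight 2 feeding a smaller counter
x = (0xF0F0F0F0F0F0F0F0 << 192) | 0x123456789ABCDEF0
bits = [(x >> i) & 1 for i in range(256)]
print(sum(bits), 2 * sum(au_level(bits)))   # exact popcount vs. approximate count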
The distances between clusters at the upper layers of the retrieval tree are larger and the differences shrink going downward, so the number of AU stages used at each layer is chosen according to the actual data set: at the upper layers, where the differences are larger, two to three levels can be used, while at the bottom layer whether an approximation unit is used at all is decided according to the data set.
For the storage of the tree inside the DRAM, as shown in FIG. 6, each Bank of the DRAM shares one I/O control port, but reads and writes inside the individual Banks can proceed in parallel. Each child node is stored in its corresponding Bank, and when the required data is not in the cache, each Worker can read its own Bank in parallel.
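FIG. 6 itself is not reproduced in this text, so the following sketch only illustrates the general idea of the storage scheme under assumed details: each Worker owns one DRAM Bank, the record of a node carries its 256-bit mean descriptor plus an offset, and the i-th child of the current node is assumed to be fetched from Bank i so that the k Workers can read in parallel. The record size, field layout and placement rule are illustrative assumptions, not the patent's actual format.

from dataclasses import dataclass
from typing import List

NODE_BYTES = 36   # assumed record size: 32-byte (256-bit) mean descriptor + 4-byte child offset

@dataclass
class Bank:
    """One DRAM Bank; each Worker manages one Bank, and the Banks can be accessed in parallel."""
    data: bytearray

    def read_node(self, addr: int) -> bytes:
        return bytes(self.data[addr:addr + NODE_BYTES])

# Assumed placement: the i-th child of the current node lives in Bank i at an address
# derived from the parent's offset field, so the k Workers fetch the k children simultaneously.
k = 8
banks: List[Bank] = [Bank(bytearray(1 << 20)) for _ in range(k)]
parent_offset = 0x1000
child_records = [banks[i].read_node(parent_offset) for i in range(k)]  # parallel in hardware, sequential here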
In a preferred embodiment of the present invention, step S5 includes the following steps:
s51, randomly selecting I feature points from the key frame set, wherein I is a positive integer greater than or equal to 1, and calculating the pose (alpha, gamma) of the current frame, wherein alpha represents a rotation angle and gamma a translation amount;
s52, calculating the reprojection errors of the remaining key frames according to the pose (alpha, gamma) of step S51; if the calculated reprojection error is less than or equal to the set error threshold, the point is a key point;
s53, counting the number of key points and the corresponding poses (alpha, gamma);
s54, using the pose (alpha, gamma) of step S53 as the initial pose value to locally optimize the pose of the current frame, wherein the objective function of the optimization is as follows:
[formula presented as an image in the original publication]
where e_x is the x-th reprojection error observed by the camera, ||·|| denotes the norm, h_x is the number of observations, and O denotes the number of reprojections observed by the camera;
and S55, if the number of key points after optimization exceeds the set number of key points, the relocation is considered successful (a software sketch of this verification loop follows below).
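Steps S51 to S55 form a verification loop; the following sketch summarizes only its control flow, with estimate_pose, reprojection_error and optimize_pose standing in for the formulas given below (which appear as images in the original publication), and with the parameter names being illustrative rather than the patent's.

import random
from typing import Callable, Sequence, Tuple

def relocalize(feature_points: Sequence,
               remaining_keyframes: Sequence,
               estimate_pose: Callable,        # stand-in for the pose formula of step S51
               reprojection_error: Callable,   # stand-in for the error formula of step S52
               optimize_pose: Callable,        # stand-in for the local optimization of step S54
               num_samples: int,               # I: number of randomly selected feature points
               tau: float,                     # the set error threshold
               min_keypoints: int) -> Tuple[bool, object]:
    # S51: randomly select I feature points and estimate the pose (alpha, gamma)
    sample = random.sample(list(feature_points), num_samples)
    pose = estimate_pose(sample)
    # S52/S53: keep, and count, the candidates whose reprojection error is within the threshold
    keypoints = [kf for kf in remaining_keyframes if reprojection_error(pose, kf) <= tau]
    # S54: use that pose as the initial value of the local optimization
    pose, keypoints = optimize_pose(pose, keypoints)
    # S55: relocation succeeds if enough key points survive the optimization
    return len(keypoints) >= min_keypoints, pose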
In a preferred embodiment of the present invention, the method for calculating the pose (α, γ) of the current frame in step S51 is:
[formula presented as an image in the original publication]
where I denotes the total number of feature points in the reference frame;
(X_i, Y_i) denotes the position coordinates of the i-th feature point in the current frame;
(X_j, Y_j) denotes the position coordinates of the j-th feature point in the current frame, j ≠ i;
(X_i', Y_i') denotes the position coordinates, in the reference frame, of the point corresponding to the i-th feature point in the current frame;
(X_j', Y_j') denotes the position coordinates, in the reference frame, of the point corresponding to the j-th feature point in the current frame;
(x_0, y_0) denotes the reference starting point;
[X_i - x_0, Y_i - y_0] denotes the vector of the i-th feature point in the current frame;
[X_j - x_0, Y_j - y_0] denotes the vector of the j-th feature point in the current frame;
|X_i - x_0, Y_i - y_0| denotes the distance value of the i-th feature point in the current frame;
|X_j - x_0, Y_j - y_0| denotes the distance value of the j-th feature point in the current frame;
[symbol presented as an image in the original publication] denotes the rotation error value;
[formula presented as an image in the original publication]
in a preferred embodiment of the present invention, the calculation method of the reprojection error of the remaining keyframes in step S52 is:
[formula presented as an image in the original publication]
where ε denotes the balance coefficient;
[symbol presented as an image in the original publication] denotes the degree of deviation of the pose (α, γ) on the reference frame;
S_(α,γ) denotes the degree of shift of the pose (α, γ) over the remaining key frames;
K_k denotes the reprojection error of the k-th remaining key frame;
when K_k ≤ τ, where τ denotes the set error threshold, the selected feature point is a key point;
when K_k > τ, the selected feature point is not a key point.
While embodiments of the invention have been shown and described, it will be understood by those of ordinary skill in the art that: various changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the invention, the scope of which is defined by the claims and their equivalents.

Claims (8)

1. An FPGA-based ORB-SLAM relocation feature point retrieval acceleration method, characterized by comprising the following steps:
S1, buffering the input picture and extracting descriptors;
S2, entering the working space Workspace and computing the node distances with the calculation circuits;
S3, feeding the results of all calculation circuits into a parallel comparison circuit to find the node with the minimum distance;
S4, finally, judging whether the node belongs to the bottom layer; if so, the search ends and the final node is obtained;
S5, using the offset value stored in each node to find the addresses of its child nodes, obtaining the key frames, and then relocating according to the key frame set.
2. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 1, wherein the calculation circuit comprises:
the data first passes through an XOR gate stage and then through either an accumulating parallel counter APC or a parallel counter PC;
the accumulating parallel counter APC is formed by adding an X-level approximation unit AU in front of the parallel counter PC;
as for the approximation unit AU, a one-level AU approximation unit is a column of AND gates and OR gates;
the parallel counter PC comprises a plurality of full adders.
3. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the parallel counter PC comprises:
every three bits of the same weight are grouped and sent to one full adder; each full adder has a weight, its three inputs share that weight, the local sum output Sum sends its value to a full adder of the same weight, and the carry output Cout sends its value to the adjacent higher weight; the intermediate results are registered ("beaten") once, then the full adders whose inputs are all available are evaluated and the intermediate results are registered again, and this repeats until the most significant bit of the output result is computed; each final output line carries one bit.
4. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the parallel counter PC further comprises:
v denotes the output bit width and the input bit width is N = 2^v; the number of full adders consumed for each additional output bit becomes 2 times the previous count plus v-1, i.e.:
f(v)=2*f(v-1)+v-1
wherein f (v) represents the number of full adders required for the output bit width v, and f (v-1) represents the number of full adders required for the output bit width v-1.
5. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein gate-level resources consumed by the parallel counter PC are:
g(v) = (2^v - v - 1) * 5 = (N - log2(N) - 1) * 5
where N represents the input bit width and v represents the output bit width.
6. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the accumulating parallel counter APC has an X-level approximation unit AU added before the full adders, and the consumed gate-level resources are:
[formula presented as an image in the original publication]
where N represents the input bit width and X represents the number of levels of the approximation unit.
7. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 2, wherein the full adder comprises:
two XOR gates, two AND gates and one OR gate, with the specific logic expressions as follows:
Sum=(A^B)^Cin
Cout=(A&B)|((A^B)&Cin)
where A, B, and Cin are the inputs of the full adder (the two addends and the carry from the adjacent lower bit), Sum and Cout are the outputs of the full adder (the local sum and the carry to the adjacent higher bit), ^ denotes the XOR operation, & denotes the AND operation, and | denotes the OR operation.
8. The FPGA-based ORB-SLAM relocation feature point retrieval acceleration method according to claim 1, wherein step S5 comprises the following steps:
s51, randomly selecting I feature points from the key frame set, wherein I is a positive integer greater than or equal to 1, and calculating the pose (alpha, gamma) of the current frame, wherein alpha represents a rotation angle and gamma a translation amount;
s52, calculating the reprojection errors of the remaining key frames according to the pose (alpha, gamma) of step S51; if the calculated reprojection error is less than or equal to the set error threshold, the point is a key point;
s53, counting the number of key points and the corresponding poses (alpha, gamma);
s54, using the pose (alpha, gamma) of step S53 as the initial pose value to locally optimize the pose of the current frame, wherein the objective function of the optimization is as follows:
[formula presented as an image in the original publication]
where e_x is the x-th reprojection error observed by the camera, ||·|| denotes the norm, h_x is the number of observations, and O denotes the number of reprojections observed by the camera;
and S55, if the number of key points after optimization exceeds the set number of key points, the relocation is considered successful.
CN202110918561.XA 2021-08-11 2021-08-11 ORB-SLAM relocation feature point retrieval acceleration method based on FPGA Active CN113536024B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110918561.XA CN113536024B (en) 2021-08-11 2021-08-11 ORB-SLAM relocation feature point retrieval acceleration method based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110918561.XA CN113536024B (en) 2021-08-11 2021-08-11 ORB-SLAM relocation feature point retrieval acceleration method based on FPGA

Publications (2)

Publication Number Publication Date
CN113536024A true CN113536024A (en) 2021-10-22
CN113536024B CN113536024B (en) 2022-09-09

Family

ID=78091542

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110918561.XA Active CN113536024B (en) 2021-08-11 2021-08-11 ORB-SLAM relocation feature point retrieval acceleration method based on FPGA

Country Status (1)

Country Link
CN (1) CN113536024B (en)

Citations (9)


Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019057179A1 (en) * 2017-09-22 2019-03-28 华为技术有限公司 Visual slam method and apparatus based on point and line characteristic
CN109919825A (en) * 2019-01-29 2019-06-21 北京航空航天大学 A kind of ORB-SLAM hardware accelerator
CN110070580A (en) * 2019-03-29 2019-07-30 南京华捷艾米软件科技有限公司 Based on the matched SLAM fast relocation method of local key frame and image processing apparatus
CN110782494A (en) * 2019-10-16 2020-02-11 北京工业大学 Visual SLAM method based on point-line fusion
CN111583093A (en) * 2020-04-27 2020-08-25 西安交通大学 Hardware implementation method for ORB feature point extraction with good real-time performance
CN111795704A (en) * 2020-06-30 2020-10-20 杭州海康机器人技术有限公司 Method and device for constructing visual point cloud map
CN112381890A (en) * 2020-11-27 2021-02-19 上海工程技术大学 RGB-D vision SLAM method based on dotted line characteristics
CN113160130A (en) * 2021-03-09 2021-07-23 北京航空航天大学 Loop detection method and device and computer equipment
CN112991447A (en) * 2021-03-16 2021-06-18 华东理工大学 Visual positioning and static map construction method and system in dynamic environment

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
AYOUB MAMRI et al.: "ORB-SLAM accelerate on heterogeneous parallel architectures", E3S Web of Conferences 229, 01055 (2021) *
WEIKANG FANG et al.: "FPGA-based ORB feature extraction for real-time visual SLAM", 2017 International Conference on Field Programmable Technology (ICFPT) *
唐醅林: "Research on feature matching and mapping methods based on ORB-SLAM", China Excellent Master's Theses Full-text Database, Information Science and Technology series *
张超凡: "Research on SLAM methods based on the fusion of multi-camera vision and inertial navigation", China Excellent Doctoral Dissertations Full-text Database, Information Science and Technology series *

Also Published As

Publication number Publication date
CN113536024B (en) 2022-09-09

Similar Documents

Publication Publication Date Title
Wang et al. Suppressing model overfitting in mining concept-drifting data streams
US9208374B2 (en) Information processing apparatus, control method therefor, and electronic device
He et al. Queryprop: Object query propagation for high-performance video object detection
Mayer et al. Hype: Massive hypergraph partitioning with neighborhood expansion
Zhu Dynamic feature pyramid networks for object detection
CN111160461A (en) Fuzzy clustering-based weighted online extreme learning machine big data classification method
US20230161811A1 (en) Image search system, method, and apparatus
CN111597230A (en) Parallel density clustering mining method based on MapReduce
CN107426315B (en) Distributed cache system Memcached improvement method based on BP neural network
CN112906865A (en) Neural network architecture searching method and device, electronic equipment and storage medium
Zhang et al. COLIN: a cache-conscious dynamic learned index with high read/write performance
CN115795065A (en) Multimedia data cross-modal retrieval method and system based on weighted hash code
CN109818971B (en) Network data anomaly detection method and system based on high-order association mining
US9135984B2 (en) Apparatuses and methods for writing masked data to a buffer
CN113536024B (en) ORB-SLAM relocation feature point retrieval acceleration method based on FPGA
Li et al. Multi-scale global context feature pyramid network for object detector
Sun Personalized music recommendation algorithm based on spark platform
CN108897847A (en) Multi-GPU Density Peak Clustering Method Based on Locality Sensitive Hashing
US10997497B2 (en) Calculation device for and calculation method of performing convolution
Ding et al. An Error-Bounded Space-Efficient Hybrid Learned Index with High Lookup Performance
Beutel et al. A machine learning approach to databases indexes
Huang et al. Unsupervised fusion feature matching for data bias in uncertainty active learning
Kargar et al. E2-NVM: A Memory-Aware Write Scheme to Improve Energy Efficiency and Write Endurance of NVMs using Variational Autoencoders.
Lovagnini et al. CIRCE: Real-time caching for instance recognition on cloud environments and multi-core architectures
Lee et al. StaleLearn: Learning acceleration with asynchronous synchronization between model replicas on PIM

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant