CN114820273A - Data processing device and method for ORB acceleration, chip and electronic equipment - Google Patents


Info

Publication number
CN114820273A
Authority
CN
China
Prior art keywords
target image
image
block
memory
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210420445.XA
Other languages
Chinese (zh)
Inventor
李思旭
高洋
杭蒙
刘浩敏
章国锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Sensetime Technology Development Co Ltd
Original Assignee
Shanghai Sensetime Technology Development Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Sensetime Technology Development Co Ltd filed Critical Shanghai Sensetime Technology Development Co Ltd
Priority to CN202210420445.XA priority Critical patent/CN114820273A/en
Publication of CN114820273A publication Critical patent/CN114820273A/en
Pending legal-status Critical Current

Links

Images

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/20 Processor architectures; Processor configuration, e.g. pipelining
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00 General purpose image data processing
    • G06T1/60 Memory management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G06T7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00 Indexing scheme for image analysis or image enhancement
    • G06T2207/20 Special algorithmic details
    • G06T2207/20021 Dividing image into blocks, subimages or windows

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the disclosure provides a data processing device and method for ORB acceleration, a chip and an electronic device, which are used for generating descriptors of feature points in a target image, wherein the device comprises: the target image block generation module is used for reading image blocks from the memory and determining a target image block for calculating the descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image; and the descriptor generation module is used for generating descriptors of the feature points in the target image block.

Description

Data processing device and method for ORB acceleration, chip and electronic equipment
Technical Field
The present disclosure relates to the field of chip technologies, and in particular, to a data processing apparatus and method for ORB acceleration, a chip, and an electronic device.
Background
Oriented FAST and Rotated BRIEF (ORB) is a widely used feature extraction and description scheme. It is widely applied in tasks such as Simultaneous Localization and Mapping (SLAM) and can greatly improve the accuracy of such tasks. Hardware accelerators are often used in the related art to implement fast ORB computation, but these hardware accelerators still have some problems to be solved.
Disclosure of Invention
In a first aspect, the present disclosure provides a data processing apparatus for ORB acceleration, configured to generate descriptors of feature points in a target image, where the data processing apparatus for ORB acceleration includes:
the target image block generation module is used for reading image blocks from the memory and determining a target image block for calculating the descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image;
and the descriptor generation module is used for generating descriptors of the feature points in the target image block.
In the memory of the embodiment of the disclosure, each storage address of the same storage block stores a plurality of columns of the same row of the target image, and the same storage address of each storage block stores different columns of the same row of the target image, so that one access to the same address across the storage blocks returns a contiguous segment of a row. In addition, this storage mode allows the target image block to be read using only one memory, thereby reducing the cost and volume of the data processing apparatus for ORB acceleration.
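As a concrete illustration of this banked layout, the following is a minimal software model. The bank count, pixels per address, and image size are illustrative assumptions, not values taken from the disclosure; the point is only that consecutive column groups of one image row rotate across banks, so one address per bank yields a wide row segment in a single cycle.

```python
# Software model of the banked image memory described above. NUM_BANKS,
# PIXELS_PER_ADDR and the image size are assumed demo parameters.
NUM_BANKS = 4         # number of memory blocks (assumed)
PIXELS_PER_ADDR = 8   # columns stored per memory address (assumed)
IMG_W, IMG_H = 64, 8  # small demo image (assumed)

def bank_and_addr(row, col):
    """Map a pixel (row, col) to (bank index, address within the bank)."""
    group = col // PIXELS_PER_ADDR           # which column group in the row
    bank = group % NUM_BANKS                 # groups rotate across banks
    groups_per_row = IMG_W // PIXELS_PER_ADDR
    addr = row * (groups_per_row // NUM_BANKS) + group // NUM_BANKS
    return bank, addr

def store_image(img):
    """Write a row-major image into the banks: each address of one bank
    holds several columns of one row (a column group)."""
    banks = [dict() for _ in range(NUM_BANKS)]
    for r in range(IMG_H):
        for c in range(0, IMG_W, PIXELS_PER_ADDR):
            bank, addr = bank_and_addr(r, c)
            banks[bank][addr] = img[r][c:c + PIXELS_PER_ADDR]
    return banks

def read_row_segment(banks, row, col0):
    """Read NUM_BANKS * PIXELS_PER_ADDR aligned pixels of one row by
    issuing one address per bank; all banks can respond in parallel."""
    out = []
    for c in range(col0, col0 + NUM_BANKS * PIXELS_PER_ADDR, PIXELS_PER_ADDR):
        bank, addr = bank_and_addr(row, c)
        out.extend(banks[bank][addr])
    return out

img = [[r * IMG_W + c for c in range(IMG_W)] for r in range(IMG_H)]
banks = store_image(img)
assert read_row_segment(banks, 3, 0) == img[3][0:32]
```

Note that the same address (here, address 6 for row 3) appears in every bank and holds a different column group of the same row, matching the storage rule described above.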
In a second aspect, an embodiment of the present disclosure provides an ORB-accelerated data processing apparatus for generating descriptors of feature points in a target image, the ORB-accelerated data processing apparatus including:
the angle calculation module is used for calculating the rotation angle of the characteristic point;
a descriptor generation module for generating a descriptor of the feature point based on the rotation angle;
the rotation angle is calculated based on a centroid of a target image block where the feature point is located, wherein the centroid comprises a first image moment of the target image block in a row direction of the target image and a second image moment of the target image block in a column direction of the target image; the angle calculation module includes:
a first calculation unit configured to obtain a sum of squares of the first image moment and the second image moment;
a floating point conversion unit for converting the first image moment, the second image moment and the sum of squares into a floating point number;
a second calculation unit for calculating an inverse of an arithmetic square root of the converted sum of squares;
a third calculation unit configured to calculate a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
and the fixed point conversion unit is used for converting the rotation angle into fixed point numbers and outputting the fixed point numbers to the descriptor generation module.
The two image moments are sums of pixel values and are therefore integers, and the bit width of the fixed-point output of the final floating-to-fixed conversion stage is also a fixed value, so the floating-point units in this module do not need to be complete general-purpose units, which saves area and delay. The calculation by the second calculation unit of the reciprocal of the arithmetic square root of the converted sum of squares can be realized in hardware with operations such as shifts. In this way, the special properties of floating-point numbers are exploited in the core calculation, while fixed-point conversion at the outer layers simultaneously improves calculation precision and calculation speed and reduces hardware area.
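The dataflow of the angle calculation module can be sketched numerically as follows. This is a software model, not RTL; the fixed-point format is an assumption, and the bit-pattern trick used for the reciprocal square root is one well-known shift-based approximation of the kind the text alludes to, not necessarily the patent's exact circuit.

```python
import math
import struct

def fast_rsqrt(x: float) -> float:
    """Approximate 1/sqrt(x) by manipulating the single-precision bit
    pattern (exponent halving via a right shift), then one Newton
    refinement. The magic constant is the classic one; this illustrates
    a shift-based hardware realization, not the patent's circuit."""
    i = struct.unpack('<I', struct.pack('<f', x))[0]
    i = 0x5f3759df - (i >> 1)
    y = struct.unpack('<f', struct.pack('<I', i))[0]
    return y * (1.5 - 0.5 * x * y * y)

def rotation_angle_fixed(m10: int, m01: int, frac_bits: int = 12) -> int:
    """Dataflow of the angle module: integer moments -> floating point ->
    reciprocal of sqrt(m10^2 + m01^2) -> rotation angle -> fixed point.
    frac_bits is an assumed fixed-point format, not from the patent."""
    sq = float(m10 * m10 + m01 * m01)       # sum of squares of the moments
    if sq == 0.0:
        return 0
    r = fast_rsqrt(sq)                      # reciprocal of arithmetic sqrt
    cos_t, sin_t = m10 * r, m01 * r         # unit direction of the centroid
    angle = math.atan2(sin_t, cos_t)        # rotation angle in radians
    return round(angle * (1 << frac_bits))  # fixed-point output
```

Since both sine and cosine are scaled by the same reciprocal square root, the recovered angle is insensitive to small errors in the approximation, which is why a cheap shift-based estimate plus one refinement suffices.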
In a third aspect, an embodiment of the present disclosure provides a chip including the data processing apparatus for ORB acceleration according to any embodiment of the present disclosure.
In a fourth aspect, an embodiment of the present disclosure provides an electronic device, where the electronic device includes the chip according to any embodiment of the present disclosure.
In a fifth aspect, an embodiment of the present disclosure provides a data processing method for ORB acceleration, configured to generate descriptors of feature points in a target image, the method including:
reading image blocks from a memory, and determining a target image block for calculating a descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image;
and generating descriptors of the feature points in the target image block.
In a sixth aspect, the present disclosure provides a data processing method for ORB acceleration, configured to generate descriptors of feature points in a target image, the method including:
acquiring the sum of squares of a first image moment and a second image moment of a target image block; the first image moment is the image moment of the target image block in the row direction of the target image, and the second image moment is the image moment of the target image block in the column direction of the target image;
converting the first image moment, the second image moment, and the sum of squares to floating point numbers;
calculating the reciprocal of the arithmetic square root of the converted sum of squares;
calculating a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
converting the rotation angle into fixed point number;
generating a descriptor of the feature point based on the converted rotation angle.
In a seventh aspect, the embodiments of the present disclosure provide a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements the method according to any of the embodiments of the present disclosure.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and, together with the description, serve to explain the principles of the disclosure.
Fig. 1A is a process flow diagram of the ORB algorithm in the related art.
Fig. 1B is a schematic diagram of a specific process of determining the feature points.
Fig. 2 is a schematic diagram of a data processing apparatus for ORB acceleration according to an embodiment of the present disclosure.
Fig. 3 is a schematic diagram of a three-stage NMS.
Fig. 4 is a schematic diagram of a storage manner of the target image in the memory in the process of determining the feature points.
Fig. 5 is a schematic diagram of a scanning process of a target image block.
Fig. 6 and 7 are schematic diagrams of pipelines of respective modules in the data processing apparatus for ORB acceleration, respectively.
Fig. 8 is a schematic structural diagram of the target image block generation module.
Fig. 9 is a schematic structural diagram of a calculation module.
Fig. 10 is a schematic diagram of the structure of the comparison unit in the calculation module.
FIG. 11 is a flow chart of the window filtering stage.
Fig. 12 is a schematic structural diagram of a window filtering module.
Fig. 13 is a schematic structural diagram of a mesh filter module.
Fig. 14 is a schematic diagram of a data processing apparatus for ORB acceleration that includes two processing cores.
Fig. 15 is a schematic diagram of a data processing apparatus for ORB acceleration according to another embodiment of the present disclosure.
Fig. 16 is a schematic diagram of a data processing apparatus for ORB acceleration according to still another embodiment of the present disclosure.
Fig. 17 is a schematic diagram of a storage manner of the target image in the memory in the process of generating the descriptor.
Fig. 18 and 19 are schematic structural views of the angle calculation module, respectively.
Fig. 20 is a schematic diagram of a data processing apparatus for ORB acceleration including a plurality of PEs.
Fig. 21 is a schematic diagram of a data processing apparatus for ORB acceleration according to a fourth embodiment of the present disclosure.
Fig. 22 is a schematic diagram of a data processing method for ORB acceleration according to an embodiment of the present disclosure.
Fig. 23 is a schematic diagram of a data processing method for ORB acceleration according to another embodiment of the present disclosure.
Fig. 24 is a schematic diagram of a data processing method for ORB acceleration according to still another embodiment of the present disclosure.
Fig. 25 is a schematic diagram of a data processing method for ORB acceleration according to a fourth embodiment of the present disclosure.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below do not represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
The terminology used in the disclosure is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. As used in this disclosure and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. The word "if" as used herein may be interpreted as "when" or "upon" or "in response to a determination", depending on the context.
In order to make the technical solutions in the embodiments of the present disclosure better understood and make the above objects, features and advantages of the embodiments of the present disclosure more comprehensible, the technical solutions in the embodiments of the present disclosure are described in further detail below with reference to the accompanying drawings.
The ORB is a widely applied feature extraction and description scheme, and its operation is divided into three parts: feature point extraction, Non-Maximum Suppression (NMS), and descriptor generation. The feature point extraction part may use the FAST (Features from Accelerated Segment Test) algorithm to preliminarily screen feature points, the NMS part is configured to filter out poorer feature points from the preliminarily screened ones, and the descriptor generation part is configured to generate description information (i.e., a descriptor) for each specified feature point. The processing flow of the ORB algorithm is described below with reference to fig. 1A.
The core idea of the FAST algorithm is that if the differences between the pixel value of a certain pixel and the pixel values of N consecutive points on a circle around it are all greater than a threshold, the pixel is regarded as a feature point. Here N is a positive integer that may take the value 9, 11 or 12, and the corresponding variants are called the FAST-9, FAST-11 and FAST-12 algorithms, respectively. The FAST-9 algorithm is described below as an example.
For a given target image, the FAST algorithm first determines a target image block (e.g., a 7 x 7 image region centered on a given pixel point, also referred to as a candidate feature point), referred to herein as a Patch. From the target image block, a number of points on the circumference around the candidate feature point may be extracted to obtain a pixel sequence. As shown in fig. 1B, the pixel sequence includes 16 points on the circumference around the candidate feature point, and the numerals in the figure represent the serial number of each point in the sequence. In this example, the pixel sequence may be denoted as I[1:16]. Assuming that the pixel value of the candidate feature point is p and the threshold is t, the FAST algorithm may include the following steps:
(1) calculate the difference d[i] between the pixel value of each point in the pixel sequence and that of the candidate feature point: d[i] = p - I[i], where 0 ≤ i ≤ 15;
(2) classify d: if d[i] > t, set M2[i] = 1; if d[i] < -t, set M1[i] = 1. The pixel points on the circumference corresponding to the candidate feature point can thus be classified into 3 types: pixel points whose pixel value is obviously smaller than that of the candidate feature point (satisfying d[i] > t, since d[i] = p - I[i]), pixel points whose pixel value is obviously larger than that of the candidate feature point (satisfying d[i] < -t), and pixel points with no obvious difference (satisfying neither d[i] > t nor d[i] < -t).
(3) use 8 pixels for fast screening (Quick Judge), which is performed using the following formula:
[Equation image in the original: the conditions Mask1F and Mask2F, each a logical combination ("and"/"or") of the flags M1[i] and M2[i] of the 8 sampled pixels.]
wherein "or" represents the logical OR operation and "and" represents the logical AND operation.
(4) if the pixel sequence corresponding to a candidate feature point satisfies one of the two conditions Mask1F and Mask2F, that is, if Mask1F or Mask2F equals 1, further screening (Detail Judge) is performed; if not, the candidate feature point is not regarded as a feature point. The further screening traverses the runs of 9 consecutive points on the circumference and checks whether any run satisfies condition Mask1F or condition Mask2F; if so, the candidate point is regarded as a feature point, otherwise it is not.
The above steps divide the FAST algorithm into two stages, and this staged screening gives FAST a significant speed advantage over other feature point algorithms. Further, to ensure that feature points of different scales are extracted, feature point extraction may be performed on a multi-layer image pyramid.
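The two-stage screening described above can be sketched as follows. The quick-judge step here uses the 4 compass pixels {0, 4, 8, 12}, a commonly used necessary condition for a 9-point arc; the patent's own 8-pixel quick judge uses a formula not reproduced here, so this step is an assumption.

```python
def fast9_is_corner(p, circle, t):
    """FAST-9 check for one candidate point. `p` is the center pixel
    value, `circle` the 16 ring pixel values I[0..15], `t` the threshold.
    With d[i] = p - I[i]: d > t marks ring pixels much darker than the
    center, d < -t marks ring pixels much brighter."""
    d = [p - c for c in circle]        # step (1): differences
    m2 = [x > t for x in d]            # step (2): ring pixel much darker
    m1 = [x < -t for x in d]           # step (2): ring pixel much brighter

    # Quick judge (assumed variant): any 9-long arc on a 16-point ring
    # must contain at least 2 of the compass pixels {0, 4, 8, 12}.
    for m in (m1, m2):
        if sum(m[i] for i in (0, 4, 8, 12)) >= 2:
            break
    else:
        return False

    # Detail judge: look for 9 consecutive set flags in m1 or m2,
    # wrapping around the ring.
    for m in (m1, m2):
        run = 0
        for i in range(16 + 9):
            run = run + 1 if m[i % 16] else 0
            if run >= 9:
                return True
    return False
```

The quick judge rejects most candidates with 4 comparisons, and only survivors pay for the full wrap-around scan, which is the source of FAST's speed advantage.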
After a certain point is regarded as a feature point, its feature response can be calculated, because when the number of feature points is too large, the feature points need to be screened and the better ones selected. There are many methods for calculating the feature response; the method used in OpenCV is described below as an example. First, two intermediate quantities A0 and B0 may be calculated:
A0 = max(AMp, AMn);
wherein AMp = max over j of min(d[j], d[j+1], ..., d[j+8]) and AMn = max over j of min(-d[j], -d[j+1], ..., -d[j+8]), with 1 ≤ j ≤ 16 and indices on the circle taken modulo 16;
B0 = min(BMp, BMn);
wherein BMp = min over j of max(d[j], d[j+1], ..., d[j+8]) and BMn = min over j of max(-d[j], -d[j+1], ..., -d[j+8]).
Then, the feature response Score can be calculated from A0 and B0:
Score=-min(B0,-max(A0,t))-1。
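A software sketch of the Score computation follows. The AMp/AMn/BMp/BMn definitions are rendered as equation images in the original publication, so this sketch assumes the arc-minimum/arc-maximum reading used by OpenCV's corner-score routine (maximum over all 9-point arcs of the arc minimum of d and of -d, and the corresponding minima of arc maxima); it is an interpretation, not a transcription.

```python
def fast9_score(p, circle):
    """Feature response of a FAST-9 candidate, following the Score
    formula quoted above under the assumed arc-based definitions of
    AMp, AMn, BMp and BMn. `p` is the center pixel value and `circle`
    the 16 ring pixel values; the reference threshold t is taken as 0
    for this sketch."""
    d = [p - c for c in circle]
    # All 16 wrap-around arcs of 9 consecutive differences.
    arcs = [[d[(j + i) % 16] for i in range(9)] for j in range(16)]
    AMp = max(min(a) for a in arcs)             # best arc minimum of d
    AMn = max(min(-x for x in a) for a in arcs) # best arc minimum of -d
    BMp = min(max(a) for a in arcs)             # worst arc maximum of d
    BMn = min(max(-x for x in a) for a in arcs) # worst arc maximum of -d
    A0 = max(AMp, AMn)
    B0 = min(BMp, BMn)
    t = 0  # assumed reference threshold
    return -min(B0, -max(A0, t)) - 1
```

Under this reading, the score of a point whose ring differs uniformly by 50 from the center is 49, i.e. the largest threshold at which the point would still pass the FAST-9 test.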
after the feature response is calculated, local non-maxima suppression is required. This step can be implemented by using a sliding window, for example, the size of the sliding window is 3x3, the image area around the feature point is 3x3, and if the pixel point with the largest feature response in the 3x3 area is different from the feature point, the feature response of the feature point is set to 0. This may be done to eliminate some locally poor feature points.
The process of generating the descriptor can be realized based on the rBRIEF (rotated Binary Robust Independent Elementary Features) algorithm. To ensure that the descriptor of a feature point is unchanged after the image is rotated, the descriptor pairs used to generate the descriptor need to be rotated according to the direction of the centroid. A descriptor of a feature point contains the key information needed to identify the feature point. A 256-bit descriptor may contain 256 descriptor pairs, denoted P[i, 1:4], where 1 ≤ i ≤ 256. Let I[j,k] = Img[x+j, y+k]; the centroid is calculated as follows:
m10 = Σ_j Σ_k j·I[j,k], m01 = Σ_j Σ_k k·I[j,k], where the sums run over -15 ≤ j ≤ 15 and -d[|j|] ≤ k ≤ d[|j|];
where (x, y) are the pixel coordinates of the central pixel point of the target image block used to calculate the descriptor, Img represents the pixel value, and j and k are variables. Assuming that the size of the target image block is 32 x 32, d describes a circle in the target image block, with d = [15,15,15,15,14,14,14,13,13,12,11,10,9,8,6,3]. m10 and m01 are also known as image moments.
After m10 and m01 are obtained, the rotation amount can be calculated according to the following formula:
θ = atan2(m01, m10).
then, the descriptor pair may be shifted based on the rotation amount, and the shifted descriptor pair is denoted as:
P′[i, 1:4], in which each point (u, v) of the pair P[i, 1:4] is replaced by (u·cosθ - v·sinθ, u·sinθ + v·cosθ).
Calculating the rotated descriptor pairs ensures the rotation invariance of the descriptor.
For a feature point with coordinates (x, y), assume that its ith point pair is:
(x + x′[i,1], y + y′[i,1]) and (x + x′[i,2], y + y′[i,2]), where (x′[i,1], y′[i,1], x′[i,2], y′[i,2]) denotes the rotated pair P′[i, 1:4];
then the descriptor of the feature point can be described as the following formula:
Desc[i] = 1 if the pixel value at the first point of the ith pair is smaller than the pixel value at the second point, and Desc[i] = 0 otherwise.
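The pair rotation and bit generation together can be sketched as follows. The bit convention (1 when the first sample is smaller) follows the standard BRIEF intensity test and is an assumption about the patent's exact convention, as is the rounding of rotated offsets to integer pixels.

```python
import math

def rbrief_descriptor(img, x, y, pairs, theta):
    """Sketch of rBRIEF bit generation for one feature point at (x, y).
    `pairs` holds (x1, y1, x2, y2) sampling offsets; each pair is
    rotated by the feature point's angle theta, then one bit is
    produced by comparing the two sampled pixel intensities."""
    c, s = math.cos(theta), math.sin(theta)
    bits = 0
    for i, (x1, y1, x2, y2) in enumerate(pairs):
        # rotate both sampling offsets of the pair by theta
        rx1, ry1 = round(x1 * c - y1 * s), round(x1 * s + y1 * c)
        rx2, ry2 = round(x2 * c - y2 * s), round(x2 * s + y2 * c)
        # standard BRIEF intensity test (assumed bit convention)
        if img[y + ry1][x + rx1] < img[y + ry2][x + rx2]:
            bits |= 1 << i
    return bits
```

On an image whose intensity increases left to right, a horizontal pair gives bit 1 at theta = 0 and bit 0 at theta = π, illustrating how the rotation keeps the test aligned with the patch orientation.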
for descriptors, feature points extracted in different layers all need to be computed in the highest resolution layer.
To improve the processing efficiency of the ORB algorithm, an accelerator is generally used in the related art to perform fast ORB computation. An ORB accelerator in the related art generally includes a feature extraction module and a descriptor generation module, where the feature extraction module is configured to extract feature points from an image and the descriptor generation module is configured to generate descriptors for the extracted feature points. Such accelerators still have some problems to be solved.
(1) NMS calculation. In the related art, the feature extraction module of the ORB accelerator only realizes the simplest 3x3 window-type non-maximum suppression; such a small window leaves too many feature points, and the suppression effect is not obvious. Moreover, feature points suitable for the SLAM task need to be distributed over the image as widely as possible, and feature points concentrated in a local area seriously affect the accuracy of the SLAM system, so the feature points obtained with a 3x3 window are not suitable for SLAM tracking.
(2) In the related art, the feature extraction module of the ORB accelerator computes the feature response inefficiently, resulting in low feature point extraction efficiency.
(3) In the related art, the descriptor generation module of the ORB accelerator fails to solve the problem of data conflict of descriptor calculation of different feature points, and the descriptor calculation efficiency is low.
(4) To ensure that the descriptors of the feature points are unchanged after the image is rotated, the descriptor generation module needs to calculate the angle of each feature point before generating the descriptors. In this step, the conventional scheme uses the CORDIC algorithm for fitting; limited by the CORDIC algorithm itself, the generated angle suffers from convergence-domain and precision limitations.
Based on this, the present disclosure provides a data processing apparatus, a chip, and an electronic device for ORB acceleration to solve at least one technical problem above.
Example one
As shown in fig. 2, the data processing apparatus for ORB acceleration in an embodiment of the present disclosure includes:
the window filtering module 201 is configured to perform non-maximum suppression processing on the feature response of each candidate feature point of the target image in the sliding window to obtain a plurality of first feature points with feature responses greater than zero; the target image is divided into a plurality of grids;
the mesh filtering module 202 is configured to perform non-maximum suppression processing on the feature response of the first feature point in each mesh of the multiple meshes to obtain multiple second feature points with feature responses greater than zero;
and the image filtering module 203 is configured to filter the second feature points in the multiple adjacent grids to obtain multiple third feature points with feature responses greater than zero.
The feature point filtering operation performed by the window filtering module 201 may be referred to as window filtering or window suppression. In the window filtering module 201, a sliding window may be adopted to slide on the target image according to a preset sliding step, and the size of the sliding window may be 3x3, 5x5, and the like. The sliding window can cover a group of feature points on the target image every time the sliding window slides once, and non-maximum suppression processing can be performed on the group of feature points covered by the sliding window at each sliding position. Specifically, it may be determined whether the feature response of the central feature point located in the center of the sliding window in the current coverage area satisfies the window filtering condition, if so, the feature response of the central feature point is retained, otherwise, the feature response of the central feature point is set to zero. The window filtering conditions are as follows: and the characteristic responses of other characteristic points except the central characteristic point in the current coverage range of the sliding window are smaller than the characteristic response of the central characteristic point.
Fig. 3 illustrates an embodiment of window filtering, where each rectangular box represents a feature point on the target image, the number inside a box represents the feature response of the corresponding feature point, and the grey dashed box represents the sliding window. For simplicity, only the coverage when the sliding window slides to two positions (denoted P1 and P2, respectively) on the target image is shown, and the feature response of each feature point in the target image before filtering is shown as array a. It can be seen that when the sliding window is at position P1, the central feature point is the feature point at row 1, column 1 of the target image, and its feature response does not satisfy the window filtering condition, so the feature response of the feature point at row 1, column 1 is set to 0. Similarly, when the sliding window is at position P2, the central feature point is the feature point at row 5, column 2 of the target image, and its feature response is greater than the feature responses of the other feature points in the current coverage of the sliding window, so the feature response of the feature point at row 5, column 2 is retained. In this way, the feature response of each feature point in the target image can be filtered; the feature responses after filtering are shown as array b.
Each feature point whose feature response remains greater than 0 after window suppression may be referred to as a first feature point. After the window suppression operation, the number of remaining feature points is still too large, so further filtering may be performed by the mesh filtering module 202; the feature point filtering operation performed by the mesh filtering module 202 may be referred to as mesh filtering or mesh suppression.
The target image may be divided into a plurality of grids (cells) in advance; one grid may include an image area of m rows and n columns of the target image, and the size of a grid is larger than that of the sliding window. In some embodiments, assuming that the size of the target image is 640x400, each 32x20 image area may be taken as one grid, for a total of 20x20 grids. For convenience of description, the following embodiments take a target image of size 640x400 and grids of size 32x20 as examples; however, those skilled in the art will understand that the size of the target image and the manner of dividing the grids are not limited to the cases described above.
Still referring to fig. 3, for ease of distinction, different grids are represented in different colors. The figure is divided into 4 grids, each grid having a size of 4x4, which are a black area, a dark gray area, a light gray area, and a white area in the figure. When the grid filtering is performed, the non-maximum suppression can be performed on the feature response of each feature point in the same grid, only the maximum feature response in the grid is reserved, and the feature responses except the maximum feature response are set to zero. For example, in the grid shown by the black area of the array b, two characteristic responses "7" and "9" are included, and therefore, the characteristic response "7" may be set to zero, and the characteristic response "9" may be retained. Similarly, the mesh filtering results in the other three meshes can be obtained, the mesh filtering result of the whole target image is shown as an array c in fig. 3, and each feature point whose feature response after the mesh filtering is greater than zero can be referred to as a second feature point.
The number of feature points after the mesh filtering is still large, so that the feature points can be further filtered by the image filtering module 203, and the filtering operation performed by the image filtering module 203 may be referred to as image-level filtering. The image-level filtering may be performed in units of a plurality of adjacent grids, for example, for a certain grid (referred to as grid GA), 8 grids (referred to as grid GB) adjacent to the periphery of the grid may be obtained. For any one feature point rA in the grid GA, if a feature point rB meeting the following condition exists in any one grid GB, the feature response of the feature point rA is set to zero: the characteristic response of rB is greater than that of rA, and the distance between rB and rA is less than the preset distance. Still referring to fig. 3, assuming that the distance between the feature point in the dark gray grid in the target image (i.e., the feature point whose feature response is "9") and the feature point in the white grid (i.e., the feature point whose feature response is "15") is less than the preset distance, the feature response "9" may be set to zero, and the image-level filtering result is shown as an array d in fig. 3.
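The grid and image-level stages can be sketched as follows. For simplicity the image-level sketch compares every surviving point against every other, whereas the text restricts the comparison to the 8 neighboring grids; the cell size follows the 32x20 example above, and the distance threshold is an assumed parameter.

```python
def grid_filter(points, cell_w=32, cell_h=20):
    """Grid-level NMS: keep only the strongest response in each cell.
    `points` is a list of (x, y, response); cell size 32x20 follows the
    example in the text."""
    best = {}
    for x, y, r in points:
        key = (x // cell_w, y // cell_h)
        if key not in best or r > best[key][2]:
            best[key] = (x, y, r)
    return list(best.values())

def image_filter(points, min_dist=10.0):
    """Image-level filtering: a point is suppressed if a stronger point
    lies within min_dist of it (min_dist is an assumed parameter). The
    hardware restricts this check to the 8 adjacent grids; this sketch
    checks all surviving points."""
    kept = []
    for x, y, r in points:
        if all(not (r2 > r and (x - x2) ** 2 + (y - y2) ** 2 < min_dist ** 2)
               for x2, y2, r2 in points if (x2, y2) != (x, y)):
            kept.append((x, y, r))
    return kept
```

For example, with points (1, 1, 7) and (5, 5, 9) in one cell and (40, 5, 15) in the next, grid filtering keeps responses 9 and 15; image-level filtering with a 50-pixel radius then suppresses the 9 because the stronger 15 lies nearby.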
The data processing apparatus in this embodiment improves upon the feature extraction module of an ORB accelerator in the related art. By adopting the three-stage filtering scheme (window filtering, grid filtering, and image-level filtering), the embodiments of the present disclosure achieve the following technical effects:
First, as described above, performing only a local 3x3 NMS as in the related art is insufficient to screen dense feature points and is not suitable for SLAM tracking. The embodiments of the present disclosure add grid filtering and image-level filtering on top of window suppression, filtering out more feature points so that the remaining feature points are dispersed across the image and are suitable for SLAM tracking.
Second, one NMS solution in the related art sorts the current feature points to find the one with the largest response, selects that feature point, creates a Mask around it, and screens out the points covered by the Mask; this process is repeated to obtain the screened feature points. The problem with this scheme is that it is purely serial, and creating the Mask requires a large amount of distance computation. For images with dense feature points, it causes sudden large delays, which must be avoided in a SLAM system. Moreover, a purely serial process is unfavorable for hardware implementation.
In the three-stage filtering scheme adopted herein, the filtering scale increases stage by stage. Most of the unqualified feature points can be filtered out by the sliding window, and because the sliding window is small, few sorting and comparison operations are needed for each window. Window filtering screens out a portion of the unqualified feature points, greatly reducing the number of sorting and comparison operations needed by the larger-scale grid filtering; similarly, since grid filtering further reduces the number of feature points, the number of sorting and comparison operations during image-level filtering is also greatly reduced. Therefore, by gradually reducing the number of feature points, this scheme effectively reduces the sorting and comparison operations in the filtering process, improves filtering efficiency, and facilitates hardware implementation.
In addition, the time consumed by window filtering and grid filtering depends only on the number of pixels of the target image and is therefore a fixed value, while the time consumed by image-level filtering has a maximum value because the grid size is fixed, and remains at a low level because it is bounded by the number of grids. The three-stage filtering scheme therefore has stable processing time and low delay, making it suitable for application scenarios with high real-time requirements.
In some embodiments, the data processing apparatus for ORB acceleration further includes a calculation module configured to calculate the feature response of the feature point in each target image block of the target image and output it to the window filtering module. A target image block is the minimum image unit for calculating a feature response; as described in the previous embodiment, a 7x7 image area may be used as a target image block, which may be referred to as a Patch.
In some embodiments, the target image may comprise a plurality of image blocks, and the data processing apparatus for ORB acceleration further includes a target image block generation module configured to acquire the image blocks of the target image from a memory and generate, based on them, the target image blocks used for calculating feature responses. The generated target image blocks may be fed to the calculation module. The size of an image block (Block) may be greater than or equal to the size of a target image block; for example, where the target image block is 7x7, the image block may be 8x8, so that a plurality of target image blocks can be obtained from one image block at a time, achieving data reuse.
In some embodiments, the data processing apparatus for ORB acceleration may further include an address generation module configured to generate the storage address of each image block in the memory. The address generation module may scan an image block out of the memory based on its storage address and output the scanned image block to the target image block generation module.
To ensure efficient reading of the target image, the way the target image is stored can be improved. In some embodiments, the memory storing the target image may include a plurality of memory blocks, each containing a plurality of storage addresses. Different rows of the same image block are stored at the same storage address of different memory blocks, and corresponding rows of at least two image blocks are stored in the same memory block.
The following description takes a 640x400 target image, 8x8 image blocks, and 7x7 target image blocks as an example; for reasons of space, only the first 16 rows and the first 24 columns of each row are shown. Referring to fig. 4, the memory includes 8 memory blocks, denoted Block1, Block2, ..., Block8. Each memory block includes a plurality of storage addresses, denoted Addr0, Addr1, ..., Addr82, .... One storage address is 64 bits wide and can store 8 pixels in total. Taking the first 8 rows of the target image as an example, columns 1-8 of the first 8 rows form one image block (called image block 1), columns 9-16 form another (image block 2), columns 17-24 another (image block 3), and so on. As can be seen, each row of image block 1 is stored in a different memory block: row 1 in Block1, row 2 in Block2, and so on, while every row of image block 1 is stored at the same storage address, Addr0. Likewise, rows 1-8 of image block 2 and of image block 3 are also stored in Block1-Block8, i.e., row 1 of each image block is stored in Block1, row 2 in Block2, and so on. The image blocks differ only in their storage address: for example, image block 1 is stored at Addr0 and image block 2 at Addr1.
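Under the layout of fig. 4, the mapping from a pixel coordinate to its memory block and storage address can be expressed compactly. The following sketch uses 0-based indices; the 80-address stride per block-row follows from the 640-pixel width divided by the 8-pixel block width, and the function name is an assumption.

```python
IMG_W = 640                     # target image width in pixels
BLK = 8                         # image blocks are 8x8; one 64-bit word holds 8 pixels
BLOCKS_PER_ROW = IMG_W // BLK   # 80 image blocks per block-row

def pixel_location(row, col):
    """Map a pixel (0-based row/col) of the target image to its place in
    the banked memory: (memory block index, storage address, byte offset).
    Row i of an image block lives in memory block i, and consecutive
    image blocks of the same block-row occupy consecutive addresses."""
    bank = row % BLK
    addr = (row // BLK) * BLOCKS_PER_ROW + (col // BLK)
    byte = col % BLK
    return bank, addr, byte
```

With this mapping, the first image block sits at Addr0 across all 8 memory blocks, the second at Addr1, and the next block-row of image blocks starts at Addr80, which matches the offset accumulation described below.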
In some embodiments, the address generation module may obtain an address offset between the storage address of an image block in the corresponding memory block and an initial storage address of that memory block, determine a target address based on the initial storage address and the address offset, scan the image block starting from the target address, and output the scanned image block to the target image block generation module. The initial storage address may be the storage address of the 1st image block of the target image, i.e., the storage address Addr0 in fig. 4, or the starting address of the memory, e.g., 0. This storage layout allows line scanning by simple address accumulation, and the starting position of a line scan can be adjusted via the address offset: for example, when Block1 and Block2 accumulate from 80 while Block3 to Block8 accumulate from 0, rows 3 to 10 are line-scanned, improving the reading efficiency of the image blocks.
As shown in fig. 5, the address generation module acts directly on the memory to perform progressive scanning. Because the image is divided across 8 memory blocks, 8 rows can be line-scanned simultaneously, and since a target image block is 7x7, one line scan yields feature responses for two rows. As shown in (5-a) of fig. 5, the pixels used to calculate feature responses after the first scan are the ring of pixels along the inner edge of the black frame and the ring along the inner edge of the gray frame in (5-a), yielding two feature responses in the same column. On the second scan, both the black frame and the gray frame move right by one pixel, and two feature responses of the next column are obtained from the rings of pixels along their inner edges after the move. Thus the first 8x8 image block sustains only two clock cycles of computation, but once the 2nd 8x8 image block is fetched, 8 clock cycles can be sustained, and for the remainder of the row one 8x8 image block is read every 8 cycles. It should be noted that the individual 8x8 image blocks of the target image may overlap: for example, one 8x8 image block covers columns 1-8 of rows 1-8 of the target image (i.e., image block 1 above), while another covers columns 1-8 of rows 2-9 (referred to as image block 4). Similarly, shifting the whole of image block 2 down by one pixel yields image block 5, and shifting the whole of image block 3 down by one pixel yields image block 6. Target image blocks are obtained from image blocks 4, 5, and 6 in the same manner as from image blocks 1, 2, and 3, as shown in (5-d), (5-e), and (5-f) of fig. 5, and the details are not repeated here.
When an image block has been fully scanned, the address generation module may also adjust the address offset and determine a new target address from the initial storage address and the adjusted offset. As shown in (5-b) of fig. 5, after each line scan, the address offsets of the two image blocks (indicated by the dotted arrows) are modified, so that target image block generation can cover the whole target image. The address offset is adjusted according to the values in the table in fig. 5: for example, when scanning columns 9-16 of the first 8 rows of the target image, the address offset is adjusted to 1; when scanning columns 17-24, it is adjusted to 2. This governs address generation during line scanning. As shown in (5-c) of fig. 5, reading line by line in this way transmits the 8x8 image blocks of the target image sequentially and guarantees that, during subsequent target image block generation, a target image block yielding two feature responses can be generated every clock cycle, ensuring the smoothness of the pipeline.
Of course, the storage scheme in the above embodiment is only one optional implementation, and the storage scheme for the target image in the present disclosure is not limited thereto. For example, the memory may include 16 memory blocks, with one group of 8 memory blocks storing rows 1-200 of the target image and the other group of 8 storing rows 201-400; each group is organized as in the foregoing embodiment and is not described again. As another example, the number of columns stored in each memory block may exceed the number of columns of an image block: with 8-column image blocks, one memory block may store 9, 10, or more columns of data. As yet another example, multiple rows of data of an image block may be stored in one memory block. The data scanning scheme can be adjusted accordingly based on the data storage scheme, which is not described case by case here.
The embodiments of the present disclosure design the feature response calculation as a full pipeline and perform NMS while the feature responses are being generated. As shown in fig. 6, after processing starts, the calculation module scans the entire target image several rows at a time and calculates the feature responses of those rows; the calculated responses can be sent immediately to the window filtering module, followed by grid filtering. Assuming a 3x3 sliding window and a 32x20 grid, the feature responses can be scanned and calculated two rows at a time and sent to the window filtering module as soon as they are generated, followed by grid filtering; since responses are generated row by row, image-level filtering of the first row can begin once grid filtering of the first 20 rows has completed. Thereafter, image-level filtering can be performed immediately as each row of grid-level filtering results is produced. In this process, the grid filtering results are output as soon as they are ready, and part of the NMS is fused into the feature response calculation pipeline without introducing serious serial delay. The scheme thus reduces the number of feature points while keeping the processing time stable and the serial delay low.
In the above pipeline, since the execution time of each stage is stable, after the data processing apparatus for ORB acceleration is enabled, each corresponding module is activated after a delay of a few cycles, and the whole pipeline starts up.
In some embodiments, the data processing apparatus for ORB acceleration further includes a control unit configured to send enable signals to the address generation module, the target image block generation module, the window filtering module, and the mesh filtering module, triggering each of these modules to perform its corresponding operation.
The time at which the target image block generation module receives its enable signal is determined by the time at which the address generation module receives its enable signal plus the time it takes the target image block generation module to acquire one image block. The time at which the window filtering module receives its enable signal is determined by the time at which the target image block generation module receives its enable signal, the time the target image block generation module takes to generate a target image block, and the time the calculation module takes to compute the feature response of the feature point in a target image block. The time at which the mesh filtering module receives its enable signal is determined by the time at which the window filtering module receives its enable signal plus the time the window filtering module takes to obtain the plurality of first feature points.
As shown in fig. 7, the address generation module may, in response to receiving the enable signal, determine the storage address in the memory of the image block currently to be read, read an image block from that address, and output it to the target image block generation module. The target image block generation module receives the enable signal after a delay of 4 cycles and, in response, generates target image blocks; two target image blocks can be generated per cycle. Each generated target image block can be sent to the calculation module to compute feature responses, producing 2 rows of feature responses per cycle. The window filtering module receives the enable signal after a delay of 19 cycles and, in response, performs window filtering, obtaining 2 window-filtered feature responses each time. The mesh filtering module receives the enable signal after a delay of 28 cycles and, in response, performs grid filtering, obtaining 2 rows of grid-filtered feature responses each time.
The enable signal received by each module may be sent by the control unit after different delays, or may be sent by the control unit at the same time and delayed by a delay unit for a different number of cycles per module; the present disclosure does not limit this. Through this embodiment, pipelined processing across the modules can be realized automatically. The specific structure of each module is explained below.
In some embodiments, the target image block generation module comprises a first selection unit, at least two storage units, an alignment unit, and a scanning unit. The first selection unit selects one storage unit from the at least two storage units, selecting different storage units in any two adjacent selections. Each of the at least two storage units caches the image block acquired from the memory when that storage unit is selected by the first selection unit. The alignment unit aligns the cached image blocks to obtain aligned image blocks, in which the relative positions of the pixels match the relative positions of the pixels in the corresponding target image block. The scanning unit scans the target image blocks row by row from the alignment unit.
In the embodiments of the present disclosure, at least two storage units form a Ping-Pong Buffer. Taking the case of exactly 2 storage units as an example, the generation of target image blocks is described below with reference to fig. 8. Each received image block is immediately written into the Ping-Pong Buffer. Each storage unit may include a plurality of cache blocks for caching the data fetched from the respective memory blocks; the number of cache blocks may be greater than or equal to the number of memory blocks. After image block 1 is received, it may be stored in one storage unit (referred to as Buffer1), and in the next 2 cycles target image blocks are generated from the data cached in Buffer1. After image block 2 is received, it may be stored in the other storage unit (referred to as Buffer2), and in the next 8 cycles target image blocks are generated from the data cached in Buffer1 and Buffer2. After image block 3 is received, it may be stored in Buffer1, and in the next 8 cycles target image blocks are generated from the data cached in Buffer2 and Buffer1, and so on.
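The alternating write pattern can be sketched as a small software model; the class name, the string placeholders for blocks, and the `read_pair` helper are illustrative assumptions, and the model only captures which buffer is written and which pair is read, not the cycle timing.

```python
class PingPongBuffer:
    """Two storage units written alternately: while the newest image
    block fills one unit, patch generation reads across both units."""
    def __init__(self):
        self.units = [None, None]   # Buffer1, Buffer2
        self.count = 0              # image blocks received so far

    def write(self, block):
        # Ping-Pong Write Switch: Buffer1, Buffer2, Buffer1, ...
        self.units[self.count % 2] = block
        self.count += 1

    def read_pair(self):
        """Return (earlier block, later block): the earlier one supplies
        the leading columns of the patches, the later the trailing ones."""
        latest = (self.count - 1) % 2
        return self.units[1 - latest], self.units[latest]
```

After image blocks 1 and 2 arrive, patches are formed from (block 1, block 2); after block 3 overwrites Buffer1, they are formed from (block 2, block 3), matching the description above.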
The storage unit used to cache each image block may be determined by a first selection unit (denoted Ping-Pong Write Switch), which selects the storage units alternately. Still taking 2 storage units as an example, the first selection unit alternates between the two in the order Buffer1, Buffer2, Buffer1, Buffer2, ...; the selected storage unit stores the incoming image block.
In the above storage process, a mismatch between the memory block order and the logical row order occurs when the scan switches rows. For example, for the aforementioned image block 4 (columns 1-8 of rows 2-9 of the target image), rows 1-7 of the image block are stored in Block2-Block8 in sequence, while row 8 of the image block is stored in Block1; but since scanning proceeds in the order Block1 to Block8, the rows of the scanned image block arrive in the order: row 8, row 1, row 2, ..., row 7. In addition, the nature of the Ping-Pong Buffer itself introduces a column misalignment problem. For example, after image block 2 is received, Buffer1 caches the earlier columns of the target image and Buffer2 the later columns, so the data in Buffer1 must be read first and the data in Buffer2 second. After image block 3 is received, Buffer2 caches the earlier columns and Buffer1 the later columns, so the data in Buffer2 must be read first and the data in Buffer1 second.
Therefore, an alignment unit (denoted Image Align Switch) is provided to align the data in the storage units; after alignment, the image is restored to a state in which both the row order and the column order are correct. As shown in fig. 8, before alignment the data cached in Buffer1 and Buffer2 correspond in sequence to rows 7, 8, 1, ..., 6 of the target image block, with the earlier columns in Buffer2 and the later columns in Buffer1. After alignment, the rows and columns of the resulting image block both correspond to the rows and columns of the target image block. In the above embodiments, each cache block in a storage unit may be built from flip-flops (FF), allowing fully random reads. After alignment, the scanning unit (Patch Scan) sequentially scans out the target image blocks used for calculating feature responses; two rows of target image blocks are generated so that the subsequent modules can calculate two rows of feature responses simultaneously.
In some embodiments, the target image block generation module further comprises a counter for counting the number of times the target image block generation module fetches an image block from the memory; the first selection unit is used for selecting one storage unit from the at least two storage units based on the counting value of the counter and the number of the storage units; the alignment unit is used for aligning the data in each storage unit based on the count value of the counter, the number of the storage units and the number of rows of the image blocks.
Still taking 8x8 image blocks and 2 storage units as an example, the first selection unit may take the counter value modulo 2: if the result is 1, Buffer1 is selected; if 0, Buffer2 is selected. The alignment unit may likewise take the counter value modulo 2: if the result is 1, the data in Buffer1 forms the earlier columns and the data in Buffer2 the later columns; if 0, the data in Buffer2 forms the earlier columns and the data in Buffer1 the later columns. Furthermore, the alignment unit may take the counter value modulo 8 and reorder the rows in the storage units based on the result. The modulo results used by the first selection unit and the alignment unit may be output directly by the counter, or computed inside each unit from the counter value.
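The two counter-derived decisions can be sketched as follows. The buffer selection uses a 1-based block counter modulo 2 as described above; the row realignment assumes an image block is represented as a list of 8 rows read out bank-by-bank starting from the first memory block, with `start_row` the 0-based index of the block's first image row (both function names are illustrative).

```python
def select_unit(count):
    """Ping-Pong Write Switch: 1-based block counter modulo 2
    (1 -> Buffer1, 0 -> Buffer2)."""
    return "Buffer1" if count % 2 == 1 else "Buffer2"

def align_rows(scanned_rows, start_row):
    """Undo the bank rotation: a block whose first image row is
    start_row is read out with its logical rows rotated right by
    start_row % 8, so rotating left by the same amount restores
    the logical row order."""
    k = start_row % 8
    return scanned_rows[k:] + scanned_rows[:k]
```

For image block 4 (first image row at 0-based index 1), the rows arrive as [row 8, row 1, ..., row 7] and a left rotation by 1 restores them.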
After two rows of Patches are generated in a pipelined manner, the feature responses must be calculated, and the feature response calculation is likewise designed as a full pipeline. In some embodiments, the calculation module comprises: an extraction unit for extracting a pixel sequence from the target image block, the sequence comprising a plurality of target pixels surrounding the feature point of the target image block; a difference calculation unit for obtaining the difference between the pixel value of each pixel in the pixel sequence and the pixel value of the feature point of the target image block; a recombination unit for obtaining multiple groups of differences, each group comprising the differences corresponding to several consecutive target pixels of the pixel sequence; a plurality of comparison units, each determining the maximum and minimum difference of one group, deriving a first intermediate quantity from the maximum and a second intermediate quantity from the minimum; and a calculation unit for calculating the feature response of the feature point of the target image block from the first and second intermediate quantities of each group of differences.
Referring to fig. 9, with the pixel sequence I[1:16] of the previous embodiment, the pixels can be extracted by the extraction unit. The difference unit calculates the difference between the pixel value p of the central pixel and each pixel in the sequence, denoted p-1, p-2, ..., p-16 respectively. For convenience of description, each difference is assigned a number: p-1 is numbered 0, p-2 is numbered 1, and so on. The recombination unit obtains N consecutive differences from the differences output by the difference calculation unit; in the FAST-9 algorithm, N is 9. The groups of N consecutive differences thus include the differences numbered 1-9, those numbered 2-10, those numbered 3-11, and so on. A group of 9 consecutive differences can then be compared by a comparison unit to obtain the first and second intermediate quantities, i.e., A0 and B0 of the previous embodiment. In the figure, ap0, ap1, bp0, and bp1 are all buffers, caching AMp, AMn, BMp, and BMn respectively; per the formulas above, AMp and AMn are used to calculate A0, and BMp and BMn to calculate B0. To increase processing efficiency, multiple parallel comparison units may be used, their number equal to the number of difference groups. Finally, a calculation unit calculates the feature response from the first and second intermediate quantities, in the manner of the Score formula of the previous embodiment.
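The patent's exact Score formula (built from AMp, AMn, BMp, and BMn) is defined in an earlier passage not shown here, so the sketch below instead implements the widely used FAST-9 response over the same structure: the 16 center-minus-circle differences, the 16 arcs of 9 consecutive differences, and a per-arc min/max reduction. The function name and the symmetric bright/dark scoring are assumptions.

```python
def fast9_response(center, ring):
    """ring: the 16 circle pixels I[1..16]. Compute the differences
    d_k = center - I_k (the p-1 ... p-16 of the text), then for each of
    the 16 arcs of 9 consecutive pixels take min(d) (dark arc) and
    min(-d) (bright arc); the response is the best arc found."""
    d = [center - p for p in ring]
    best = 0
    for s in range(16):                         # 16 possible arc starts
        arc = [d[(s + k) % 16] for k in range(9)]
        best = max(best, min(arc), min(-v for v in arc))
    return best
```

For a center of 100 surrounded by an arc of 9 pixels at 130 (the rest equal to the center), the bright arc dominates and the response is 30.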
Further, for ease of hardware implementation, the difference calculation unit includes a plurality of subunits, each obtaining the difference between the pixel value of one pixel of the pixel sequence and the pixel value of the feature point of the target image block; the differences of the leading pixels of the sequence are each obtained by two of the subunits. Still referring to fig. 9, the differences of the pixels numbered 0 to 9 in the pixel sequence may each be obtained by two subunits of the difference calculation unit. Each comparison unit can sort one group of differences and determine the first and second intermediate quantities of that group from the sorting result.
Sorting is usually a highly serial task; to keep the whole design pipelined, the pipelined sorter shown in fig. 10 can be adopted. Its idea is consistent with merge sort, and it does not need to cache all the data during the process. Fig. 10 illustrates the sorting process using the comparison unit in the first column of fig. 9 as an example: according to the calculation formulas of AMp and AMn in the foregoing embodiments, d0, d1, ..., d9 are fed into the data inputs in fig. 10, and the first and second intermediate quantities are computed along the connections shown. Here x and y denote any two of d0, d1, ..., d9; a node labeled a outputs the larger of the two numbers, a node labeled l outputs the smaller, and a node labeled H passes its input through unchanged.
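The merge-style reduction the sorter performs can be imitated in software as a log-depth tree of compare cells; in hardware, each tree level would be one pipeline stage. Only the overall maximum and minimum are needed for the intermediate quantities, so this sketch (an illustrative simplification, not the exact network of fig. 10) forwards the larger value of each pair up one tree and the smaller up the other.

```python
def tree_max_min(values):
    """Log-depth tournament over the differences d0..d9: pair up the
    survivors at every level, keep the larger for the max tree and the
    smaller for the min tree, until one value remains in each."""
    hi, lo = list(values), list(values)
    while len(hi) > 1:
        nxt_hi, nxt_lo = [], []
        for i in range(0, len(hi) - 1, 2):
            nxt_hi.append(max(hi[i], hi[i + 1]))
            nxt_lo.append(min(lo[i], lo[i + 1]))
        if len(hi) % 2:            # an odd leftover passes through unchanged
            nxt_hi.append(hi[-1])
            nxt_lo.append(lo[-1])
        hi, lo = nxt_hi, nxt_lo
    return hi[0], lo[0]
```

Because every level halves the number of candidates, 10 inputs need only 4 levels, and no level must buffer the full input, mirroring the streaming property claimed for the pipelined sorter.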
In some embodiments, the calculation module further includes: a mask generation module (Mask Generator) for obtaining a mask for each target pixel of the image block, where the target pixels are the pixels surrounding a candidate feature point of the image block and the mask of a target pixel represents the pixel value difference between that pixel and the candidate point; a first screening module (Quick Judge) for determining, from the masks of the target pixels, whether the candidate feature point satisfies a first condition, namely that the number of masks indicating a pixel value difference greater than a preset difference exceeds a first number; and a second screening module (Detail Judge) for determining, when the candidate feature point satisfies the first condition, whether it satisfies a second condition, and determining it to be a feature point when it does; the second condition is that the number of masks indicating a pixel value difference greater than the preset difference exceeds a second number, the second number being greater than the first number.
The masks are Mask1F and Mask2F of the foregoing formulas. The first screening module may screen in the manner described in the foregoing "quick screening" embodiment, and if a target image block satisfies either Mask1F or Mask2F, it is output to the second screening module for further screening. The second screening module may screen in the manner described in the foregoing "further screening" embodiment; it examines in parallel all possible runs of 9 consecutive points on the circle, 16 possibilities in total. The specific processing of the mask generation module and the first and second screening modules is detailed in the foregoing embodiments and is not repeated here.
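The two-stage screen can be sketched over a 16-element mask vector. The threshold `first_number = 8` is an assumed value (the patent leaves the first and second numbers unspecified), and the second stage is written as the 16 parallel arc checks described above, which is how the Detail Judge realizes its stricter condition; the function names are illustrative.

```python
def quick_judge(masks, first_number):
    """Cheap first screen: does the number of circle pixels whose mask
    flags a large pixel-value difference exceed first_number?"""
    return sum(masks) > first_number

def detail_judge(masks, run=9):
    """Full screen: is there a run of `run` consecutive flagged pixels
    anywhere on the 16-pixel circle? All 16 start positions are
    checked, as the Detail Judge module does in parallel."""
    n = len(masks)
    return any(all(masks[(s + k) % n] for k in range(run)) for s in range(n))

def is_feature_point(masks, first_number=8):
    # Quick Judge gates Detail Judge, so most candidates exit cheaply.
    return quick_judge(masks, first_number) and detail_judge(masks)
```

A circle with 9 consecutive flagged pixels passes both stages; 8 alternating flags fail the quick screen, and 10 flags split into two short runs pass the quick screen but fail the detailed one.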
In some embodiments, the calculation module further includes a second selection unit, which outputs a target image block to the extraction unit if the target image block contains a feature point, and outputs an all-zero image block to the extraction unit if it does not. The second selection unit is the selector in fig. 9: when the feature point judgment passes, i.e., the target image block includes a feature point, the selector outputs the target image block to the extraction unit; when the judgment fails, it selects an all-zero image block and outputs that to the extraction unit instead.
The feature points obtained by the calculation module can be output to the window filtering module for filtering. In some embodiments, the window filtering module may include a first cache unit, a write unit, a second cache unit, a read unit, and a filtering unit. The first cache unit caches the feature responses of each group of feature points of a target image block, where each group comprises several rows of feature points of the target image block. When the feature responses of the (i+1)-th group are obtained, the write unit reads the feature responses of the i-th group from the first cache unit and writes the responses of the i-th and (i+1)-th groups into the second cache unit; when the feature responses of the 1st group are obtained, the write unit obtains a group of all-zero feature responses and writes the all-zero responses together with the responses of the 1st group into the second cache unit. The read unit reads the feature responses of the feature points within the sliding window from the second cache unit and sends them to the filtering unit, and the filtering unit outputs the feature response of the first feature point within the sliding window.
Referring to fig. 11 and 12, when the size of the sliding window is 3x3, the window filtering operation may also be referred to as 3x3 NMS. The difficulty in pipelining 3x3 NMS is that it depends on data that has not yet been generated; for this reason the present disclosure designs a two-stage 3x3 NMS as follows. As shown in fig. 11, zeros are first pre-stored in the buffer. Owing to the sliding-window constraint, the first row of feature responses actually corresponds to the fourth row of the target image; meanwhile, since the memory is organized as 8 blocks, two rows of feature responses can be computed at a time. The two incoming rows of feature responses are spliced with the two rows of zeros to form four rows of feature responses, which are sent to the second cache unit; the reading unit then reads out of the second cache unit the 3x3 image blocks used to compute the filtered responses, i.e., the NMS Blocks in the figure. Since 3x3 NMS needs the rows above and below a target row, the data obtained from this first group are the all-zero row and row 0 of the feature responses after 3x3 NMS. Rows 4 and 5 can be written into the buffer while they are being computed and are spliced out when rows 6 and 7 arrive, so that the 3x3 NMS outputs the feature responses of rows 5 and 6 using the data of rows 4-7, and so on; the last row is handled similarly to the first.
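The four-row splicing described above can be captured by a small software reference model. This is an illustrative sketch, not the hardware pipeline itself; the function name and the convention that a response survives only if it is positive and not smaller than any 3x3 neighbour are assumptions for this example:

```python
def nms3x3_two_rows(four_rows):
    """Given 4 consecutive rows of feature responses, return the 3x3
    non-maximum-suppressed responses of the middle two rows.
    A response survives only if it is positive and not smaller than
    any of its 3x3 neighbours."""
    width = len(four_rows[0])
    out = []
    for r in (1, 2):                          # the two target rows
        row_out = []
        for c in range(width):
            v = four_rows[r][c]
            neigh = [four_rows[r + dr][c + dc]
                     for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                     if 0 <= c + dc < width]
            row_out.append(v if v > 0 and v == max(neigh) else 0)
        out.append(row_out)
    return out
```

For the very first group, two all-zero rows are prepended, matching the zeros pre-stored in the buffer.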
In some embodiments, the first cache unit includes a plurality of first cache subunits, each configured to cache the feature responses of one group of feature points. While feature responses are being written into at least one first target cache subunit among them, the feature responses already cached in at least one second target cache subunit (a subunit other than the first target cache subunit) are read out by the write unit.
The plurality of first cache subunits may constitute a ping-pong buffer. Referring to fig. 12, taking two first cache subunits as an example, they may include a Buffer A and a Buffer B. When two rows of feature responses arrive, they may be stored into Buffer A; the responses are then read back from Buffer A and spliced with the next two rows of feature responses into four rows in the manner described above. Meanwhile, those next two rows can be cached in Buffer B, and in the following round the responses cached in Buffer B are spliced with yet another two rows into four rows. In this way, read/write efficiency is improved. Selector 1 chooses whether two incoming rows of feature responses are stored into Buffer A or Buffer B, selector 2 chooses whether data are read out of Buffer A or Buffer B, and selector 3 chooses between two rows of feature responses and two all-zero rows.
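The ping-pong splicing can be modelled as a stream transformation; a minimal sketch (generator name chosen for this example) in which each incoming pair of rows is spliced after the previously buffered pair, with all-zero rows at the very start:

```python
def splice_stream(row_pairs):
    """Yield 4-row groups for the 3x3 NMS: each incoming pair of
    response rows is spliced after the previously buffered pair
    (all-zero rows at the very start), mimicking the alternating
    Buffer A / Buffer B behaviour."""
    width = len(row_pairs[0][0])
    prev = [[0] * width, [0] * width]         # pre-stored zero rows
    for pair in row_pairs:
        yield prev + [list(r) for r in pair]
        prev = [list(r) for r in pair]        # becomes the other bank
```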
In some embodiments, the second cache unit includes a plurality of second cache subunits, each configured to cache one column of the feature-response array composed of the feature responses of the i-th and the (i+1)-th groups of feature points. The read unit polls the feature responses in the plurality of second cache subunits, each poll outputting the responses of a subset of the subunits in an output order that depends on the poll count.
Still referring to fig. 12, since the write unit acquires four rows of feature responses at a time, four second cache subunits may be provided, namely registers Reg1, Reg2, Reg3 and Reg4 in the figure. The write unit writes each column of the four-row feature responses into Reg1, Reg2, Reg3 and Reg4 in turn, and the data are then read out according to a fixed rule. For example, by the time Reg3 is written, Reg1 and Reg2 already hold the two preceding columns, so Reg1, Reg2 and Reg3 form one 4x3 data block; this block is read out, split into two 3x3 data blocks, put through the 3x3 NMS operation, and output when done. Because the four registers are written in rotation, reading follows a similar polling rule, with the specific read order: Reg1, Reg2, Reg3; Reg2, Reg3, Reg4; Reg3, Reg4, Reg1; and Reg4, Reg1, Reg2.
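The rotating read rule can be written down compactly; a small sketch (function name assumed) that reproduces the sequence given above:

```python
def reg_read_order(num_regs=4, window=3, polls=4):
    """Polling read order for the column registers: poll i reads
    registers i, i+1, ..., i+window-1 (mod num_regs), 1-indexed."""
    return [[(i + k) % num_regs + 1 for k in range(window)]
            for i in range(polls)]
```

`reg_read_order()` yields Reg1-2-3, Reg2-3-4, Reg3-4-1, Reg4-1-2, matching the order in the text.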
The window filtered feature responses may be sent to a mesh filter module for further filtering. In some embodiments, the grid filter module includes a plurality of max hold arrays, different max hold arrays corresponding to different column addresses; each maximum holding array of the plurality of maximum holding arrays is to: acquiring a characteristic response of a first characteristic point corresponding to a column address of the array; and under the condition that the line number of the acquired feature responses reaches a preset line number, outputting the maximum feature response in the acquired feature responses of the first feature points to the image filtering module.
Referring to fig. 13, assuming a grid size of 32x20, each max-hold array (Max Hold Cell) may be used to compare the feature responses within its 32 columns and output the largest of them. For example, the first max-hold array compares the feature responses of columns 0 through 31, the second those of columns 32 through 63, and so on. The two rows of 3x3 NMS results enter column by column and are routed to different max-hold arrays according to their column addresses; each array holds its running maximum, outputs the result every 20 rows (the grid height) and then clears its stored value. Meanwhile, because the running time of each pipeline stage is fixed, each module in the pipeline only needs to forward the 3x3 NMS results.
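A behavioural model of the max-hold arrays can be sketched as follows; this is a software reference, not the streaming hardware, with the 32x20 grid dimensions from the example as defaults:

```python
def grid_max(responses, grid_w=32, grid_h=20):
    """Grid-level NMS reference model: the maximum feature response
    inside each grid_h x grid_w grid of the response map."""
    H, W = len(responses), len(responses[0])
    out = [[0] * (W // grid_w) for _ in range(H // grid_h)]
    for r in range(H):
        for c in range(W):
            g_r, g_c = r // grid_h, c // grid_w
            if responses[r][c] > out[g_r][g_c]:
                out[g_r][g_c] = responses[r][c]
    return out
```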
After grid filtering, image-level filtering may be performed. Image-level filtering is a purely serial step and can be implemented by a state machine. Since the grid filtering results of each column are produced continuously during operation, the delay introduced by image-level filtering only needs to account for the last row of grids. Image-level filtering compares each grid with its 8 surrounding grids, which requires 8 read cycles, 8 calculation cycles and 9 write-back cycles per grid, so for the 20 grids of the last row the total delay is 20 x (8 + 8 + 9) = 500 clock cycles, equivalent to 0.5 us at 1 GHz. In the multi-level pyramid case, the image-level filtering delay only occurs at the highest-resolution level, since the two smaller pyramid levels are computed faster than the highest one. For the overall frame delay of feature-response generation: every pixel of the target image must be scanned once, 640 x 400 = 256000 pixels in total; with a processing core designed to scan two rows in parallel, scanning the target image takes 128000 cycles. The feature-response calculation, the 3x3 NMS and the grid-level NMS are pipelined, adding 28 cycles of pipeline latency, and the 3x3 NMS additionally needs one row (640 cycles) of latency, giving 128000 + 640 + 28 = 128668 cycles. Adding the 500 cycles of image-level filtering described above yields 129168 cycles in total, a theoretical delay of about 0.13 ms at 1 GHz. In the related art, the frame delay of the ORB algorithm at 2 GHz is generally about 10 ms, so the disclosed scheme can effectively reduce the frame delay.
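The cycle budget above can be re-derived step by step; the breakdown below simply mirrors the numbers in the text:

```python
# Frame-delay budget for 640x400 feature-response generation at 1 GHz.
pixels = 640 * 400                 # 256000 pixels in the target image
scan_cycles = pixels // 2          # two rows scanned in parallel -> 128000
row_latency = 640                  # one row of latency inside the 3x3 NMS
pipe_latency = 28                  # pipeline depth of response + NMS stages
image_level = 20 * (8 + 8 + 9)     # last row of 20 grids: read+calc+write-back
total = scan_cycles + row_latency + pipe_latency + image_level
# total = 129168 cycles, i.e. about 0.13 ms at 1 GHz
```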
In some embodiments, the data processing apparatus for ORB acceleration includes a plurality of processing cores, each processing core including the window filtering module and the mesh filtering module; different processing kernels are used for filtering target images with different resolutions, and the target images with different resolutions are obtained by compressing the same original image based on different compression ratios; the image filtering module is further configured to merge the third feature points in each target image to obtain the feature points of the original image.
By processing target images of different resolutions corresponding to the same original image, it can be ensured that feature points of different scales in the original image are extracted. Alternatively, the original image may itself be one of the target images: for example, assuming the original image is 640x400, the 640x400 original may be taken as one target image, and compressing it successively at a pixel-count ratio of 4 yields a second target image of 320x200 and a third target image of 160x100. Alternatively, only the 320x200 and 160x100 images may be taken as target images, with the original image excluded. Of course, these resolutions and compression ratios are exemplary, and other values may be used in the present disclosure.
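The pyramid resolutions above follow from halving each side per level; a one-function sketch (names and defaults are illustrative):

```python
def pyramid_levels(width, height, levels=3, per_side=2):
    """Target-image resolutions obtained by downsampling the original
    by `per_side` per dimension at each pyramid level."""
    return [(width // per_side ** i, height // per_side ** i)
            for i in range(levels)]
```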
Since the number of target images is plural and the resolutions of different target images are different, the processing time period required for each target image is different. In order to improve the processing efficiency, it is necessary to reasonably allocate each target image to the plurality of processing cores. In some embodiments, each processing core is assigned at least one target image, and the target images assigned to the same processing core are processed serially in the corresponding processing core; and the difference of the total time lengths of the processing cores for processing the target images distributed to the processing cores is smaller than a preset difference value. In this way, the parallelism of the NMS scheme is improved, and the resource utilization rate is improved.
Referring to fig. 14, take the case of two processing cores, Core1 and Core2, and assume three target images: LvL0 at 640x400, LvL1 at 320x200 and LvL2 at 160x100, where the low-resolution images may be obtained from an image pyramid (e.g., a Gaussian pyramid, a Laplacian pyramid, etc.). Since the two low-resolution target images together require less computation than the highest-resolution one, two parallel computation channels can be designed, implemented by Core1 and Core2 respectively. While one channel computes the highest-resolution target image, the other sequentially computes the two low-resolution target images; taking the time required for pyramid construction into account, the processing times of the two channels are close. In this way, idle resources are kept at a low level.
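Balancing the levels over the cores amounts to a small scheduling problem. Below is a greedy sketch with cost approximated by pixel count; the actual hardware schedule also accounts for pyramid-construction time, and the function name is an assumption for illustration:

```python
def assign_levels(levels, n_cores=2):
    """Longest-processing-time-first assignment of pyramid levels to
    cores; `levels` maps level name -> approximate cost (pixel count)."""
    assignment = [[] for _ in range(n_cores)]
    loads = [0] * n_cores
    for name, cost in sorted(levels.items(), key=lambda kv: -kv[1]):
        i = loads.index(min(loads))      # least-loaded core so far
        assignment[i].append(name)
        loads[i] += cost
    return assignment, loads
```

For the three levels of fig. 14 this puts LvL0 alone on one core and LvL1 plus LvL2 on the other.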
Example Two
Referring to fig. 15, an embodiment of the present disclosure further provides a data processing apparatus for ORB acceleration, configured to determine feature points in a target image, where the target image includes a plurality of image blocks, different rows in a same image block are stored in a same memory address of different memory blocks in a memory, respectively, and corresponding rows in at least two image blocks are stored in a same memory block in the memory; the data processing apparatus for ORB acceleration includes:
a target image block generation module 1501, configured to acquire image blocks from target storage addresses of respective storage blocks, and generate target image blocks based on the acquired image blocks;
a calculating module 1502, configured to calculate feature responses of candidate feature points in the target image block;
a filtering module 1503, configured to determine a target feature point from the candidate feature points based on the feature response of the candidate feature points.
The data processing apparatus in this embodiment is used to improve a feature extraction module of an ORB accelerator in the related art. The filtering module 1503 may include the window filtering module 201, the mesh filtering module 202, and the image filtering module 203 in the first embodiment, or may only include the window filtering module 201, or adopt other structures, and the functions of the modules in this embodiment may refer to the first embodiment, which is not described herein again.
In some embodiments, the address generation module may obtain the address offset between the memory address of the image block in the corresponding memory block and the initial memory address of that memory block; determine a target address based on the initial memory address and the address offset; and scan the image blocks starting from the target address, outputting the scanned image blocks to the target image block generation module. The initial memory address may be the memory address of the 1st image block of the target image, or the starting address of the memory, for example 0. This storage scheme allows row scanning to be realized simply by accumulating addresses, and the starting position of a row scan can be adjusted by adjusting the address offset, improving the read efficiency of image blocks.
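Address generation then reduces to one addition plus accumulation; a trivial sketch under the assumption of a unit address stride (both names are illustrative):

```python
def scan_addresses(initial_addr, offset, count, stride=1):
    """Row scan as pure address accumulation: the target address is
    initial_addr + offset, and subsequent reads just add the stride."""
    start = initial_addr + offset
    return [start + i * stride for i in range(count)]
```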
Example Three
Referring to fig. 16, an embodiment of the present disclosure further provides a data processing apparatus for ORB acceleration for generating a descriptor of a feature point in a target image, where the data processing apparatus for ORB acceleration includes:
a target image block generation module 1601, configured to read image blocks from a memory, and determine a target image block for calculating a descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same address of each memory block is used for storing different columns in the same row of the target image;
a descriptor generating module 1602, configured to generate a descriptor of a feature point in the target image block.
The data processing apparatus in this embodiment is used to improve the descriptor generation module of the ORB accelerator in the related art. The target image block of this embodiment may include a plurality of feature points, which may be obtained with the data processing apparatus for ORB acceleration of the first or the second embodiment: for example, the window filtering module 201 of the first embodiment may perform window filtering on candidate feature points in the target image to obtain a plurality of first feature points whose feature responses are greater than zero, the mesh filtering module 202 then performs mesh filtering on the first feature points to obtain a plurality of second feature points whose feature responses are greater than zero, and the image filtering module 203 then filters the second feature points in a plurality of adjacent meshes to obtain a plurality of third feature points whose feature responses are greater than zero. The third feature points are the feature points for which descriptors need to be generated in this embodiment. For the specific way of determining, from a target image, the feature points whose descriptors are to be generated, reference may be made to the foregoing embodiments, which is not repeated here. In other embodiments, the feature points for which descriptors are generated may also be obtained in other manners, or the aforementioned third feature points together with feature points obtained in other manners may jointly serve as the feature points for which descriptors are generated in this embodiment.
An embodiment of a data processing apparatus for ORB acceleration for generating a descriptor is explained below, and for brevity, differences of the present embodiment from the foregoing embodiment are mainly described below.
The following description takes a 640x400 target image as an example; for reasons of space, the figure only shows the first 8 rows of the target image and the first 72 columns of each row. Referring to fig. 17, the number of memory blocks is 8, denoted Block1, Block2, ……, Block8 and abbreviated B1, B2, B3, ……, B8. Each memory block includes a plurality of memory addresses, denoted A0, A1, …… Each memory address of a memory block may store 8 columns of pixel values from one row of the target image, and the same memory address across the memory blocks stores different columns of the same row: e.g., columns 1-8 of row 1 of the target image are stored at address A0 of Block1, columns 9-16 of row 1 at address A0 of Block2, columns 17-24 of row 1 at address A0 of Block3, and so on. Further, at least two different memory addresses of the same memory block may store different columns of the same row of the target image; for example, address A0 of Block1 stores columns 1-8 of row 1, while address A1 of Block1 stores columns 65-72 of row 1. Of course, the above is merely illustrative and not the only possible implementation; in practical applications, the number of memory blocks, the amount of data stored at each memory address, and the like can be adjusted according to practical needs.
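The layout of fig. 17 can be captured by a small address-mapping function. This is an illustrative model using 0-indexed rows and columns and assuming the row stride is exactly width/64 words; the real design may pad each row:

```python
def map_pixel(row, col, width=640, n_blocks=8, cols_per_addr=8):
    """Return (block index, address) holding a 0-indexed pixel under
    the interleaved layout: 8 columns per address, with consecutive
    8-column groups rotating over the 8 memory blocks."""
    words_per_row = width // (n_blocks * cols_per_addr)   # 10 for width 640
    block = (col // cols_per_addr) % n_blocks
    addr = row * words_per_row + col // (n_blocks * cols_per_addr)
    return block, addr
```

`map_pixel(0, 0)` and `map_pixel(0, 64)` both land in the first block at addresses A0 and A1, matching the example of columns 1-8 and 65-72 of row 1 stored in Block1.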
To save cost and reduce volume there is often only one memory, and adding memories is too costly, so efficiently utilizing the image storage becomes the key to optimization. In this embodiment, unlike the feature-point detection process, descriptor generation does not store the target image as 8x8 data blocks but uses the storage layout shown in fig. 17. The advantage of this layout is that 64 consecutive pixels of the same row can be read out in one cycle. For the descriptor subtask, because of the rotation the sampled point pairs can reach at most 18 pixels away from the target pixel in either direction, so with this layout the whole data block needed to compute a descriptor can be read out of the target image in 18 + 18 + 1 = 37 cycles.
In some embodiments, the data processing apparatus for ORB acceleration further includes an address generation module, configured to obtain the address offset between the memory address, in the corresponding memory block, of the column to be read in the target image and the initial memory address of that memory block; determine a target address based on the initial memory address and the address offset; and scan the column to be read starting from the target address. This storage scheme allows data scanning to be realized simply by accumulating addresses, and the starting position of a scan can be adjusted via the address offset: for example, when Blocks 1 to 8 accumulate from 0, columns 1 to 64 of row 1 are scanned, and when Blocks 1 to 8 accumulate from 11, columns 1 to 64 of row 2 are scanned, improving the read efficiency of image blocks.
Continuing the previous example, the data at the same memory address in Blocks 1 to 8 can be read out in one cycle, so 64 pixels of one row of the target image are read per cycle; these pixels form one image block. Unlike the 7x7 target image blocks of the first embodiment, the target image blocks in this embodiment are 37x37. In some examples, 64 pixels of each of 37 rows of the target image can be read out over 37 consecutive cycles, and the 37x37 target image block is cut out from them. A 37x37 target image block is the smallest image unit used to compute a descriptor. Of course, these values are merely illustrative and may be adjusted according to actual needs.
After the target image block is obtained, descriptors of feature points in the target image block can be generated by the descriptor generation module 1602. In order to ensure that the descriptors of the feature points of the image are unchanged after rotation, the point pairs for generating the descriptors need to be rotated in the direction of the centroid. Thus, in some embodiments, the data processing apparatus for ORB acceleration further comprises: and the angle calculation module is used for calculating a rotation angle of the feature point in the target image block and outputting the rotation angle to the descriptor generation module so that the descriptor generation module generates the descriptor of the feature point based on the rotation angle.
In some embodiments, the rotation angle is calculated based on a centroid of the target image block, the centroid comprising a first image moment of the target image block in a row direction of the target image and a second image moment of the target image block in a column direction of the target image; the angle calculation module includes: a first calculation unit configured to obtain a sum of squares of the first image moment and the second image moment; a floating point conversion unit for converting the first image moment, the second image moment and the sum of squares into a floating point number; a second calculation unit for calculating an inverse of an arithmetic square root of the converted sum of squares; a third calculation unit configured to calculate a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root; and the fixed point conversion unit is used for converting the rotation angle into fixed point numbers and outputting the fixed point numbers to the descriptor generation module.
The specific way of calculating the rotation angle can be seen in the calculation of sin theta and cos theta mentioned above, where m10 is the first image moment and m01 is the second image moment. Referring to fig. 18, the first calculation unit may include two multipliers and an adder: the two multipliers compute the squares of m01 and m10 respectively, and the adder sums the two squares. The angle calculation has to evaluate trigonometric functions, and any error in the angle is reflected directly in the offsets of the descriptor point pairs, thereby affecting the descriptor calculation. From the foregoing formula, m10 and m01 are each the sum of a large number of multiply-accumulate results, so their values can be large; when squaring m10 and m01, the multiplier can therefore be split into multiple (for example, 4) 8x8 multiplications whose partial products are added at the end. After the sum of squares is computed, the fixed-point value is converted to floating point, the reciprocal of the square root is computed in floating point, precise trigonometric function values are obtained with floating-point multiplication, and finally the result is converted from floating point back to fixed point.
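Numerically, the unit performs the following; this is a plain floating-point reference model of the square / sum / reciprocal-square-root steps, not the reduced-precision hardware pipeline, and the zero-moment fallback is an assumption:

```python
import math

def rotation_sincos(m10, m01):
    """cos(theta) and sin(theta) of the orientation from the image
    moments, via the reciprocal of the arithmetic square root of
    m10^2 + m01^2 (mirroring the hardware's computation order)."""
    sum_sq = m10 * m10 + m01 * m01       # first calculation unit
    if sum_sq == 0:
        return 1.0, 0.0                  # undefined orientation: use 0
    inv_root = 1.0 / math.sqrt(sum_sq)   # second calculation unit
    return m10 * inv_root, m01 * inv_root
```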
The first calculation unit, the second calculation unit, the floating-point conversion unit and the fixed-point conversion unit may be multiplexed by a state machine. There is no need to handle general floating-point conversions here: m10 and m01, being sums of pixel values, are necessarily integers, and the bit width of the fixed-point output of the final float-to-fixed conversion stage is also a fixed value, so the floating-point unit in this module need not be complete, saving area and delay. The second calculation unit can compute the reciprocal of the arithmetic square root of the converted sum of squares with shift-like operations in hardware. Thus the special properties of floating-point numbers are exploited only in the core calculation, while fixed-point conversion is used at the outer layer; this improves both calculation precision and calculation speed while reducing hardware area.
In some embodiments, the angle calculation module further comprises: the device comprises a first extraction unit, a fifth calculation unit, a counter and a second extraction unit; the first extraction unit is used for extracting the target image block generated by the target image block generation module and outputting the pixel values of the pixel points in the target image block to the fifth calculation unit; the fifth calculation unit is configured to calculate the centroid based on the pixel values output by the first extraction unit; the counter is used for counting the number of times that the first extraction unit outputs the pixel value to the fifth calculation unit; the second extracting unit is used for extracting the centroid from the fifth calculating unit when the counting value of the counter reaches a preset counting value.
The process of extracting the centroid can be seen in the aforementioned calculation of m01 and m10. Referring to fig. 19, according to the formula, vector A corresponds to I[v, w] in the formula, vector B corresponds to I[-v, w], and vector C corresponds to w. One of the two adder trees accumulates I[v, w] + I[-v, w], and the other accumulates w x (I[v, w] - I[-v, w]). One of the two accumulators outputs m01 and the other outputs m10.
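As a reference for what the adder trees ultimately compute, the plain (unfolded) definition of the two moments over a patch centred on the feature point can be sketched as follows; the hardware folds the +v/-v columns together, while this model keeps the straightforward double sum:

```python
def image_moments(patch):
    """m10 and m01 of a square patch with coordinates centred on the
    middle pixel: m10 = sum(x * I), m01 = sum(y * I)."""
    radius = len(patch) // 2
    m10 = m01 = 0
    for y in range(-radius, radius + 1):
        for x in range(-radius, radius + 1):
            val = patch[y + radius][x + radius]
            m10 += x * val
            m01 += y * val
    return m10, m01
```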
In some embodiments, the number of the descriptor generation modules is greater than 1, and the target image block generation module is configured to output a target image block to the descriptor generation modules in an idle state, so that the descriptor generation modules generate descriptors of feature points in the received target image block in parallel.
Referring to fig. 20, each PE is a descriptor generation module, and a PE may include a storage unit (Patch Reg) for storing a target image block; an adjusting unit (Pattern Convert) for adjusting the descriptor pair based on the rotation angle; and a descriptor generation unit for generating a descriptor based on the target image block and the rotation angle. The generated descriptor can be received by the receiving module and sent to the storage unit.
According to the formula, computing the centroid for the central pixel of a target image block requires first calculating the differences and sums of several rows around it. Owing to the storage design above, each PE can be put to use while m01 and m10 are being calculated: as the target image block is read, a PE in the idle state is claimed at the same time and the target image block is written into that PE's storage unit. After m01 and m10 have been calculated, the angle is computed, a start signal is then sent to the PE, and the PE begins to run independently.
Since the angle-rotated patterns of the descriptor are approximately randomly distributed, descriptor computation is a purely serial operation. In some embodiments the descriptor totals 256 bits, i.e., 256 patterns, so a PE needs 256 cycles to finish after being started, while the target image block generation module and angle calculation module described above need about 60 cycles per computation; efficiency can therefore be maximized with 3 or 4 PEs, and a 3-PE design is adopted here because 3 PEs can already sustain the throughput of FAST. In practical applications, the run time of a single PE and the total processing time of the target image block generation module and the angle calculation module may differ from the values above, so the number of PEs can be adjusted accordingly.
Example Four
Referring to fig. 21, an embodiment of the present disclosure further provides a data processing apparatus for ORB acceleration for generating a descriptor of a feature point in a target image, where the data processing apparatus for ORB acceleration includes:
an angle calculating module 2101, configured to calculate a rotation angle of the feature point;
a descriptor generation module 2102 for generating a descriptor of the feature point based on the rotation angle;
the rotation angle is calculated based on a centroid of a target image block where the feature point is located, wherein the centroid comprises a first image moment of the target image block in a row direction of the target image and a second image moment of the target image block in a column direction of the target image; the angle calculation module includes:
a first calculating unit 2101a, configured to obtain a sum of squares of the first image moment and the second image moment;
a floating point conversion unit 2101b to convert the first image moment, the second image moment and the sum of squares to a floating point number;
a second calculation unit 2101c for calculating the reciprocal of the arithmetic square root of the converted sum of squares;
a third calculating unit 2101d for calculating a rotation angle based on the converted first image moment, the converted second image moment and the reciprocal of the arithmetic square root;
the fixed point conversion unit 2101e is configured to convert the rotation angle into fixed point numbers and output the fixed point numbers to the descriptor generation module.
The data processing apparatus in this embodiment is used to improve the descriptor generation module of the ORB accelerator in the related art; the specific functions of the modules and units in this embodiment are detailed in the previous embodiments. There is no need to handle general floating-point conversions here: m10 and m01, being sums of pixel values, are necessarily integers, and the bit width of the fixed-point output of the final float-to-fixed conversion stage is also a fixed value, so the floating-point unit in this module need not be complete, saving area and delay. The second calculation unit can compute the reciprocal of the arithmetic square root of the converted sum of squares with shift-like operations in hardware; thus the special properties of floating-point numbers are used only in the core calculation while fixed-point conversion is done at the outer layer, which improves both calculation precision and calculation speed while reducing hardware area.
In some embodiments, the data processing apparatus for ORB acceleration used for generating descriptors in this embodiment and the data processing apparatus for ORB acceleration used for determining feature points in the foregoing embodiments are both chips, and the two data processing apparatuses for ORB acceleration may be integrated on the same chip or two different chips.
In some embodiments, the present disclosure also provides a chip including the data processing apparatus for ORB acceleration according to any of the embodiments of the present disclosure. For example, the chip may comprise data processing means for ORB acceleration for determining feature points, or data processing means for ORB acceleration for generating descriptors, or data processing means for ORB acceleration for determining feature points and data processing means for ORB acceleration for generating descriptors.
In some embodiments, the present disclosure also provides an electronic device comprising the chip of any one of the embodiments of the present disclosure.
Referring to fig. 22, the present disclosure also provides a data processing method for ORB acceleration, the method including:
step 2201: carrying out non-maximum suppression processing on the feature response of each candidate feature point of the target image in the sliding window to obtain a plurality of first feature points with the feature response being greater than zero; the target image is divided into a plurality of grids;
step 2202: carrying out non-maximum suppression processing on the characteristic response of the first characteristic point in each grid of the grids to obtain a plurality of second characteristic points with characteristic responses larger than zero;
step 2203: and filtering the second characteristic points in the plurality of adjacent grids to obtain a plurality of third characteristic points with characteristic response larger than zero.
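The three filtering stages above can be sketched in software. The window size, grid size and adjacency rule below are illustrative assumptions for the sketch, not the patented hardware design:

```python
import numpy as np

def sliding_window_nms(resp: np.ndarray, win: int = 3) -> np.ndarray:
    """Step 2201: keep a response only if it is positive and the maximum
    of its win x win sliding window."""
    h, w = resp.shape
    out = np.zeros_like(resp)
    r = win // 2
    for y in range(h):
        for x in range(w):
            y0, y1 = max(0, y - r), min(h, y + r + 1)
            x0, x1 = max(0, x - r), min(w, x + r + 1)
            if resp[y, x] > 0 and resp[y, x] == resp[y0:y1, x0:x1].max():
                out[y, x] = resp[y, x]
    return out

def grid_nms(resp: np.ndarray, grid: int = 8) -> np.ndarray:
    """Step 2202: within each grid cell keep only the strongest
    remaining response."""
    h, w = resp.shape
    out = np.zeros_like(resp)
    for gy in range(0, h, grid):
        for gx in range(0, w, grid):
            cell = resp[gy:gy + grid, gx:gx + grid]
            if cell.max() > 0:
                y, x = np.unravel_index(cell.argmax(), cell.shape)
                out[gy + y, gx + x] = cell[y, x]
    return out

def adjacent_grid_filter(resp: np.ndarray, grid: int = 8) -> list:
    """Step 2203: drop a cell winner if a stronger winner lies in any
    of the eight adjacent grid cells."""
    ys, xs = np.nonzero(resp)
    pts = list(zip(ys, xs))
    keep = []
    for (y, x) in pts:
        ok = True
        for (y2, x2) in pts:
            if (y2, x2) != (y, x) \
                    and abs(y // grid - y2 // grid) <= 1 \
                    and abs(x // grid - x2 // grid) <= 1 \
                    and resp[y2, x2] > resp[y, x]:
                ok = False
                break
        if ok:
            keep.append((y, x))
    return keep
```

Chaining the three functions reproduces the coarse-to-fine thinning of candidate feature points described in steps 2201 to 2203.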
Referring to fig. 23, the present disclosure further provides a data processing method for ORB acceleration, configured to determine feature points in a target image, where the target image includes a plurality of image blocks, different rows of the same image block are stored at the same memory address of different memory blocks in a memory, and corresponding rows of at least two image blocks are stored in the same memory block of the memory; the method comprises the following steps:
step 2301: acquiring image blocks from target storage addresses of the storage blocks, and generating target image blocks based on the acquired image blocks;
step 2302: calculating the characteristic response of candidate characteristic points in the target image block;
step 2303: determining a target feature point from the candidate feature points based on feature responses of the candidate feature points.
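The storage scheme described above, in which the rows of one image block sit at the same address of different memory banks, is what allows step 2301 to fetch a whole block in a single parallel read. The following is a simplified software model; the bank and address mapping is chosen for illustration only:

```python
def locate(block_idx: int, row_in_block: int):
    """Map (image block, row within block) to (bank, address).

    Rows of one image block land in different banks at the same
    address, so a whole block is fetched by reading every bank once;
    the same row position of different blocks shares a bank."""
    bank = row_in_block        # one bank per row position
    address = block_idx        # same address across all banks
    return bank, address

def write_image(image_rows, rows_per_block: int):
    """Distribute an image (a list of rows) over the banks."""
    num_blocks = len(image_rows) // rows_per_block
    banks = [dict() for _ in range(rows_per_block)]
    for b in range(num_blocks):
        for r in range(rows_per_block):
            bank, addr = locate(b, r)
            banks[bank][addr] = image_rows[b * rows_per_block + r]
    return banks

def read_block(banks, block_idx: int):
    """All banks are read at the same address, i.e. one parallel access
    returns every row of the requested image block."""
    return [banks[bank][block_idx] for bank in range(len(banks))]
```

Because `read_block` touches each bank exactly once at a single address, the model mirrors how the hardware assembles a target image block without serializing row reads.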
Referring to fig. 24, the present disclosure also provides a data processing method for ORB acceleration for generating descriptors of feature points in a target image, the method including:
step 2401: reading image blocks from a memory, and determining a target image block for calculating a descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image;
step 2402: and generating descriptors of the feature points in the target image block.
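The column-striped layout described in step 2401 can likewise be modelled in software: each bank word holds a group of columns of one row, so reading every bank at one address yields the full row. The word width and bank count below are illustrative assumptions:

```python
def store_row(banks, row_idx: int, row_pixels, cols_per_word: int):
    """Stripe one image row across the banks: bank k holds columns
    [k*cols_per_word, (k+1)*cols_per_word) at address row_idx, so the
    same address of each bank stores different columns of that row."""
    for k in range(len(banks)):
        banks[k][row_idx] = row_pixels[k * cols_per_word:
                                       (k + 1) * cols_per_word]

def load_row(banks, row_idx: int):
    """Read every bank at the same address and concatenate the column
    groups to recover the full row in one parallel access."""
    out = []
    for k in range(len(banks)):
        out.extend(banks[k][row_idx])
    return out
```

This layout serves the descriptor path, where wide horizontal pixel windows around a feature point must be delivered per cycle.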
Referring to fig. 25, the present disclosure also provides a data processing method for ORB acceleration for generating descriptors of feature points in a target image, the method including:
step 2501: acquiring the sum of squares of a first image moment and a second image moment of a target image block; the first image moment is the image moment of the target image block in the row direction of the target image, and the second image moment is the image moment of the target image block in the column direction of the target image;
step 2502: converting the first image moment, the second image moment, and the sum of squares to floating point numbers;
step 2503: calculating the reciprocal of the arithmetic square root of the converted sum of squares;
step 2504: calculating a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
step 2505: converting the rotation angle into a fixed point number;
step 2506: generating a descriptor of the feature point based on the converted rotation angle.
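Steps 2501 to 2506 can be sketched in software as follows. The Q16 fixed-point format and the zero-moment guard are assumptions for the sketch; the patent's hardware uses a reduced floating point unit and a shift-based reciprocal square root rather than the library calls used here:

```python
import math

def rotation_angle(m10: int, m01: int) -> float:
    """Compute the ORB orientation from the integer image moments
    m10 (row direction) and m01 (column direction), following
    steps 2501-2506."""
    # Step 2501: sum of squares of the two moments (integer arithmetic).
    sq_sum = m10 * m10 + m01 * m01
    if sq_sum == 0:
        return 0.0  # assumed guard: undefined orientation for zero moments
    # Step 2502: convert moments and sum of squares to floating point.
    fm10, fm01, fsq = float(m10), float(m01), float(sq_sum)
    # Step 2503: reciprocal of the arithmetic square root.
    inv_norm = 1.0 / math.sqrt(fsq)
    # Step 2504: rotation angle from the normalized moments.
    cos_t, sin_t = fm10 * inv_norm, fm01 * inv_norm
    angle = math.atan2(sin_t, cos_t)
    # Step 2505: convert to fixed point (Q16 is an assumed format).
    angle_fixed = int(round(angle * (1 << 16)))
    # Return the fixed-point value reinterpreted as a float for checking;
    # step 2506 would feed angle_fixed into descriptor generation.
    return angle_fixed / (1 << 16)
```

Normalizing by the reciprocal square root before `atan2` matches the structure of the third calculation unit, which consumes the converted moments and the reciprocal rather than raw quotients.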
The above methods can be implemented by the data processing apparatuses in the foregoing embodiments; specific details are given in the foregoing apparatus embodiments and are not repeated here.
The embodiments of the present disclosure also provide a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method according to any of the embodiments of the present disclosure.
Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of computer storage media include, but are not limited to, phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technology, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer readable media do not include transitory computer readable media such as modulated data signals and carrier waves.
From the above description of the embodiments, it will be clear to those skilled in the art that the embodiments of the present disclosure can be implemented by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the embodiments of the present specification may essentially be embodied in the form of a software product, which may be stored in a storage medium such as a ROM/RAM, a magnetic disk, or an optical disc, and which includes several instructions for enabling a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present specification.
The systems, devices, modules or units illustrated in the above embodiments may be implemented by a computer chip or an entity, or by a product with certain functions. A typical implementation device is a computer, which may be in the form of a personal computer, laptop, cellular telephone, camera phone, smart phone, personal digital assistant, media player, navigation device, email messaging device, game console, tablet computer, wearable device, or a combination of any of these devices.
The embodiments in the present specification are described in a progressive manner; for identical or similar parts among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the apparatus embodiments are substantially similar to the method embodiments, so their description is relatively brief, and reference may be made to the corresponding descriptions of the method embodiments for relevant points. The above-described apparatus embodiments are merely illustrative: the modules described as separate components may or may not be physically separate, and when implementing the embodiments of the present disclosure, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution of the embodiments. One of ordinary skill in the art can understand and implement this without inventive effort.
The foregoing describes only specific embodiments of the present disclosure. It should be noted that those skilled in the art can make various modifications and improvements without departing from the principles of the embodiments of the present disclosure, and such modifications and improvements shall also fall within the protection scope of the embodiments of the present disclosure.

Claims (11)

1. A data processing apparatus for ORB acceleration for generating descriptors of feature points in a target image, the apparatus comprising:
the target image block generation module is used for reading image blocks from the memory and determining a target image block for calculating the descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image;
and the descriptor generation module is used for generating descriptors of the feature points in the target image block.
2. The data processing apparatus for ORB acceleration as claimed in claim 1, wherein the number of descriptor generating modules is greater than 1, and the target image block generating module is configured to output the target image block to the descriptor generating module in an idle state, so that each descriptor generating module generates descriptors of feature points in the received target image block in parallel.
3. A data processing apparatus for ORB acceleration according to claim 1 or 2, wherein the apparatus further comprises:
and the angle calculation module is used for calculating a rotation angle of the feature point in the target image block and outputting the rotation angle to the descriptor generation module so that the descriptor generation module generates the descriptor of the feature point based on the rotation angle.
4. The data processing apparatus for ORB acceleration according to claim 3, wherein the rotation angle is calculated based on a centroid of the target image block, the centroid comprising a first image moment of the target image block in a row direction of the target image and a second image moment of the target image block in a column direction of the target image; the angle calculation module includes:
a first calculation unit configured to obtain a sum of squares of the first image moment and the second image moment;
a floating point conversion unit for converting the first image moment, the second image moment and the sum of squares into a floating point number;
a second calculation unit for calculating a reciprocal of an arithmetic square root of the converted sum of squares;
a third calculation unit configured to calculate a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
and the fixed point conversion unit is used for converting the rotation angle into fixed point numbers and outputting the fixed point numbers to the descriptor generation module.
5. The data processing apparatus for ORB acceleration according to claim 4, wherein the angle calculation module further comprises:
the device comprises a first extraction unit, a fifth calculation unit, a counter and a second extraction unit;
the first extraction unit is used for extracting the target image block generated by the target image block generation module and outputting the pixel values of the pixel points in the target image block to the fifth calculation unit;
the fifth calculation unit is configured to calculate the centroid based on the pixel values output by the first extraction unit;
the counter is used for counting the number of times that the first extraction unit outputs the pixel value to the fifth calculation unit;
the second extracting unit is used for extracting the centroid from the fifth calculating unit when the counting value of the counter reaches a preset counting value.
6. A data processing apparatus for ORB acceleration for generating descriptors of feature points in a target image, the apparatus comprising:
the angle calculation module is used for calculating the rotation angle of the characteristic point;
a descriptor generation module for generating a descriptor of the feature point based on the rotation angle;
the rotation angle is calculated based on a centroid of a target image block where the feature point is located, wherein the centroid comprises a first image moment of the target image block in a row direction of the target image and a second image moment of the target image block in a column direction of the target image; the angle calculation module includes:
a first calculation unit configured to obtain a sum of squares of the first image moment and the second image moment;
a floating point conversion unit for converting the first image moment, the second image moment and the sum of squares into a floating point number;
a second calculation unit for calculating a reciprocal of an arithmetic square root of the converted sum of squares;
a third calculation unit configured to calculate a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
and the fixed point conversion unit is used for converting the rotation angle into fixed point numbers and outputting the fixed point numbers to the descriptor generation module.
7. A chip characterized in that it comprises a data processing apparatus for ORB acceleration according to any of claims 1 to 6.
8. An electronic device, characterized in that it comprises a chip according to claim 7.
9. A data processing method for ORB acceleration for generating descriptors of feature points in a target image, the method comprising:
reading image blocks from a memory, and determining a target image block for calculating a descriptor from the read image blocks; the memory comprises a plurality of memory blocks, each memory block comprises a plurality of memory addresses, each memory address of the same memory block is used for storing a plurality of columns in the same row of the target image, and the same memory address of each memory block is used for storing different columns in the same row of the target image;
and generating descriptors of the feature points in the target image block.
10. A data processing method for ORB acceleration for generating descriptors of feature points in a target image, the method comprising:
acquiring the sum of squares of a first image moment and a second image moment of a target image block; the first image moment is the image moment of the target image block in the row direction of the target image, and the second image moment is the image moment of the target image block in the column direction of the target image;
converting the first image moment, the second image moment, and the sum of squares to floating point numbers;
calculating the reciprocal of the arithmetic square root of the converted sum of squares;
calculating a rotation angle based on the converted first image moment, the converted second image moment, and a reciprocal of the arithmetic square root;
converting the rotation angle into fixed point number;
generating a descriptor of the feature point based on the converted rotation angle.
11. A computer-readable storage medium, on which a computer program is stored, which program, when being executed by a processor, is adapted to carry out the method of any one of claims 9 to 10.
CN202210420445.XA 2022-04-20 2022-04-20 Data processing device and method for ORB acceleration, chip and electronic equipment Pending CN114820273A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210420445.XA CN114820273A (en) 2022-04-20 2022-04-20 Data processing device and method for ORB acceleration, chip and electronic equipment


Publications (1)

Publication Number Publication Date
CN114820273A true CN114820273A (en) 2022-07-29

Family

ID=82505278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210420445.XA Pending CN114820273A (en) 2022-04-20 2022-04-20 Data processing device and method for ORB acceleration, chip and electronic equipment

Country Status (1)

Country Link
CN (1) CN114820273A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination