CN113781290A - Vectorization hardware device for FAST corner detection - Google Patents
- Publication number
- CN113781290A (application CN202110998588.4A)
- Authority
- CN
- China
- Prior art keywords
- unit
- hardware device
- data
- vectorization
- image
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/38—Information transfer, e.g. on bus
- G06F13/42—Bus transfer protocol, e.g. handshake; Synchronisation
- G06F13/4204—Bus transfer protocol, e.g. handshake; Synchronisation on a parallel bus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Software Systems (AREA)
- Image Analysis (AREA)
- Image Processing (AREA)
Abstract
The invention discloses a vectorization hardware device for FAST corner detection. The hardware device comprises: an image cache register file unit, consisting of 9 vector registers, which buffers the grayscale pixel data of an image; a vector processing unit, which receives FAST window data from the image cache register file unit and contains a plurality of parallel processing units that carry out the corner calculation and decision process and produce the corner result data in parallel; and a data write-back unit, implemented with a vector register, which writes the corner result data back to an external storage device. A state machine control unit coordinates the modules of the device, and the device provides an AXI bus interface. The invention realizes vectorized computation of FAST corner detection and offers high parallelism, no loss of precision, ease of mapping to hardware instructions, and modularity.
Description
Technical Field
The invention relates to the field of computer vision processors, and in particular to a vectorization hardware device for FAST corner detection.
Background
Corner detection is an important image processing technique and can be applied to computer vision tasks such as object recognition, tracking, and navigation and positioning. FAST corner detection has linear computational complexity, which makes it well suited to high-frame-rate scenarios. The FAST algorithm can also serve as the computational basis of more complex feature detection processes, such as the ORB algorithm. As demand has grown in recent years for image computation with high resolution, high frame rate and strict real-time requirements, accelerating the FAST corner detection calculation has become increasingly important.
At present, corner detection algorithms are usually run on general-purpose processors such as a central processing unit (CPU) or a graphics processing unit (GPU). The algorithm is accelerated with a GPU or with multithreading, and a vectorized acceleration technique dedicated to a specific corner detection algorithm has not yet appeared.
The vectorization hardware device for FAST corner detection proposed herein therefore implements the FAST16 algorithm in a hardware circuit, raises computational parallelism through vectorization, and can make full use of AXI bus bandwidth. The device can be embedded in an SoC processor chip as a hardware IP core, or used as a standalone hardware system once the relevant peripheral circuits are added. It is also suitable for a vector processor with a corresponding instruction design, and it is extensible. The device features high real-time performance, no loss of FAST algorithm precision, configurability for different image resolutions, high performance, and low power consumption.
Disclosure of Invention
The invention provides a vectorization hardware device for FAST corner detection, to solve the problem that the prior art has no dedicated vectorization hardware device for the FAST corner detection calculation process.
A vectorization hardware device for FAST corner detection, characterized in that the device comprises an image cache register file unit, a vector processing unit, a data write-back unit and a state machine control unit, wherein:
the image cache register file unit receives the input grayscale image data, caches it according to the size of the designed register resources, and supplies it to the vector processing unit for corner detection calculation, the data loaded into the image cache registers corresponding to a plurality of calculation windows;
the vector processing unit calculates a plurality of corners in parallel: it decides the corner attribute of each candidate pixel, calculates the non-maximum suppression response value, and converts the pixel coordinate values;
the data write-back unit compresses the corner calculation results into a result register so that they can be merged and written out;
and the state machine control unit controls the task scheduling of the image cache register file unit, the vector processing unit and the data write-back unit.
Preferably, the image cache register file unit consists of 9 multi-bit vector registers and caches the pixel window data required by multiple groups of corner detection calculations; the grayscale image data is distributed to the image cache register file unit in consecutive groups, and the state machine controls the stitching stride of the grayscale image data. The image cache register file unit adopts a ping-pong architecture to balance the time overhead of caching against that of computation.
Preferably, the multi-bit vector registers batch-load the row pixel data of each image frame together with the coordinate position of a single boundary pixel.
Preferably, the vector processing unit obtains multiple groups of calculation window data by accessing the image cache register file unit in parallel, and runs the corner detection calculation in parallel.
Preferably, the vector processing unit has as many corner detection calculation units as there are groups of calculation window data; the calculation units share the entire image cache register file unit and access the corner calculation window data synchronously.
Preferably, the corner detection calculation unit implements the FAST corner operation logic and the non-maximum suppression comparison logic in a pipeline architecture.
Preferably, task scheduling in the device is controlled by a state machine unit, and the device has an AXI bus read-write interface; the image cache register file unit can refresh, in batches over the AXI bus, the grayscale image data of the pixels to be calculated; and the vectorization hardware device can complete processing tasks in cooperation with other devices through the AXI bus protocol.
Preferably, the state machine unit has a state in which the image cache register file receives grayscale image data and a state in which the vector processing unit computes; the vectorization hardware device has configuration control ports for start and end signals.
Preferably, the data write-back unit consists of a multi-bit vector register and synchronously compresses the multiple groups of results from the vector processing unit into that vector register; the stored information includes the corner attribute decision result, the non-maximum suppression response value, and the converted pixel coordinate values; the data write-back unit can write back to the destination device over the AXI bus.
The invention realizes vectorized computation of FAST corner detection; the vector register group design improves the throughput of grayscale image data within the device; and the invention offers high parallelism, no loss of precision, ease of mapping to hardware instructions, and modularity.
Drawings
Fig. 1 is a schematic structural diagram of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an image cache register file unit of a vectorization hardware device for FAST corner detection according to an embodiment of the present invention.
Fig. 3 is a schematic structural diagram of a vector register of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention.
Fig. 4 is a schematic diagram of a FAST16 corner detection process of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention.
Fig. 5 is a schematic structural diagram of a vector processing unit of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention.
Fig. 6 is a schematic structural diagram of a state machine control unit of a vectorization hardware device for FAST corner detection according to an embodiment of the present invention.
Fig. 7 is a schematic structural diagram of an application of a vectorization hardware apparatus for FAST corner detection in a processor system according to an embodiment of the present invention.
Detailed Description
The present invention will be described in detail below with reference to the accompanying drawings and specific embodiments.
Fig. 1 is a schematic structural diagram of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention. As shown in Fig. 1, the device comprises an image cache register file unit, a vector processing unit, a data write-back unit and a state machine control unit. The image cache register file unit receives the block of grayscale pixel data to be processed, partitions the image into blocks according to the row and column data dimensions required by the resolution of the video stream, is configured with an image cache register bit width suited to buffering those blocks, and supplies the data to the vector processing unit for corner detection calculation.
Fig. 2 is a schematic structural diagram of an image cache register file unit of a vectorization hardware device for FAST corner detection according to an embodiment of the present invention. Each grayscale pixel is drawn as a square, and the intensity of each pixel is represented by 8 bits. The image cache register file unit consists of nine 256-bit vector registers, V0 to V8, each of which uses 240 bits to hold grayscale image data. In the image cache register file unit shown in Fig. 2, the pixels to be calculated are P00 to P78. They form 8 non-maximum suppression windows, each 3 × 3 pixels in size: P00 to P08, P10 to P18, P20 to P28, P30 to P38, P40 to P48, P50 to P58, P60 to P68, and P70 to P78. Each group corresponds to one of the 8 groups of arithmetic elements in the vector processing unit. Each group of arithmetic elements can carry out the corner calculation for 3 vertically adjacent pixels simultaneously. Once all the pixels to be calculated in the image cache register file unit have been processed, the buffered data of the image cache register file unit can be refreshed.
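The register file dimensions above can be cross-checked with a short sketch. It assumes, as an inference that the text does not state, that the 8 windows sit side by side in one tile and that each candidate pixel needs the radius-3 neighbourhood of the standard FAST16 circle. The names `pe_region` and `candidate_coords` are hypothetical helpers, not part of the described device.

```python
# Minimal sketch of the tile geometry implied by Fig. 2 (assumed layout).
RADIUS = 3       # FAST16 samples a 16-pixel circle of radius 3 around a candidate
NMS_SIZE = 3     # each NMS window is 3 x 3 candidate pixels (from the text)
NUM_PE = 8       # 8 windows, one per group of arithmetic elements (from the text)
PIXEL_BITS = 8   # 8-bit grayscale intensity (from the text)

ROWS = NMS_SIZE + 2 * RADIUS             # rows needed: one vector register each
COLS = NUM_PE * NMS_SIZE + 2 * RADIUS    # pixels needed per register
PIXEL_FIELD_BITS = COLS * PIXEL_BITS     # bits of pixel data per register


def pe_region(tile, pe):
    """Return the 9 x 9 block of pixels that window `pe` reads from the tile."""
    start = pe * NMS_SIZE
    width = NMS_SIZE + 2 * RADIUS
    return [row[start:start + width] for row in tile]


def candidate_coords(pe):
    """Tile-local (row, col) positions of the 9 candidate pixels of window `pe`."""
    return [(RADIUS + r, RADIUS + pe * NMS_SIZE + c)
            for r in range(NMS_SIZE) for c in range(NMS_SIZE)]


if __name__ == "__main__":
    print(ROWS, COLS, PIXEL_FIELD_BITS)
```

Under these assumptions the arithmetic reproduces the figures given in the text: 9 registers, and 240 bits of pixel data in each.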
Fig. 3 is a schematic structural diagram of a vector register of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention. The vector register group of the image cache register file unit is V0 to V8. Bits 16 to 255 of each vector register hold the grayscale image data block to be processed, denoted Pixel_Vector_n. Bits 0 to 15 of each register in V0 to V8 hold the row and column coordinates, within the whole image, of the first pixel of the grayscale image row block to be processed, denoted xn and yn respectively; these coordinate tags are used to convert pixel coordinate values in the written-out result. The vector register V9 has 256 bits in total and holds the 32-bit result to be written out by each corner detection calculation unit in the vector processing unit, PE0_result to PE7_result. In each 32-bit result, bits 31 to 23 hold score, which represents the corner strength; bits 22 to 12 hold the row coordinate x_index; bits 11 to 1 hold the column coordinate y_index; and the lowest bit, c, holds the corner decision result. In the figure, PE0_result to PE7_result are loaded in order from the high bits to the low bits.
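The 32-bit result layout can be illustrated with a small pack and unpack sketch. The field positions follow the text (score in bits 31 to 23, x_index in bits 22 to 12, y_index in bits 11 to 1, c in bit 0, PE0 in the highest 32 bits of V9). The function names are hypothetical. The text does not say how a score is fitted into nine bits, so the sketch simply rejects values that do not fit.

```python
# Minimal sketch of the V9 result format described for Fig. 3.
SCORE_BITS, COORD_BITS, WORD_BITS, NUM_PE = 9, 11, 32, 8


def pack_result(score, x_index, y_index, is_corner):
    """Pack one PE result into a 32-bit word: score | x_index | y_index | c."""
    if not 0 <= score < (1 << SCORE_BITS):
        raise ValueError("score does not fit in 9 bits")
    if not 0 <= x_index < (1 << COORD_BITS) or not 0 <= y_index < (1 << COORD_BITS):
        raise ValueError("coordinate does not fit in 11 bits")
    return (score << 23) | (x_index << 12) | (y_index << 1) | int(bool(is_corner))


def unpack_result(word):
    """Split a 32-bit word back into (score, x_index, y_index, is_corner)."""
    return ((word >> 23) & 0x1FF, (word >> 12) & 0x7FF,
            (word >> 1) & 0x7FF, bool(word & 1))


def pack_v9(words):
    """Concatenate 8 result words into a 256-bit value, PE0 in the high bits."""
    if len(words) != NUM_PE:
        raise ValueError("expected exactly 8 PE results")
    v9 = 0
    for word in words:
        v9 = (v9 << WORD_BITS) | (word & 0xFFFFFFFF)
    return v9


def unpack_v9(v9):
    """Return the 8 result words of a 256-bit V9 value, PE0 first."""
    return [(v9 >> (WORD_BITS * (NUM_PE - 1 - i))) & 0xFFFFFFFF
            for i in range(NUM_PE)]
```

The four fields add up to 9 + 11 + 11 + 1 = 32 bits, and 8 such words fill the 256 bits of V9 exactly.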
Fig. 4 is a schematic diagram of a FAST16 corner detection process of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention. FAST16 is a corner detection algorithm that decides whether a given pixel P is a corner; as shown in the figure, the gray boxes numbered 1 to 16 are the pixels whose differences from P are calculated. Because the hardware device loads the grayscale source data in full, FAST16 corner detection is completed without loss of precision. The sum of the absolute values of the pixel differences can be kept as the score of the corner detection. The non-maximum suppression (NMS) process compares the scores of the candidate pixels, and at most one corner is kept in each NMS window. The FAST16 operation logic and the NMS comparison logic run as a pipeline in the parallel processing units of the vector processing unit.
Fig. 5 is a schematic structural diagram of a vector processing unit of a vectorization hardware apparatus for FAST corner detection according to an embodiment of the present invention. The vector processing unit (VPU) fetches 8 calculation windows synchronously by accessing the image cache register file unit in parallel over data lines, and has 8 parallel processing units for FAST16, PE0 to PE7. All calculation units share the entire image cache register file unit (V0 to V8) and can synchronously access any bit range of the vector registers. Each PE has pipelined FAST16 operation logic and NMS comparison logic and can complete the calculation for its 3 × 3 pixel window in 4 cycles. After completing the vector processing task, the VPU writes the results to the data write-back unit, which consists of the vector register V9; V9 has 256 bits in total and can hold the results of the 8 PEs. The 8 groups of VPU results are compressed synchronously into V9, and the stored information includes the corner attribute decision result, the non-maximum suppression response value, and the converted pixel coordinate values.
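As a rough worked example, the figures above give an estimate of the compute cycles per frame. The sketch assumes that one register file load covers 8 windows of 3 × 3 candidates, that pixels within 3 of the image border are not candidates, and that load time is fully hidden behind computation. None of these is stated as a measured result in the text, and the function names are hypothetical.

```python
# Back-of-envelope estimate of VPU compute cycles per image (assumptions above).
def ceil_div(a, b):
    """Integer ceiling division."""
    return -(-a // b)


def loads_per_image(width, height, num_pe=8, nms=3, radius=3):
    """Number of register file loads needed to cover every candidate pixel."""
    cand_cols = width - 2 * radius
    cand_rows = height - 2 * radius
    if cand_cols <= 0 or cand_rows <= 0:
        return 0
    return ceil_div(cand_cols, num_pe * nms) * ceil_div(cand_rows, nms)


def compute_cycles(width, height, cycles_per_load=4):
    """Compute-only cycle count, 4 cycles per load as stated in the text."""
    return loads_per_image(width, height) * cycles_per_load


if __name__ == "__main__":
    print(loads_per_image(640, 480), compute_cycles(640, 480))
```

With 72 candidate pixels handled per 4 cycles, the peak rate under these assumptions is 18 candidate pixels per cycle; a real design would also pay for AXI transfers and pipeline fill.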
Fig. 6 is a schematic structural diagram of a state machine control unit of a vectorization hardware device for FAST corner detection according to an embodiment of the present invention. In the vectorization hardware device for corner detection, a state machine unit controls the task scheduling of each module, and the START and END state instructions of the hardware device are configured through the state machine. IDLE is the standby state of the device, LOAD is the state in which the image cache register file receives grayscale image data, and COMP is the state in which the VPU computes. In the LOAD state, counter 1 can be used to count pixels; the bus access address is obtained from this count and the increment of the bus base address. Counter 2 is updated each time the COMP state completes; it counts the progress of the pixel processing task and is compared against a threshold that marks the completion of a single image.
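The control flow can be sketched as a small software model. The three states and the two counters come from the text; the exact transition conditions, the `comp_done` input, the per-step pixel count and the linear address formula (which ignores the row stride of a real image) are assumptions made for illustration, and the class and parameter names are hypothetical.

```python
# Minimal behavioural sketch of the IDLE / LOAD / COMP controller (assumed details).
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    LOAD = auto()
    COMP = auto()


class Controller:
    def __init__(self, pixels_per_load, pixels_per_beat, loads_per_image, base_addr=0):
        self.pixels_per_load = pixels_per_load    # pixels needed to fill V0 to V8
        self.pixels_per_beat = pixels_per_beat    # pixels received per LOAD step
        self.loads_per_image = loads_per_image    # threshold for counter 2
        self.base_addr = base_addr
        self.state = State.IDLE
        self.counter1 = 0    # pixels loaded in the current LOAD phase
        self.counter2 = 0    # completed COMP phases in the current image
        self.end = False     # models the END signal

    def bus_address(self):
        """Assumed linear address: base plus one byte per counted pixel."""
        return self.base_addr + self.counter1

    def step(self, start=False, comp_done=False):
        """Advance one step and return the new state."""
        if self.state is State.IDLE:
            if start:
                self.counter1 = self.counter2 = 0
                self.end = False
                self.state = State.LOAD
        elif self.state is State.LOAD:
            self.counter1 += self.pixels_per_beat
            if self.counter1 >= self.pixels_per_load:
                self.counter1 = 0
                self.state = State.COMP
        elif self.state is State.COMP and comp_done:
            self.counter2 += 1
            if self.counter2 == self.loads_per_image:
                self.end = True
                self.state = State.IDLE
            else:
                self.state = State.LOAD
        return self.state
```

A driver would pulse `start` once, step through LOAD until the register file is full, and assert `comp_done` when the VPU finishes each batch; the model returns to IDLE and raises `end` after the configured number of batches.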
Fig. 7 is a schematic structural diagram of an application of a vectorization hardware apparatus for FAST corner detection in a processor system according to an embodiment of the present invention. The processor can assert the configuration start signal START of the vectorization hardware device, and the device returns the configuration end signal END. The vectorization hardware device has an AXI bus read-write interface; the image cache register file unit supports fetching the corresponding grayscale image data directly from the memory device in AXI burst mode; the data write-back unit supports writing back to the memory device in AXI burst mode; and the device can complete processing tasks in cooperation with other devices through the AXI bus protocol. The bit width of the vector registers in the image cache register file unit, the number of calculation units in the VPU, and the number and bit width of the vector registers in the data write-back unit can all be extended, so that the device suits image processing at any resolution and can carry out vectorized calculation of large batches of corners. The hardware device can be implemented as a coprocessor component of a processor system.
The invention has been further illustrated above using specific embodiments. It should be noted that the above are only specific embodiments of the present invention and should not be construed as limiting it. Any modification, replacement or improvement made within the idea of the present invention falls within its scope of protection.
Claims (9)
1. A vectorization hardware device for FAST corner detection, characterized in that the device comprises an image cache register file unit, a vector processing unit, a data write-back unit and a state machine control unit, wherein:
the image cache register file unit receives the input grayscale image data, caches it according to the size of the designed register resources, and supplies it to the vector processing unit for corner detection calculation, the data loaded into the image cache registers corresponding to a plurality of calculation windows;
the vector processing unit calculates a plurality of corners in parallel: it decides the corner attribute of each candidate pixel, calculates the non-maximum suppression response value, and converts the pixel coordinate values;
the data write-back unit compresses the corner calculation results into a result register so that they can be merged and written out;
and the state machine control unit controls the task scheduling of the image cache register file unit, the vector processing unit and the data write-back unit.
2. The vectorization hardware device according to claim 1, wherein the image cache register file unit consists of 9 multi-bit vector registers and caches the pixel window data required by multiple groups of corner detection calculations; the grayscale image data is distributed to the image cache register file unit in consecutive groups, and the state machine controls the stitching stride of the grayscale image data; and the image cache register file unit adopts a ping-pong architecture to balance the time overhead of caching against that of computation.
3. The vectorization hardware device according to claim 2, wherein the multi-bit vector registers batch-load the row pixel data of each image frame together with the coordinate position of a single boundary pixel.
4. The vectorization hardware device according to claim 1, wherein the vector processing unit obtains multiple groups of calculation window data by accessing the image cache register file unit in parallel, and runs the corner detection calculation in parallel.
5. The vectorization hardware device according to claim 4, wherein the vector processing unit has as many corner detection calculation units as there are groups of calculation window data, the calculation units sharing the entire image cache register file unit and accessing the corner calculation window data synchronously.
6. The vectorization hardware device according to claim 5, wherein the corner detection calculation unit implements the FAST corner operation logic and the non-maximum suppression comparison logic in a pipeline architecture.
7. The vectorization hardware device according to claim 1, wherein task scheduling in the device is controlled by a state machine unit and the device has an AXI bus read-write interface; the image cache register file unit can refresh, in batches over the AXI bus, the grayscale image data of the pixels to be calculated; and the vectorization hardware device can complete processing tasks in cooperation with other devices through the AXI bus protocol.
8. The vectorization hardware device according to claim 7, wherein the state machine unit has a state in which the image cache register file receives grayscale image data and a state in which the vector processing unit computes; and the vectorization hardware device has configuration control ports for start and end signals.
9. The vectorization hardware device according to claim 1, wherein the data write-back unit consists of a multi-bit vector register and synchronously compresses the multiple groups of results from the vector processing unit into that vector register; the stored information includes the corner attribute decision result, the non-maximum suppression response value, and the converted pixel coordinate values; and the data write-back unit writes back to the destination device over the AXI bus.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110998588.4A CN113781290B (en) | 2021-08-27 | 2021-08-27 | Vectorization hardware device for FAST corner detection |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110998588.4A CN113781290B (en) | 2021-08-27 | 2021-08-27 | Vectorization hardware device for FAST corner detection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113781290A (en) | 2021-12-10 |
CN113781290B (en) | 2023-01-31 |
Family
ID=78839896
Family Applications (1)
Application Number | Status | Priority Date | Filing Date | Title |
---|---|---|---|---|
CN202110998588.4A, CN113781290B (en) | Active | 2021-08-27 | 2021-08-27 | Vectorization hardware device for FAST corner detection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113781290B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5916081A (en) * | 1982-03-31 | 1984-01-27 | ゼネラル・エレクトリツク・カンパニイ | Method and apparatus for processing visible video and regulating corner points in visible video processing system |
CN1979527A (en) * | 2005-12-09 | 2007-06-13 | 中国科学院沈阳自动化研究所 | Image corner rapid extraction method and implementation device |
CN102930278A (en) * | 2012-10-16 | 2013-02-13 | 天津大学 | Human eye sight estimation method and device |
CN103530224A (en) * | 2013-06-26 | 2014-01-22 | 郑州大学 | Harris corner detecting software system based on GPU |
US20140348431A1 (en) * | 2013-05-23 | 2014-11-27 | Linear Algebra Technologies Limited | Corner detection |
CN105046637A (en) * | 2015-07-31 | 2015-11-11 | 深圳市哈工大交通电子技术有限公司 | OmapL138 chip based optical flow tracking realization method |
CN108681984A (en) * | 2018-07-26 | 2018-10-19 | 珠海市微半导体有限公司 | A kind of accelerating circuit of 3*3 convolution algorithms |
CN112837256A (en) * | 2019-11-04 | 2021-05-25 | 珠海零边界集成电路有限公司 | Circuit system for Harris angular point detection and detection method |
Non-Patent Citations (2)
Title |
---|
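李鸿龙;杨杰;张忠星;罗迁;于双铭;刘力源;吴南健: "High-speed programmable vision chip for real-time object detection", Infrared and Laser Engineering * |
郑杰,刘杰,黄超,陈更生: "An improved parallelized electronic image stabilization algorithm based on corner detection", Journal of Fudan University (Natural Science) * |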
Also Published As
Publication number | Publication date |
---|---|
CN113781290B (en) | 2023-01-31 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||