CN117876206B - Pyramid optical flow accelerating circuit based on reconfigurable computing array - Google Patents
- Publication number
- CN117876206B CN202410068385.9A
- Authority
- CN
- China
- Prior art keywords
- optical flow
- image
- pyramid
- computing
- calculation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/20—Processor architectures; Processor configuration, e.g. pipelining
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T1/00—General purpose image data processing
- G06T1/60—Memory management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/20—Analysis of motion
- G06T7/269—Analysis of motion using gradient-based methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20016—Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; Pyramid transform
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Image Analysis (AREA)
Abstract
The invention discloses a pyramid optical flow accelerating circuit based on a reconfigurable computing array, belonging to the technical field of computer vision. The circuit comprises an optical flow control module, a parameter configuration unit, a main memory, an image pyramid construction module, an adaptive initial optical flow generation module and an optical flow calculation core. The optical flow control module controls the interaction of logic signals and data among the unit modules; the parameter configuration unit configures parameters related to optical flow calculation; the image pyramid construction module downsamples the original image in real time; the main memory stores feature points, image pyramid data and the like; the adaptive initial optical flow generation module provides a suitable initial value for the iterative optical flow computation so as to reduce the number of iterations; and the optical flow calculation core performs all optical flow computations. By providing a suitable initial value for the optical flow iteration, the invention considerably reduces the number of iterations and markedly increases the optical flow calculation speed, while offering high flexibility and a high frame rate.
Description
Technical Field
The invention belongs to the technical field of computer vision, and particularly relates to a pyramid optical flow accelerating circuit based on a reconfigurable computing array.
Background
Optical flow is widely used in computer vision, for example in robot navigation, autonomous driving, and visual simultaneous localization and mapping. Optical flow refers to the instantaneous velocity, on the camera's imaging plane, of the pixels of a moving object, and is a quantitative description of pixel motion in a video or image sequence. According to how densely the flow is distributed over the image, optical flow can be roughly divided into sparse optical flow and dense optical flow. Sparse optical flow describes the tracking and matching of key points, whereas dense optical flow describes the flow of every pixel in the image and can be used for three-dimensional reconstruction and three-dimensional mapping.
Currently, optical flow algorithms fall into two broad categories. One consists of classical algorithms based on optimizing over optical flow information, such as Lucas-Kanade (L&K) and Horn-Schunck (H&S). The other consists of neural-network-based algorithms, which are typically used to compute dense optical flow.
Traditional optical flow algorithms compute the flow point by point over a set of extracted feature points. They generally assume that the gray level of the same feature point pixel remains unchanged across consecutive images, and search for the match that minimizes the photometric error between pixel regions. Iterative optimization algorithms such as the Gauss-Newton method are therefore generally used to solve this problem. In the Gauss-Newton method, the initial value has a considerable influence on the number of iterations.
Neural-network-based optical flow algorithms make full use of the robustness and learning capability of deep learning, but they introduce a huge computational cost, which makes them less competitive for deployment on edge devices. Furthermore, they rely heavily on data sets and lack generalization capability in some new scenarios. Most conventional optical flow algorithms, by contrast, set the initial optical flow value directly to 0 when optimizing with the Gauss-Newton method. Since there may be a large displacement between successive images, searching for the optical flow match directly at the original image size easily leads to a local optimum. In some specific settings, such as simultaneous localization and mapping (SLAM), camera motion is the main cause of the optical flow between two adjacent frames, so the flow directions of most feature points are similar. If the iteration still starts from an initial value of 0, the redundancy implied by this similarity is wasted and the amount of computation required by the iteration increases.
In addition, image pyramids are often used in traditional optical flow algorithms: after the pyramid is constructed, optical flow is first computed on the smallest image layer, and the result provides the initial value for the optical flow iteration of the next layer. The image pyramid effectively improves accuracy, but it greatly increases the amount of computation, and the strong data dependency between layers makes parallel acceleration difficult.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a pyramid optical flow accelerating circuit based on a reconfigurable computing array, which performs iterative initialization according to the optical flow of the adjacent characteristic points of an image and adopts a reconfigurable optimal computing array architecture so as to improve the flexibility.
The invention adopts the technical scheme that:
A pyramid optical flow acceleration circuit based on a reconfigurable computing array, the unit modules of the circuit comprising: an optical flow control module, a parameter configuration unit, a main memory, an image pyramid construction module, a self-adaptive initial optical flow generation module and an optical flow calculation core;
The optical flow control module is used for controlling the interaction of logic signals and data among the unit modules;
the parameter configuration unit is used for configuring working parameters of the optical flow calculation core, such as iteration times, confidence coefficient thresholds and the like;
The main memory comprises a feature point storage unit, an image pyramid storage unit and an optical flow storage unit; the image pyramid storage unit is used for storing image pyramid data, and the optical flow storage unit is used for storing optical flow calculation related results;
the image pyramid construction module is used for carrying out real-time downsampling on two input adjacent frames of images so as to obtain image pyramid data, and storing the image pyramid data into the image pyramid storage unit;
the self-adaptive initial optical flow generating module is used for providing an applicable initial optical flow for optical flow iterative operation of an optical flow computing core so as to reduce the iterative times;
the optical flow computing core comprises a reconfigurable computing array formed by a plurality of computing engine CE units, wherein the input of each CE unit is a characteristic point stored in a characteristic point storage unit and an initial optical flow output by the self-adaptive initial optical flow generating module, and the optical flow computing core is used for completing single-layer optical flow computing operation of a single characteristic point and storing an optical flow computing related result into the optical flow storage unit.
Further, the adaptive initial optical flow generation module includes: the device comprises an accumulation calculation unit, an optical flow average calculation unit, a characteristic point distance calculation unit, a comparator, a characteristic point updating unit and an initial optical flow output unit;
under the control of an optical flow control module, reading an optical flow result obtained by current round calculation from a designated storage unit of a main memory to an accumulation calculation unit, carrying out accumulation operation on the optical flow result obtained by current round calculation by the accumulation calculation unit, and inputting the operation result into an optical flow average value calculation unit;
The optical flow average value calculation unit is used for calculating the optical flow average value of the current round and inputting the optical flow average value result into the initial optical flow output unit;
The feature point distance calculation unit is used for calculating the distance between the feature point used in the current round of calculation and the first feature point of the current round stored in the designated storage unit of the main memory, and inputting the distance calculation result into the comparator;
the comparator compares the current distance calculation result with the input distance threshold, and if the current distance calculation result is larger than the input distance threshold, the first feature point of the current round is updated; otherwise, the initial optical flow output unit is controlled to take the current optical flow average value as the new initial optical flow and output it.
Further, the storage mode of the image pyramid storage unit is as follows:
For a plurality of layers of pyramid images of the image pyramid, adopting a local storage mode for pyramid images with the image size larger than a preset image size threshold value, and adopting a full storage mode for pyramid images with the image size smaller than or equal to the preset image size threshold value; the local storage mode refers to: based on the preset image storage size, the image block data which does not exceed the image storage size is read and stored according to the established image data reading sequence, and the image data storage logic adopts a first-in first-out writing mode.
Furthermore, the image pyramid storage unit stores each layer of pyramid image across a plurality of RAM memories: the image block size of each layer of pyramid image is determined based on the number of configured RAM memories, each layer of pyramid image is partitioned into blocks by row and column, and each image block is stored into its corresponding RAM memory.
Further, the feature point storage unit is used for storing the horizontal and vertical coordinates of the feature points; it stores the feature points in ascending order of the vertical coordinate and reads them out to the optical flow calculation core in the same order in which they are stored.
Further, the indexes of the optical flow calculation related results stored by the optical flow storage unit are in one-to-one correspondence with the feature point indexes of the feature point storage unit; if the iteration fails while solving for the optical flow, the flag bit identifying the iteration state in the iteration state register is set to 0; otherwise, the flag bit is set to 1.
Further, the CE unit includes an iteration control module, a linear equation system solving module, a gradient calculation module, a Gauss-Newton calculation module, an image 1 index module, an image 2 index module, a buffer 1 and a buffer 2, and the calculation process of the CE unit includes:
After the initial optical flow is input into the iteration control module, the coordinates of the corresponding feature point in the second of the two adjacent images are calculated and provided to the image 2 index module; meanwhile, based on the specified neighborhood range, the image 1 index module stores the pixel values in the neighborhood of the current feature point into buffer 1, and the image 2 index module stores the pixel values in the neighborhood of the feature point position given by the initial optical flow into buffer 2;
the gradient calculation module takes the values out of the buffers one by one and sequentially calculates the gradient operators and error values;
The Gauss-Newton calculation module calculates the Hessian matrix H and the corresponding standard vector g based on the gradient operators and error values output by the gradient calculation module, and sends H and g to the linear equation system solving module;
the linear equation system solving module solves the linear equation system based on the Hessian matrix H and the standard vector g to obtain an optical flow value, and sends the optical flow value to the iteration control module;
The iteration control module receives the optical flow value obtained by the current calculation and checks, based on a preset iteration ending condition, whether the iteration can end; if so, it outputs the currently received optical flow value; otherwise, it updates the corresponding feature point coordinates in the second image and repeats the calculation flow to execute the next round of optical flow calculation, until the iteration can end.
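The CE unit's iteration loop described above can be illustrated with a minimal software sketch. It is not the circuit itself: the names `bilinear`, `ce_single_layer_flow` and `make_image`, the window radius, the stopping threshold, the use of central-difference gradients taken once from image 1, and bilinear sampling of image 2 are all assumptions chosen for illustration, since the text does not fix these details.

```python
import math

def bilinear(img, x, y):
    """Sample a row-major image at a sub-pixel position (clamped to the image)."""
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.001)
    y = min(max(y, 0.0), h - 1.001)
    x0, y0 = int(x), int(y)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy

def ce_single_layer_flow(img1, img2, pt, init_flow=(0.0, 0.0),
                         radius=5, max_iters=30, eps=1e-3):
    """Return ((u, v), converged) for one feature point on one pyramid layer."""
    x, y = pt
    u, v = init_flow
    # "Buffer 1": image-1 window values and their gradients (assumed operator).
    window = []
    hxx = hxy = hyy = 0.0
    for j in range(-radius, radius + 1):
        for i in range(-radius, radius + 1):
            px, py = x + i, y + j
            gx = 0.5 * (img1[py][px + 1] - img1[py][px - 1])
            gy = 0.5 * (img1[py + 1][px] - img1[py - 1][px])
            window.append((px, py, img1[py][px], gx, gy))
            hxx += gx * gx
            hxy += gx * gy
            hyy += gy * gy
    det = hxx * hyy - hxy * hxy
    if abs(det) < 1e-9:              # H is singular: report iteration failure
        return (u, v), False
    for _ in range(max_iters):
        g0 = g1 = 0.0
        for px, py, val, gx, gy in window:
            # "Buffer 2": image-2 value at the position predicted by the flow.
            err = val - bilinear(img2, px + u, py + v)
            g0 += gx * err
            g1 += gy * err
        # Solve the 2x2 system H * delta = g with Cramer's rule.
        du = (hyy * g0 - hxy * g1) / det
        dv = (hxx * g1 - hxy * g0) / det
        u, v = u + du, v + dv
        if du * du + dv * dv < eps * eps:   # assumed iteration ending condition
            return (u, v), True
    return (u, v), False

def make_image(w, h, dx=0.0, dy=0.0):
    """Smooth synthetic test image, shifted by (dx, dy)."""
    return [[math.sin(0.3 * (x - dx)) + math.cos(0.2 * (y - dy))
             + math.sin(0.15 * ((x - dx) + (y - dy)))
             for x in range(w)] for y in range(h)]

img1 = make_image(40, 40)
img2 = make_image(40, 40, dx=2.0, dy=-1.0)   # content moved by (2, -1)
flow, ok = ce_single_layer_flow(img1, img2, (20, 20))
```

On this synthetic pair the estimate should settle close to the true displacement (2, -1). Passing a non-zero `init_flow` that is already near the answer is the software counterpart of the initial optical flow input, and it shortens the loop.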
Further, the image input control logic of the optical flow control module to the image pyramid construction module includes:
the optical flow control module sends out a characteristic point reading signal and registers the characteristic points extracted by the user to the characteristic point storage unit;
The optical flow control module sends out an image loading signal to trigger the image pyramid construction module to synchronously start working, the image pyramid construction module carries out real-time downsampling on two input adjacent frames of images, image pyramid data are obtained and loaded into the image pyramid storage unit, and when the data amount of the loaded image pyramid data meets the optical flow calculation working of a first feature point, an optical flow calculation core is started;
After the optical flow calculation core is started, the optical flow control module detects the image range required by the next feature point according to the set detection frequency, and if new image data is required, the image loading operation is started; therefore, the image loading and optical flow operation are carried out simultaneously, so that the operation time is saved;
before starting the image loading operation, the optical flow control module checks the feature points currently used for calculating the optical flow and judges whether the newly loaded image data would overwrite the image data required by the feature points currently participating in calculation; if so, the image loading and feature point loading operations are paused until the newly loaded image data would no longer overwrite the image data required by those feature points; and during image loading, the optical flow control module detects in real time whether the loaded image data satisfies the optical flow calculation of the current new feature point, and if so, stops loading image data.
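The loading decisions in the control logic above can be sketched as a small decision function. This is only one reading of the text: the names `required_rows` and `loader_action`, the three action labels, and the assumption that a feature point needs the rows within a fixed radius of its y coordinate are hypothetical.

```python
def required_rows(feature_y, radius, image_height):
    """Inclusive row range a feature point needs (assumed: a fixed-radius window)."""
    return max(0, feature_y - radius), min(image_height - 1, feature_y + radius)

def loader_action(rows_loaded, capacity, active_ys, next_y, radius, image_height):
    """Decide what the image loader should do next.

    rows_loaded: number of image rows written so far (the next row index).
    capacity:    number of rows the circular buffer can hold.
    active_ys:   y coordinates of feature points currently being computed.
    next_y:      y coordinate of the next feature point to be issued.
    """
    _, need_hi = required_rows(next_y, radius, image_height)
    if need_hi < rows_loaded:
        return "idle"                      # next feature point is already covered
    if active_ys:
        oldest_needed = min(required_rows(y, radius, image_height)[0]
                            for y in active_ys)
    else:
        oldest_needed = rows_loaded        # nothing in flight, nothing to protect
    evicted_row = rows_loaded - capacity   # row overwritten by the next write
    if evicted_row >= oldest_needed:
        return "stall"                     # would destroy data still in use
    return "load"

a1 = loader_action(256, 256, [200], 260, 10, 480)
a2 = loader_action(456, 256, [200], 460, 10, 480)
a3 = loader_action(300, 256, [200], 250, 10, 480)
```

Because feature points arrive in ascending y order, the oldest row still in use only moves downward, so a stalled loader resumes as soon as the feature points holding those rows finish.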
Further, the optical flow computing core is a computing array architecture comprising a plurality of computing layers, each computing layer comprises a plurality of CE units, the number of CE units included in each computing layer is the same, and the number of computing layers included in the optical flow computing core is consistent with the number of image layers of the image pyramid data constructed by the image pyramid construction module.
Further, for each computing layer in the computing array architecture, the image scale of the pyramid image corresponding to the feature points processed by each computing layer is increased layer by layer according to the sequence from top to bottom, namely, the CE unit of the computing layer at the uppermost layer is used for acquiring the feature point of the smallest scale layer of the image pyramid data, and the CE unit of the computing layer at the lowermost layer is used for acquiring the feature point of the largest scale layer of the image pyramid data; and in the calculation, the optical flow calculation results are transmitted to the corresponding CE units of the next layer in time after the optical flow calculation operation of the CE units of the previous calculation layer is finished, and the final optical flow calculation result is obtained based on the optical flow calculation results of the CE units of the lowest calculation layer.
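The layer-by-layer hand-off described above can be sketched in software as follows. The function names `pyramid_flow` and `toy_solver`, the halving of coordinates per layer, and the doubling of the flow when it is passed to the next finer layer are assumptions borrowed from common pyramidal optical flow practice; the text itself only states that each layer's result is passed to the corresponding CE unit of the next layer.

```python
def pyramid_flow(point, num_layers, solve_layer, init_flow=(0.0, 0.0)):
    """Coarse-to-fine flow for one feature point.

    solve_layer(layer, pt, init) stands in for one CE unit; layer 0 is the
    full-resolution image and layer num_layers - 1 is the smallest one.
    """
    u, v = init_flow                       # expressed at the coarsest scale
    for layer in range(num_layers - 1, -1, -1):
        scale = 1 << layer
        pt = (point[0] / scale, point[1] / scale)
        u, v = solve_layer(layer, pt, (u, v))
        if layer > 0:
            u, v = 2 * u, 2 * v            # assumed rescaling between layers
    return u, v

def toy_solver(layer, pt, init):
    """Hypothetical CE stand-in: moves at most 1 pixel per axis toward the truth."""
    true_u, true_v = 8.0 / (1 << layer), -4.0 / (1 << layer)
    step = lambda d: max(-1.0, min(1.0, d))
    return init[0] + step(true_u - init[0]), init[1] + step(true_v - init[1])

with_pyramid = pyramid_flow((100, 60), 4, toy_solver)
single_layer = pyramid_flow((100, 60), 1, toy_solver)
```

With the toy solver limited to one pixel of correction per axis, the four-layer run recovers the full displacement while the single-layer run does not, which is the motivation for matching the number of computing layers to the number of pyramid layers.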
The technical scheme provided by the invention has at least the following beneficial effects:
(1) The optical flow accelerating circuit has high operation rate, and based on the improved initial point iteration strategy, the optical flow accelerating circuit can effectively reduce iteration times, improve the optical flow tracking effect under a larger movement distance and improve the characteristic point optical flow operation rate.
(2) The method separates the single-point optimization process from the top-level data scheduling, and constructs the pyramid-based optical flow computing array, so that the flexibility and the performance of hardware are obviously improved.
(3) The optical flow accelerating circuit provided by the invention can realize the processing speed of 405 frames of images per second, and can fully meet the real-time application requirements of equipment in the related field.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a pyramid optical flow acceleration circuit based on a reconfigurable computing array according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an initial optical flow generation algorithm based on adaptive prediction according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an adaptive initial optical flow generation module according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a pyramid image storage unit according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a feature point storage unit according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of the architecture of a Compute Engine (CE) unit of an embodiment of the present invention;
FIG. 7 is a schematic diagram showing a process of reading image data by the single-layer optical flow calculating unit according to the embodiment of the present invention;
FIG. 8 is a schematic diagram of a configurable computing array architecture in accordance with an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
As shown in fig. 1, the unit modules of a pyramid optical flow accelerating circuit based on a reconfigurable computing array according to an embodiment of the present invention include: an optical flow control module (i.e. a main controller), a parameter configuration unit, a main memory, an image pyramid construction module (i.e. an image pyramid generator), an adaptive initial optical flow generation module and an optical flow calculation unit (i.e. an optical flow calculation core). The main controller controls the interaction of logic signals and data among the unit modules, and the parameter configuration unit configures parameters related to optical flow calculation, such as the number of iterations and the confidence threshold. The image pyramid construction module downsamples the original image in real time. The main memory stores the data participating in optical flow calculation (feature points, image pyramid data and the like) and the optical flow calculation related results; that is, the main memory can be subdivided into a feature point storage unit, an image pyramid storage unit and an optical flow storage unit. In order to achieve a more real-time and more lightweight hardware architecture, the embodiment of the invention provides an initial optical flow generation algorithm based on adaptive prediction and an optical flow computing architecture based on a configurable array, in which the adaptive initial optical flow generation module provides a suitable initial value for the iterative optical flow computation so as to reduce the number of iterations, and the optical flow calculation core performs all optical flow computations.
As a possible implementation manner, in the embodiment of the present invention, each unit module of the pyramid optical flow acceleration circuit based on the reconfigurable computing array is specifically:
(1) Adaptive initial optical flow generation module.
In application scenarios based on visual localization and mapping, the motion directions of adjacent feature points are, in theory, basically the same given the relation between the two images. Accordingly, the embodiment of the present invention uses the optical flow result obtained for the previous feature point as the initial optical flow for the next feature point. As shown in fig. 2, assume that two consecutive images are taken from a video sequence: the left image is the previous frame and the right image is the subsequent frame, and D0, E0 and F0 in the figure are feature points that have already been extracted. Next, the optical flow corresponding to each feature point is obtained from the two images, and the feature points matching those of the left image are found in the right image.
First, the optical flow direction of pixel D is obtained by an iterative optimization method. Then, it is used as the predicted initial optical flow direction of the subsequent pixels E and F. In this way, the number of iterations needed to calculate the optical flow for points E and F is significantly reduced.
However, there may be several cases in the implementation that may cause the optical flows of the two points to differ.
1) The camera both rotates and translates, so the optical flow field usually takes on a somewhat swirl-like shape. Viewed globally, the direction and magnitude of the optical flow therefore vary across the image, but the optical flows of feature points that are close to each other are noticeably more similar.
2) The extracted positions of the feature points show randomness, for example, the feature points at the corner points are more, and the feature points on the smooth curved surface are less, so that the distances between two adjacent key points are not fixed, and the similarity of the optical flows is different.
3) The photometric errors at the corresponding points can also cause differences in the optical flow tracking results.
Based on the above considerations, an embodiment of the present invention designs the adaptive initial optical flow generation module shown in fig. 3. The basic idea is to average the optical flows of feature points whose distances lie within a certain threshold range, and to use the average value as the initial optical flow of a new feature point. As shown in fig. 3, the adaptive initial optical flow generation module includes an accumulation calculation unit, an optical flow average calculation unit, a feature point distance calculation unit, a comparator, a feature point updating unit and an initial optical flow output unit. The accumulation calculation unit accumulates the optical flows obtained in the current round of calculation and inputs the result into the optical flow average calculation unit, which calculates the average optical flow of the current round and inputs it into the initial optical flow output unit. The feature point distance calculation unit calculates the distance between the feature point used in the current round of calculation and the first feature point in the register, and inputs the distance into the comparator. The comparator compares the distance with the input distance threshold: if the distance is larger than the threshold, the first feature point in the register is updated, i.e. the current feature point whose distance exceeds the threshold becomes the first feature point in the register; otherwise, the initial optical flow output unit is controlled to output the currently received optical flow average value, i.e. the average value is used as the new initial optical flow.
(2) Main memory.
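A minimal software sketch of this module is given below. The class name `AdaptiveInitialFlow` and its methods are hypothetical, the Euclidean distance is an assumed metric, and two behaviours that the text leaves open are filled in as labelled assumptions: the accumulator is cleared when the reference feature point is updated, and the most recent flow result is used as the initial value at the start of a new round.

```python
import math

class AdaptiveInitialFlow:
    """Averages the flows of nearby feature points to seed the next iteration."""

    def __init__(self, dist_threshold):
        self.dist_threshold = dist_threshold
        self.ref_point = None          # "first feature point" held in the register
        self.sum_u = 0.0               # accumulation calculation unit
        self.sum_v = 0.0
        self.count = 0
        self.last_flow = (0.0, 0.0)    # assumed fallback at the start of a round

    def initial_flow_for(self, point):
        """Return the initial optical flow to use for a new feature point."""
        too_far = (self.ref_point is None or
                   math.dist(point, self.ref_point) > self.dist_threshold)
        if too_far:
            # Comparator fired: the current point becomes the reference point.
            self.ref_point = point
            self.sum_u = self.sum_v = 0.0   # assumption: start a fresh round
            self.count = 0
            return self.last_flow
        if self.count == 0:
            return self.last_flow
        # Optical flow average calculation unit.
        return (self.sum_u / self.count, self.sum_v / self.count)

    def record_result(self, flow):
        """Feed back a converged optical flow result for the current round."""
        self.sum_u += flow[0]
        self.sum_v += flow[1]
        self.count += 1
        self.last_flow = flow

gen = AdaptiveInitialFlow(dist_threshold=10.0)
first = gen.initial_flow_for((0, 0))      # no history yet
gen.record_result((2.0, 1.0))
second = gen.initial_flow_for((3, 4))     # distance 5, within the threshold
gen.record_result((4.0, 3.0))
third = gen.initial_flow_for((6, 0))      # distance 6, average of two results
gen.record_result((3.0, 2.0))
fourth = gen.initial_flow_for((50, 50))   # far away, reference point is updated
```

Only converged results are expected to be fed to `record_result`; how failed iterations should affect the average is not stated in the text.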
The module is used for storing the image pyramid, the feature point coordinates and the calculated optical flow, namely the main memory comprises a feature point storage unit, an image pyramid storage unit and an optical flow storage unit.
The pyramid image storage unit is used for storing the image pyramid. Since two adjacent frames are needed to calculate the optical flow, 2 pyramids with the same number of layers must be stored. Taking a 4-layer image pyramid of a 640 x 480 image as an example, as shown in fig. 4, the lowest layer (layer 1) is the original image and requires the most storage resources; the uppermost layer (layer 4) is the smallest image, whose length and width are 1/8 of those of the original image, and requires the least storage resources.
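The storage demand of this example can be worked out directly. The sketch below assumes that each layer halves the width and height of the layer beneath it, which is consistent with layer 4 being 1/8 of the original in each dimension; the function name `pyramid_layer_sizes` is hypothetical, and the result is counted in pixels because the bit depth is not stated.

```python
def pyramid_layer_sizes(width, height, layers):
    """(width, height) of each layer, from layer 1 (original) upward."""
    return [(width >> k, height >> k) for k in range(layers)]

sizes = pyramid_layer_sizes(640, 480, 4)
pixels_per_pyramid = sum(w * h for w, h in sizes)
pixels_for_two_frames = 2 * pixels_per_pyramid
layer1_share = (sizes[0][0] * sizes[0][1]) / pixels_per_pyramid
```

Layer 1 accounts for roughly three quarters of each pyramid, so it is the layer where avoiding a full-frame buffer saves the most memory.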
Since each iteration of the optical flow calculation only needs the image within a fixed-size window around the feature point, it is not necessary to cache the entire image. Based on this idea, an embodiment of the present invention designs the storage logic shown in FIG. 5. The images of layers 2, 3 and 4 are small, so they are stored in full. In the storage unit corresponding to layer 1, only 256 rows by 640 columns of the image are stored. Because feature points with smaller coordinates are input first, loading a suitable number of image rows is enough to meet the operation requirement. Afterwards, the rows of the loaded image are adjusted continuously as the feature point coordinates change, so that the operation requirement is always met. Furthermore, the image pyramid storage unit employs logic similar to a circular FIFO (first in, first out): once more than 256 rows have been written, writing automatically resumes from the start position.
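The circular row storage for layer 1 can be modelled as follows. This is a software sketch under assumptions: the class `CircularRowBuffer` and its method names are hypothetical, and rows are addressed by their index in the full image modulo the buffer depth.

```python
class CircularRowBuffer:
    """Sketch of a layer-1 row store that wraps around like a circular FIFO."""

    def __init__(self, rows=256, cols=640):
        self.rows = rows
        self.data = [[0] * cols for _ in range(rows)]
        self.next_row = 0  # index (in the full image) of the next row to write

    def write_row(self, pixels):
        # Once `rows` rows have been written, writing resumes from slot 0.
        self.data[self.next_row % self.rows] = list(pixels)
        self.next_row += 1

    def is_resident(self, row):
        """True if image row `row` is still held and has not been overwritten."""
        return max(0, self.next_row - self.rows) <= row < self.next_row

    def read_row(self, row):
        if not self.is_resident(row):
            raise IndexError(f"image row {row} is not in the buffer")
        return self.data[row % self.rows]
```

The `is_resident` check makes explicit why the controller has to track which rows are still needed before it allows further writes.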
In the actual storage process, the embodiment of the present invention uses RAM (random access memory). In addition, in order to read the image quickly, one image is distributed across 8 RAM blocks. As shown in FIG. 5, the image is divided into segments of 8 rows each, and each row of a segment is stored in its corresponding RAM. In this way 8 pixel blocks can be accessed at a time, so the access rate is greatly improved compared with storing the image in a single RAM block.
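One possible address mapping for such an 8-RAM layout is sketched below. The description does not give the exact mapping, so the row-interleaved scheme here (image row r goes to RAM r mod 8) and the function names are assumptions made for illustration.

```python
NUM_BANKS = 8


def bank_and_address(row, col, width=640, num_banks=NUM_BANKS):
    """Map a pixel to (RAM index, address inside that RAM).

    Assumption: rows are interleaved, so 8 consecutive rows sit in 8 different RAMs.
    """
    bank = row % num_banks
    address = (row // num_banks) * width + col
    return bank, address


def vertical_read(row0, col, width=640, num_banks=NUM_BANKS):
    """RAM locations touched when 8 vertically adjacent pixels are read together."""
    return [bank_and_address(row0 + k, col, width, num_banks) for k in range(num_banks)]
```

With this mapping any 8 consecutive rows fall into distinct RAMs, so they can be fetched in parallel without conflict.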
The feature point storage unit is used for storing the x-direction and y-direction coordinates of the feature points. To simplify the subsequent calculation logic, the feature points are by default stored in order of increasing y-direction coordinate and are input into the optical flow calculation unit in the same order.
The optical flow storage unit stores the optical flow outputs and the iteration status, and its indexes correspond one to one with those of the feature point storage unit. If the iteration fails during optical flow calculation, the optical flow data are not saved and the corresponding bit (iteration state identification bit) of the iteration state register is set to 0. Otherwise, the optical flow data are saved and the corresponding bit of the iteration state register is set to 1.
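The relationship between stored flows and the iteration state register can be sketched as follows. The class `OpticalFlowStore` and its methods are hypothetical names, and packing the status bits into a single integer is an assumption about the register layout.

```python
class OpticalFlowStore:
    """Sketch: flow results indexed like the feature points, plus one status bit each."""

    def __init__(self, num_points):
        self.flows = [None] * num_points
        self.status = 0  # iteration state register, bit i belongs to feature point i

    def commit(self, index, flow, converged):
        if converged:
            self.flows[index] = flow       # save the optical flow data
            self.status |= (1 << index)    # mark the iteration as successful
        else:
            self.status &= ~(1 << index)   # failed iteration: data not saved, bit = 0

    def is_valid(self, index):
        return bool((self.status >> index) & 1)
```

A consumer of the results would test `is_valid(i)` before using the flow of feature point i.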
(3) Main controller.
The main controller is responsible for scheduling the reads and writes of the image storage module, controlling the transfer of feature point data, and so on. To make the optical flow computing array easy to construct and flexible to configure, the embodiment of the present invention separates the control logic from the computing logic, which reduces coupling and improves flexibility.
(4) Optical flow calculation unit.
In optical flow calculation based on the image pyramid, each feature point undergoes a single-point optical flow calculation at every layer, and the result is used as the initial value for the next layer. Decomposing this process shows that the single-layer optical flow calculation of a single feature point is the smallest unit of calculation logic, which is called a computing engine (CE) in the embodiment of the present invention.
In the embodiment of the present invention, the structure of the CE unit is shown in FIG. 6. It calculates the optical flow of a feature point at a given layer based on the Gauss-Newton method. The inputs of the CE unit are the extracted original feature point and the initial optical flow, and the calculation process of the CE unit comprises the following steps:
1) After the initial optical flow is input into the iteration control module, the coordinates of the corresponding feature point in the second image are calculated and provided to the image 2 index module. Meanwhile, the image 1 index module receives the original feature point. The two index modules then each fetch the pixel values near their corresponding points and store them in the buffers. That is, based on the specified neighborhood range, the image 1 index module stores the pixel values within the neighborhood of the current feature point in buffer 1, and the image 2 index module stores the pixel values within the neighborhood of the feature point corresponding to the initial optical flow in buffer 2. In the embodiment of the present invention, each buffer (buffer 1 and buffer 2) can output one pixel value together with the pixel values of its 4-neighborhood (up, down, left and right) at a time. The values in the buffers are then fetched step by step, and the gradient values and error values are calculated in sequence. At this point buffer 1 supplies the corresponding 5 pixels to the gradient calculation module, which calculates the gradient Ix in the x direction and the gradient Iy in the y direction. The embodiment of the present invention uses the simpler forward gradient operator.
2) Next, the gradient values are received, and the Hessian matrix H and the corresponding standard vector g are calculated according to the Gauss-Newton method. H is a 2×2 symmetric matrix whose three distinct components are calculated as Ix multiplied by Ix, Ix multiplied by Iy, and Iy multiplied by Iy. The standard vector g is a 2×1 vector whose components are the error value multiplied by Ix and by Iy. In addition, the 2-norm of the error is taken to obtain the cost value, which is used for the update judgment in subsequent iterations. After these values have been calculated, the linear equation set is solved to obtain an initial optical flow value.
3) Finally, the iteration control module receives the initial optical flow value and judges whether the iteration can be ended. If so, the value is output from the module. Otherwise, the corresponding feature point coordinates in the second image are updated, and the calculation process is repeated until the iteration can be ended.
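The three steps above can be sketched in software as a single-layer Gauss-Newton solver for one feature point. This is a minimal sketch and not the circuit's actual datapath. The function names, the window radius, the bilinear sampling of the second image, summing the products over the window, and the termination rule (step length below `eps`) are assumptions; the cost-based update judgment mentioned in step 2) is omitted.

```python
import math


def bilinear(img, x, y):
    """Sample img (list of rows) at a fractional position (assumed interpolation)."""
    x0, y0 = math.floor(x), math.floor(y)
    fx, fy = x - x0, y - y0
    top = img[y0][x0] * (1 - fx) + img[y0][x0 + 1] * fx
    bot = img[y0 + 1][x0] * (1 - fx) + img[y0 + 1][x0 + 1] * fx
    return top * (1 - fy) + bot * fy


def ce_single_layer(img1, img2, point, init_flow=(0.0, 0.0),
                    radius=3, max_iters=20, eps=1e-3):
    """Return ((u, v), converged) for one feature point at one pyramid layer."""
    px, py = point
    u, v = init_flow
    # Step 1: window pixels of image 1 and their forward-difference gradients.
    samples = []
    for y in range(py - radius, py + radius + 1):
        for x in range(px - radius, px + radius + 1):
            ix = img1[y][x + 1] - img1[y][x]
            iy = img1[y + 1][x] - img1[y][x]
            samples.append((x, y, ix, iy, img1[y][x]))
    # Step 2: 2x2 symmetric matrix H from the gradient products.
    h11 = sum(s[2] * s[2] for s in samples)
    h12 = sum(s[2] * s[3] for s in samples)
    h22 = sum(s[3] * s[3] for s in samples)
    det = h11 * h22 - h12 * h12
    if abs(det) < 1e-9:
        return (u, v), False  # no unique solution (e.g. flat window)
    for _ in range(max_iters):
        g1 = g2 = 0.0
        for x, y, ix, iy, i1 in samples:
            e = bilinear(img2, x + u, y + v) - i1  # error value
            g1 += e * ix
            g2 += e * iy
        # Solve H * d = -g for the 2x1 update d.
        du = -(h22 * g1 - h12 * g2) / det
        dv = -(-h12 * g1 + h11 * g2) / det
        u, v = u + du, v + dv
        # Step 3: iteration control (assumed rule: stop when the update is tiny).
        if math.hypot(du, dv) < eps:
            return (u, v), True
    return (u, v), False
```

For example, if `img2` is `img1` shifted one pixel to the right on a smooth synthetic image, the returned flow approaches (1, 0) after a few iterations.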
Since the architecture is applied to optical flow based on image pyramids, each CE unit can be configured by parameters so that it can be flexibly applied to optical flow calculation at different layers.
(5) Main controller.
This part of the control logic controls the loading of images and feature points and the starting and stopping of optical flow calculation.
Each single-layer optical flow computing unit needs to fetch the image block it requires, which involves an image indexing operation. Since several single-layer optical flow computing cores calculate in parallel, they may try to read the image storage module at the same time, which causes access conflicts. The embodiment of the present invention applies a polling (round-robin) arbitration mechanism to solve this problem, so that all optical flow cores obtain image blocks with the same priority.
As shown in FIG. 7, each single-layer optical flow computing unit can issue its own image read request signal. The request signals are sent to a priority selection circuit, which selects one CE unit by polling arbitration and forwards the corresponding address index to the storage unit. After the storage unit fetches the image block data, only the selected CE unit receives the data, and the remaining CE units do not.
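Polling arbitration of this kind can be modelled in a few lines. The sketch below is a generic round-robin arbiter, assuming one grant per cycle and that the search starts after the most recently granted CE. The class name and interface are hypothetical.

```python
class RoundRobinArbiter:
    """Sketch of a polling arbiter that grants one CE read request per cycle."""

    def __init__(self, num_requesters):
        self.n = num_requesters
        self.last = num_requesters - 1  # so that CE 0 is examined first

    def grant(self, requests):
        """requests: list of bools, one per CE. Return the granted index or None."""
        for offset in range(1, self.n + 1):
            idx = (self.last + offset) % self.n
            if requests[idx]:
                self.last = idx  # the next search starts after this CE
                return idx
        return None
```

Because the search always starts after the last granted CE, no CE can be starved while others keep requesting.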
In addition, since a single optical flow calculation does not need the whole image, the embodiment of the present invention designs a set of logic for controlling image input, so that optical flow calculation can begin once part of the image has been input, which saves storage space. The control logic is as follows:
First, the main controller issues a feature point read-in signal and registers the feature points in the feature point storage unit. It then issues an image loading signal. At this moment the pyramid construction module starts working synchronously, processes the image data and loads it into the image pyramid storage unit. When enough image rows have been loaded to satisfy the operation of the first feature point, the optical flow operation is started.
After the operation starts, the main controller continuously checks the image range required by the next feature point and starts a loading operation if new image data is needed. Image loading and the optical flow operation are performed simultaneously, which saves operation time.
Since the image pyramid storage unit holds only part of the image, the feature points currently used for calculating the optical flow are checked before loading starts, to determine whether the newly read image data would overwrite image data that is still in use. If so, the image loading and feature point loading operations are suspended until overwriting would no longer occur.
When the input image just satisfies the optical flow calculation of the new feature point, loading is stopped.
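The loading decisions described above can be summarised as a small decision function. This is a sketch under assumptions: the window radius of 8 rows, the function names and the three-way result ("load", "pause", "stop") are illustrative and are not taken from the description.

```python
def rows_needed(y, radius, height):
    """First and last layer-1 row needed for a feature point at row y.

    The extra +1 covers the row read by a forward difference (assumption).
    """
    return max(0, y - radius), min(height - 1, y + radius + 1)


def loader_step(loaded_rows, next_y, active_ys, radius=8, capacity=256, height=480):
    """Decide what the controller does next.

    loaded_rows: number of image rows written so far (rows 0..loaded_rows-1).
    next_y:      row coordinate of the next feature point to be processed.
    active_ys:   row coordinates of feature points currently being calculated.
    """
    _, last_needed = rows_needed(next_y, radius, height)
    if loaded_rows > last_needed:
        return "stop"  # the loaded rows already cover the new feature point
    # Writing row `loaded_rows` into the circular store overwrites this row:
    victim = loaded_rows - capacity
    oldest_in_use = min((rows_needed(y, radius, height)[0] for y in active_ys),
                        default=None)
    if oldest_in_use is not None and victim >= oldest_in_use:
        return "pause"  # would overwrite rows still in use
    return "load"
```

Calling this once per row time reproduces the load, pause and stop behaviour in a form that is easy to test.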
For the optical flow operation, the embodiment of the present invention also designs a configurable computing array architecture that performs optical flow calculation based on the image pyramid. In this architecture, an array of computing engines is constructed to perform the optical flow calculation, and through flexible scheduling of the computing engines (CEs), higher parallelism and a higher computation rate can be achieved than with other optical flow acceleration circuits. Because pyramid-based optical flow is calculated serially layer by layer, a layer can only be processed after the layer above it has finished, which introduces strong data dependence and makes real-time processing difficult. To address this problem, the embodiment of the present invention designs the configurable computing array architecture shown in FIG. 8 to improve performance. The architecture comprises four layers, each containing several computing engines (CEs), which together form a CE array. Owing to the configurability and flexibility of the CE array, the CEs can be dynamically grouped according to the number of pyramid layers. The number of pyramid layers can be chosen according to the calculation accuracy required in different applications, and a 4-layer pyramid is taken as an example here. The number of parallel CEs in each layer is also configurable. The embodiment of the present invention configures 3 CEs for each layer, so the whole circuit has 12 CEs.
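The grouping of CEs by pyramid layer can be expressed as a small configuration object. This is an illustrative sketch. The class `CEArrayConfig` and the numbering of CEs from the top layer downward are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CEArrayConfig:
    """Sketch of a CE array configuration: layers x parallel CEs per layer."""
    pyramid_layers: int = 4
    ces_per_layer: int = 3

    @property
    def total_ces(self):
        return self.pyramid_layers * self.ces_per_layer

    def layer_groups(self):
        """Return {layer: [CE ids]}, assigning ids from the top layer downward."""
        groups = {}
        ce_id = 0
        for layer in range(self.pyramid_layers, 0, -1):
            groups[layer] = list(range(ce_id, ce_id + self.ces_per_layer))
            ce_id += self.ces_per_layer
        return groups
```

With the default values this gives 12 CEs in four groups of three. The same 12 CEs could equally be regrouped, for example as 3 layers of 4.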
To further improve the calculation efficiency of each layer, the calculation between layers is pipelined. The CE at the uppermost layer obtains the feature point at the smallest scale (that is, its x and y coordinates are 1/8 of the original coordinates) and starts the optical flow calculation of the uppermost layer. After this calculation finishes, if the CE of the next layer has not yet finished its current work, the upper-layer CE waits until it has. The optical flow and the corresponding feature point are then doubled and passed to the feature point at the corresponding position of the next layer. As can be seen from the figure, each CE passes its result to the CE of the next layer as soon as its calculation is complete and then receives the result of the CE of the layer above. The optical flow output by the CE of the first layer is therefore the final optical flow.
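The layer-to-layer hand-off can be sketched as a coarse-to-fine loop. This Python model is sequential and does not reproduce the pipelining of the CE array. The function name `pyramid_flow`, the `solve_layer` callback, and the assumption that the initial flow is expressed at the scale of the top layer are illustrative.

```python
def pyramid_flow(point, num_layers, solve_layer, init_flow=(0.0, 0.0)):
    """Coarse-to-fine optical flow for one feature point.

    solve_layer(layer, scaled_point, init_flow) stands in for one CE and
    returns the refined flow at that layer's scale (hypothetical callback).
    """
    flow = init_flow
    for layer in range(num_layers, 0, -1):      # top layer first
        scale = 2 ** (layer - 1)                # layer 4 -> 1/8 of original coordinates
        scaled_point = (point[0] / scale, point[1] / scale)
        flow = solve_layer(layer, scaled_point, flow)
        if layer > 1:
            # Doubled before being handed to the next (finer) layer.
            flow = (2.0 * flow[0], 2.0 * flow[1])
    return flow  # the layer-1 result is the final optical flow
```

With an ideal per-layer solver, a true motion of (8, -4) pixels at full resolution appears as (1, -0.5) at layer 4 and is recovered exactly at layer 1.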
Compared with the prior art, the pyramid optical flow accelerating circuit based on the reconfigurable computing array requires fewer iterations: by providing a suitable initial value for the optical flow iteration, the number of iterations is considerably reduced and the optical flow computing speed is significantly improved. It is more flexible: by separating the single-point optimization process from the data scheduling logic and by constructing a pyramid-based optical flow computing array, both the flexibility and the performance of the circuit are significantly improved. It also has a higher frame rate: it can process 405 frames of images per second, offers better real-time performance, and can fully meet the application requirements of the related fields.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.
What has been described above is merely some embodiments of the present invention. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit of the invention.
Claims (9)
1. A pyramid optical flow accelerating circuit based on a reconfigurable computing array, characterized in that the unit modules of the circuit comprise: an optical flow control module, a parameter configuration unit, a main memory, an image pyramid construction module, a self-adaptive initial optical flow generation module and an optical flow computing core;
The optical flow control module is used for controlling the interaction of logic signals and data among the unit modules;
the parameter configuration unit is used for configuring working parameters of the optical flow computing core;
The main memory comprises a feature point storage unit, an image pyramid storage unit and an optical flow storage unit; the image pyramid storage unit is used for storing image pyramid data, and the optical flow storage unit is used for storing optical flow calculation related results;
the image pyramid construction module is used for carrying out real-time downsampling on two input adjacent frames of images so as to obtain image pyramid data, and storing the image pyramid data into the image pyramid storage unit;
the self-adaptive initial optical flow generation module is used for providing an applicable initial optical flow for optical flow iterative operation of an optical flow calculation core;
The optical flow computing core comprises a reconfigurable computing array formed by a plurality of computing engine CE units, wherein the input of each CE unit is a characteristic point stored in a characteristic point storage unit and an initial optical flow output by a self-adaptive initial optical flow generating module, and the optical flow computing core is used for completing single-layer optical flow computing operation of a single characteristic point and storing an optical flow computing related result into the optical flow storage unit;
Wherein the adaptive initial optical flow generation module comprises: the device comprises an accumulation calculation unit, an optical flow average calculation unit, a characteristic point distance calculation unit, a comparator, a characteristic point updating unit and an initial optical flow output unit;
under the control of an optical flow control module, reading an optical flow result obtained by current round calculation from a designated storage unit of a main memory to an accumulation calculation unit, carrying out accumulation operation on the optical flow result obtained by current round calculation by the accumulation calculation unit, and inputting the operation result into an optical flow average value calculation unit;
The optical flow average value calculation unit is used for calculating the optical flow average value of the current round and inputting the optical flow average value result into the initial optical flow output unit;
The feature point distance calculation unit is used for calculating the distance between the feature point used in the current round of calculation and the first feature point of the current round stored in the designated storage unit of the main memory, and inputting the distance calculation result into the comparator;
the comparator compares the current distance calculation result with the input distance threshold, and if the current distance calculation result is larger than the input distance threshold, the first feature point of the current round is updated; otherwise, the initial optical flow output unit is controlled to take the current optical flow average value as a new initial optical flow and output it.
2. The pyramid optical flow accelerating circuit based on the reconfigurable computing array of claim 1, wherein the image pyramid storage unit stores data in the following manner:
For a plurality of layers of pyramid images of the image pyramid, adopting a local storage mode for pyramid images with the image size larger than a preset image size threshold value, and adopting a full storage mode for pyramid images with the image size smaller than or equal to the preset image size threshold value; the local storage mode refers to: based on the preset image storage size, the image block data which does not exceed the image storage size is read and stored according to the established image data reading sequence, and the image data storage logic adopts a first-in first-out writing mode.
3. The pyramid optical flow accelerating circuit based on the reconfigurable computing array according to claim 1, wherein the image pyramid storage unit uses a plurality of RAM memories to store each layer of pyramid images, determines the image block size of each layer of pyramid images based on the number of configured RAM memories, divides each layer of pyramid images into row and column blocks, and stores each image block into the corresponding RAM memory.
4. The pyramid optical flow accelerating circuit based on the reconfigurable computing array according to claim 1, wherein the feature point storage unit is used for storing the horizontal and vertical coordinates of the feature points, and the feature point storage unit stores the feature points in the ascending order of the vertical coordinates and sequentially reads the feature points to the optical flow computing core in the same order as the feature point storage unit is stored.
5. The pyramid optical flow accelerating circuit based on the reconfigurable computing array according to claim 1, wherein the indexes of the optical flow calculation related results stored in the optical flow storage unit correspond one to one with the feature point indexes of the feature point storage unit, and if the iteration fails during optical flow calculation, the identification bit used for identifying the iteration state in the iteration state register is set to 0; otherwise, the identification bit is set to 1.
6. The pyramid optical flow accelerating circuit based on the reconfigurable computing array of claim 1, wherein the CE unit comprises an iteration control module, a linear equation set solving module, a gradient computing module, a gaussian newton method computing module, an image 1 indexing module, an image 2 indexing module, a buffer 1 and a buffer 2, and the computing process of the CE unit comprises:
After the initial optical flow is input into the iteration control module, calculating the coordinates of the corresponding feature point in the second of the two adjacent images, and providing the coordinates to the image 2 index module; meanwhile, the image 1 index module stores the acquired pixel values within the neighborhood range of the current feature point into the buffer 1 based on the specified neighborhood range, and the image 2 index module stores the acquired pixel values within the neighborhood range of the feature point corresponding to the initial optical flow into the buffer 2 based on the specified neighborhood range;
the gradient calculation module fetches the values in the buffers step by step and calculates gradient operators and error values in sequence;
The Gauss-Newton method calculation module calculates a Hessian matrix H and a corresponding standard vector g based on the gradient operator and the error value output by the gradient calculation module, and sends the Hessian matrix H and the standard vector g into the linear equation set solving module;
the linear equation set solving module is used for solving the linear equation set based on the Hessian matrix H and the standard vector g to obtain an initial optical flow value, and sending the initial optical flow value to the iteration control module;
The iteration control module receives the initial optical flow value obtained by the current calculation, detects whether the iteration is finished based on a preset iteration finishing condition, and if so, outputs the currently received initial optical flow value; otherwise, updating the corresponding feature point coordinates in the second image, and repeating the calculation process to execute the optical flow calculation of the next round until the iteration is ended.
7. The reconfigurable computing array-based pyramid optical flow acceleration circuit of claim 1, wherein the control logic applied by the optical flow control module to the input of the image pyramid construction module comprises:
the optical flow control module sends out a characteristic point reading signal and registers the characteristic points extracted by the user to the characteristic point storage unit;
The optical flow control module sends out an image loading signal to trigger the image pyramid construction module to synchronously start working, the image pyramid construction module carries out real-time downsampling on two input adjacent frames of images, image pyramid data are obtained and loaded into the image pyramid storage unit, and when the data amount of the loaded image pyramid data meets the optical flow calculation working of a first feature point, an optical flow calculation core is started;
After the optical flow calculation core is started, the optical flow control module detects the image range required by the next feature point according to the set detection frequency, and if new image data is required, the image loading operation is started; therefore, the image loading and optical flow operation are carried out simultaneously, so that the operation time is saved;
before starting the image loading operation, the optical flow control module checks the feature points used for calculating the optical flow and judges whether the newly loaded image data would overwrite the image data required by the feature points currently participating in calculation; if so, the image loading and feature point loading operations are paused until the newly loaded image data would no longer overwrite the image data required by the feature points currently participating in calculation; and during image loading, the optical flow control module detects in real time whether the loaded image data satisfies the optical flow calculation of the current new feature point, and if so, the image data loading is stopped.
8. The reconfigurable computing array-based pyramid optical flow acceleration circuit of claim 1, wherein the optical flow computing core is a computing array architecture comprising a plurality of computing layers, each computing layer comprises a plurality of CE units, the number of CE units included in each computing layer is the same, and the number of computing layers included in the optical flow computing core is consistent with the number of image layers of the image pyramid data constructed by the image pyramid construction module.
9. The pyramid optical flow accelerating circuit based on the reconfigurable computing array according to claim 8, wherein for each computing layer in the computing array architecture, the image scale of pyramid images corresponding to feature points processed by each computing layer is increased layer by layer in the sequence from top to bottom, and the computing is performed layer by layer from top to bottom, after the optical flow computing operation of the CE units of the previous computing layer is finished, optical flow computing results are transmitted to the corresponding CE units of the next computing layer in time, and final optical flow computing results are obtained based on the optical flow computing results of the CE units of the computing layer at the bottom layer.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410068385.9A CN117876206B (en) | 2024-01-17 | 2024-01-17 | Pyramid optical flow accelerating circuit based on reconfigurable computing array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117876206A CN117876206A (en) | 2024-04-12 |
CN117876206B true CN117876206B (en) | 2024-07-23 |
Family
ID=90588123
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202410068385.9A Active CN117876206B (en) | 2024-01-17 | 2024-01-17 | Pyramid optical flow accelerating circuit based on reconfigurable computing array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117876206B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2019204193A (en) * | 2018-05-22 | 2019-11-28 | キヤノン株式会社 | Image processing device, image processing method, and program |
CN114612513A (en) * | 2022-03-10 | 2022-06-10 | 西安交通大学 | Image pyramid optical flow value calculation method and system based on FPGA |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3785225B1 (en) * | 2018-04-24 | 2023-09-13 | Snap Inc. | Efficient parallel optical flow algorithm and gpu implementation |
CN117237417A (en) * | 2023-11-13 | 2023-12-15 | 南京耀宇视芯科技有限公司 | System for realizing optical flow tracking based on image and imu data hardware |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |