CN115170635A - High-speed and high-energy-efficiency binocular vision hardware accelerator and method based on image block level - Google Patents


Info

Publication number
CN115170635A
CN115170635A (application CN202210628902.4A)
Authority
CN
China
Prior art keywords
block
level
pixel
feature vector
label
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210628902.4A
Other languages
Chinese (zh)
Inventor
王弘昱
娄鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
ShanghaiTech University
Original Assignee
ShanghaiTech University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by ShanghaiTech University filed Critical ShanghaiTech University
Priority to CN202210628902.4A priority Critical patent/CN115170635A/en
Publication of CN115170635A publication Critical patent/CN115170635A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/50Depth or shape recovery
    • G06T7/55Depth or shape recovery from multiple images
    • G06T7/593Depth or shape recovery from multiple images from stereo images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T1/00General purpose image data processing
    • G06T1/20Processor architectures; Processor configuration, e.g. pipelining
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • G06T2207/10012Stereo images

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Processing (AREA)

Abstract

The invention provides a high-speed, high-energy-efficiency binocular vision hardware accelerator and method based on the image block level, together with a PatchMatch-based design method for a high-performance, energy-saving binocular ranging processor. A proof-of-concept FPGA-based binocular ranging processor has been developed, achieving a peak performance of 1920×1080 @ 165.7 FPS at 128 disparity levels with a power consumption of 3.35 W. The energy and resource efficiency of the proposed design surpasses state-of-the-art FPGA-based stereo matching processors. When the disparity level is increased to 256, the hardware resource increment of the proposed design is much lower than that of existing WTA-based designs. In addition, unlike existing dedicated stereo matching processors that output only disparity information, the proposed design can also derive plane tilt, which may benefit subsequent tasks such as 3D reconstruction. The invention reduces the energy consumption and area of the detection system without affecting detection accuracy.

Description

High-speed and high-energy-efficiency binocular vision hardware accelerator and method based on image block level
Technical Field
The invention relates to a binocular vision distance measurement method and a hardware accelerator, in particular to a high-speed and high-energy-efficiency binocular vision hardware accelerator and a method, and belongs to the technical field of binocular vision and artificial intelligence.
Background
With the development of the Internet of Things and the 5G industry, machine vision places increasingly stringent requirements on the performance and efficiency of binocular ranging systems. Advanced driver assistance systems (ADAS), for example, require high resolution and frame rate on the one hand and cannot accept power consumption of hundreds of watts on the other. Since binocular ranging is both data-intensive and computation-intensive, a single general-purpose processor based on the von Neumann architecture can no longer push the performance and energy efficiency of the system up at the same time. Dedicated detection circuits can support massively parallel computing and can therefore push both performance and energy-efficiency limits.
In general, conventional binocular ranging algorithms can be classified into three categories: local methods (see X. Mei et al., "On building an accurate stereo matching system on graphics hardware," in 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), Barcelona, 2011, pp. 467-474), global methods (see Jian Sun, Nan-Ning Zheng and Heung-Yeung Shum, "Stereo matching using belief propagation," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, no. 7, pp. 787-800, Jul. 2003) and semi-global methods (see H. Hirschmüller, "Stereo processing by semiglobal matching and mutual information," in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 30, no. 2, pp. 328-341, Feb. 2008). Local methods, which consider only the information inside a window, are generally computationally simple. However, despite techniques such as variable support regions and combined cost measures, the local approach suffers from a limited support region, making it difficult to deal with key issues such as textureless areas and repetitive patterns. Global methods, on the other hand, formulate the stereo matching problem as a Markov random field (MRF), which can be translated into a global energy function minimization problem. The energy function typically consists of a unary cost that assesses similarity and a smoothness cost that penalizes disparity discontinuities. Global methods generally produce disparity maps with higher accuracy than local methods, but at the cost of more computation; thus, even with powerful computing devices, the global approach has difficulty achieving real-time performance. To alleviate the large computational load of the 2D MRF in the global method, semi-global matching (SGM) was proposed: it simplifies the 2D MRF into multiple 1D scan lines, which can be solved by dynamic programming. SGM strikes a better balance between accuracy and computation.
However, due to the data flow of the backward scan, it is still challenging to implement a complete SGM algorithm in real time without further optimization (see Z. Li et al., "A 1920×1080 30-frames/s 2.3 TOPS/W Stereo-Depth Processor for Energy-Efficient Autonomous Navigation of Micro Aerial Vehicles," in IEEE Journal of Solid-State Circuits, vol. 53, no. 1, pp. 76-90, Jan. 2018). Furthermore, from a hardware perspective, SGM separates forward and backward processing, which reduces the efficiency of a corresponding hardware design. Neural-network-based methods (see J.-R. Chang and Y.-S. Chen, "Pyramid stereo matching network," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2018) achieve higher accuracy, but at the expense of much higher computational complexity than conventional algorithms and a potential risk of overfitting.
Disclosure of Invention
The technical problems to be solved by the invention are as follows: existing binocular ranging methods struggle to meet the high-speed and high-energy-efficiency requirements of mobile platforms, and there is a great deal of computational redundancy in existing computation processes.
In order to solve the above technical problems, one technical solution of the present invention is a method for a high-speed, high-energy-efficiency binocular vision hardware accelerator based on the image block level, characterized in that the computation proceeds from fine granularity to coarse granularity and then from coarse granularity back to fine granularity, and specifically includes the following steps:
step 1, performing feature extraction on image data obtained by a left camera and a right camera in a binocular ranging system, wherein the feature extraction is performed by adopting the following method:
step 101, acquiring a left image and a right image which are obtained by shooting by a left camera and a right camera in a binocular ranging system, and acquiring a gray value of each pixel point in the left image and the right image;
step 102, comparing the gray value of the central pixel in each window of the left and right images with the gray values of several pixel points at other selected positions in the window, and encoding each comparison result as 0 or 1 to form a binary feature vector;
step 2, initializing, and giving an initial random label to the binary characteristic vector obtained in the step 1;
step 3, continuously enlarging the block size until it reaches the maximum, estimating the disparity tilt amount, and forming the initialized block-level label from the disparity tilt amount and the disparity to obtain the tilted block-level disparity label;
step 4, correcting erroneous label estimates from initialization through multi-scale propagation, then optimizing the labels, continuously reducing the block size in the process to obtain the optimized block-level labels;
step 5, obtaining the final pixel-level disparity and tilt label result through pixel-by-pixel estimation, wherein the pixel-by-pixel estimation algorithm comprises the following steps:
step 501, increasing the size of the block reduced in step 4 by fifty percent to make each pixel covered by four blocks, wherein each pixel corresponds to the block-level label result obtained by the multi-scale propagation optimization in step 4 of the four blocks, and the four block-level label results are four candidate labels;
step 502, comparing the four candidate labels through the aggregation cost, thereby determining the label result of the last output pixel level.
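The census-style feature extraction of steps 101 and 102 can be sketched in a few lines. The sketch below is illustrative Python, not the patent's circuit: the six comparison positions are an assumption (the hardware's 6P_CT unit suggests six comparison points, but the patent does not specify where in the window they sit).

```python
def census6(gray, x, y, offsets=((-2, -2), (2, -2), (-2, 2), (2, 2), (0, -2), (0, 2))):
    """Compare the window's centre pixel against six sampled neighbours.

    Each comparison contributes one bit: 1 if the neighbour is darker
    than the centre, else 0.  The offsets are an illustrative assumption;
    the patent does not fix the sampled positions.
    """
    c = gray[y][x]
    bits = 0
    for dx, dy in offsets:
        bits = (bits << 1) | (1 if gray[y + dy][x + dx] < c else 0)
    return bits  # 6-bit binary feature vector
```

Applied to a small gray-value array, this yields one compact integer feature per pixel, so the later cost computation reduces to XOR plus a bit count.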
Another technical solution of the present invention is an image-block-level-based high-speed, high-energy-efficiency binocular vision hardware accelerator employing the above method, characterized by comprising a feature and pixel-level disparity generation module FPDG, a block label initialization module BLI, a multi-scale propagation module MSP, and a pixel-by-pixel estimation module PPE engine, where the feature and pixel-level disparity generation module FPDG and the block label initialization module BLI implement steps 1, 2 and 3, the multi-scale propagation module MSP implements step 4, and the pixel-by-pixel estimation module PPE engine implements step 5.
Preferably, the image data obtained by the left and right cameras in the binocular ranging system are input to the feature and pixel-level disparity generation module FPDG, which outputs a binary feature vector and a pixel-level initial disparity label. In the feature and pixel-level disparity generation module FPDG, the RGB pixel value p_L of the left image and the RGB pixel value p_R of the right image are input to two RGB2G units, which output the corresponding gray values to the 6P_CT units; the two 6P_CT units produce the left-image binary feature vector f_L and the right-image binary feature vector f_R, which are output to the pixel disparity initialization circuit PDI and the block label initialization module BLI. The Random gen unit generates four random integers with values between 0 and the maximum disparity level as the initial random pixel-level disparity labels R_d1 to R_d4, which are input to the pixel disparity initialization circuit PDI; based on the initial random pixel-level disparity labels R_d1 to R_d4, the initialized pixel-level disparity d_p_i is obtained using the pixel-level cost calculation unit and output to the block label initialization module BLI.
Preferably, in the pixel disparity initialization circuit PDI, the pixel-level cost calculation unit computes cost values for the four initial random pixel-level disparity labels R_d1 to R_d4; the random pixel-level disparity corresponding to the minimum cost is selected as the winner label, which is output from the cache to the block label initialization module BLI as the initialized pixel-level disparity d_p_i.
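The winner-take-all selection in the PDI can be sketched as follows; this is a hedged software model, with the pixel-level cost calculation unit abstracted into a `cost_of` callable (a name introduced here for illustration, not from the patent).

```python
import random

def init_pixel_disparity(cost_of, max_disp, n=4, rng=None):
    """Draw n random disparity candidates (R_d1..R_d4 in the patent)
    in [0, max_disp] and keep the one with the minimum pixel-level
    cost -- the winner label d_p_i."""
    rng = rng or random.Random()
    candidates = [rng.randrange(max_disp + 1) for _ in range(n)]
    costs = [cost_of(d) for d in candidates]
    return candidates[costs.index(min(costs))]
```

In the hardware, `cost_of` corresponds to the Hamming distance between the left feature vector and the right feature vector shifted by the candidate disparity.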
Preferably, after the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the block label initialization module BLI, the BLI buffers them in its feature vector cache (feature buffer) and then outputs f_L and f_R to the multi-scale propagation module MSP. Based on the input f_L, f_R and the initialized pixel-level disparity d_p_i, the block label initialization module BLI obtains the block-level tilted disparity label S_b_ini and outputs it to the multi-scale propagation module MSP. Inside the block label initialization module BLI, f_L and f_R are input to the feature vector cache; the block size is enlarged step by step until it reaches the maximum. The block-level disparity calculation module BDI corresponding to each block size reads f_L and f_R from the feature vector cache, together with the initialized block-level disparity label produced by the BDI of the previous block size, and thereby computes the initialized block-level disparity label for the current block size.
The initialized pixel-level disparity d_p_i is input to the block-level disparity calculation module BDI corresponding to the smallest block size, and the output of the BDI corresponding to the largest block size is the block-level disparity label that completes the maximum-size initialization. The tilt label initialization module SI reads f_L and f_R from the feature vector cache together with the maximum-size-initialized block-level disparity label, and computes the initialized tilted block-level disparity label S_b_ini.
Preferably, in the block-level disparity calculation module BDI, the left-image binary feature vector f_L, the right-image binary feature vector f_R and the corresponding candidate disparities d_1 to d_4 in the label cache are input to four parallel pixel-level cost calculation units, which produce the four pixel cost values C_p1 to C_p4 corresponding to the candidate disparities d_1 to d_4. The four pixel cost values C_p1 to C_p4 are then input to four parallel block-level cost calculation units to compute the four block cost values C_b1 to C_b4 corresponding to d_1 to d_4. The BDI then uses the index value I_c_min corresponding to the minimum of the block cost values C_b1 to C_b4 to select the corresponding winner label D_b_o from the label cache and output it to the next stage; the output D_b_o of the BDI corresponding to the maximum block size is the block-level disparity label that completes the maximum-size initialization.
Preferably, the maximum-size-initialized block-level disparity label D_b_i, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the tilt label initialization module SI. With five preset disparity tilt values (0, x+, x-, y+, y-), the SI obtains five pixel cost values through five parallel pixel-level cost calculation units; these are input to five parallel block-level cost calculation units to compute the five block cost values corresponding to the tilt values (0, x+, x-, y+, y-). Quadratic fitting is then performed separately in the horizontal and vertical directions; the tilt value at the minimum of the fitted block cost curve is the initialized disparity tilt value, and the horizontal and vertical tilt values together with the corresponding disparity value form the output tilted disparity label S_b_ini.
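The per-direction quadratic fit over three sampled costs reduces to the standard three-point parabola vertex. The sketch below assumes the tilt samples are symmetric about zero (step −s, 0, +s); the hardware's fixed-point details are omitted.

```python
def parabola_min(c_minus, c_zero, c_plus, step):
    """Vertex of the parabola through the block costs sampled at
    tilts -step, 0 and +step.  Returns the tilt value at the fitted
    cost minimum, clamped to the sampled interval; a flat or concave
    fit falls back to tilt 0."""
    denom = c_minus - 2.0 * c_zero + c_plus  # proportional to curvature
    if denom <= 0:
        return 0.0
    t = 0.5 * step * (c_minus - c_plus) / denom
    return max(-step, min(step, t))
```

For example, costs sampled from a parabola with its true minimum at tilt 0.3 are recovered exactly, since the fit is exact for quadratic cost profiles.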
Preferably, in the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to a feature vector cache (feature buffer); f_L and f_R in the feature vector cache and the block-level label S_MSP obtained by the MSP are output to the pixel-by-pixel estimation module PPE engine. The MSP comprises several block propagation modules BP corresponding to different image block sizes; the image block size shrinks step by step, finally yielding the optimized block-level label S_MSP. Except for the BP corresponding to the smallest image block, which iterates once, the BPs corresponding to the other block sizes iterate multiple times. Each block size corresponds to several cascaded BPs, which together complete the multiple iterations for that size; for the BP of the current block size, the input of the current iteration is the output of the previous iteration together with the left-image binary feature vector f_L and the right-image binary feature vector f_R read from the feature vector cache.
After the BP corresponding to the previous block size completes all its iterations, its output is input to the BP corresponding to the next block size, which starts its first iteration. After all BPs in the cascade have run, the optimization of the block-level labels and the shrinking of the corresponding block sizes are complete. The input of the BP corresponding to the maximum block size in its first iteration is the tilted disparity label S_b_ini output by the block label initialization module BLI.
The input of each block propagation module BP is the tilted disparity label from the previous stage together with the left-image binary feature vector f_L and the right-image binary feature vector f_R. The previous-stage tilted disparity labels are temporarily stored in a label cache. The dot product of each label in the neighborhood label set S_C with the coordinate vector (k_x, k_y, 1) gives the pixel-level disparity values d_1 to d_9 corresponding to the candidate labels, which are input to the cost calculation part. The cost calculation part comprises nine parallel pixel-level cost calculation units, nine parallel block-level cost calculation units and a smoothness cost calculation unit, which compute the corresponding cost values; the label at the position I_c_min with the minimum total block-level cost is output as the winner label S_b_o to the next-stage BP.
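How a tilted label induces a candidate disparity, and how the winner among neighborhood candidates is picked, can be sketched as below. This is a hedged model: the nine-wide parallel datapath and the smoothness term are folded into a single `cost_fn` callable (an illustrative name, not from the patent).

```python
def plane_disparity(label, kx, ky):
    """A tilted label carries horizontal tilt a, vertical tilt b and
    base disparity c; the candidate pixel-level disparity is the dot
    product of (a, b, c) with the coordinate vector (k_x, k_y, 1)."""
    a, b, c = label
    return a * kx + b * ky + c

def propagate(labels, kx, ky, cost_fn):
    """Among candidate labels gathered from the neighborhood, keep the
    one whose induced disparity has the minimum cost (winner label)."""
    costs = [cost_fn(plane_disparity(s, kx, ky)) for s in labels]
    return labels[costs.index(min(costs))]
```

A fronto-parallel surface is simply the special case a = b = 0, so the same datapath handles both flat and tilted patches.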
Preferably, the final result, namely the pixel-level tilted disparity label S_final, is output by the pixel-by-pixel estimation module PPE engine, which converts block-level labels into pixel-level labels. The block-level label S_MSP obtained by the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the PPE engine; the cost calculation unit adopted by the PPE engine is the aggregation cost calculation unit, and the label corresponding to the minimum aggregation cost position I_c_min is output as the final output S_final.
The candidate disparity d_c, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the aggregation cost calculation unit. The right-image binary feature vector f_R is input to a right window array; a multiplexer selects the window at the position corresponding to the input candidate disparity from the right window array, which is then XORed with the left-image binary feature vector f_L serving as the left window feature vector, and the costs are accumulated. The accumulated result is the output aggregation cost value C_a.
Preferably, in the pixel-level cost calculation unit, the right-image binary feature vector f_R is cached in the right feature array; according to the input candidate disparity, a multiplexer selects the corresponding right-image binary feature vector f_R from the right feature array, which is XORed with the left-image binary feature vector f_L to compute the corresponding pixel cost value C_p.
the block-level cost calculating unit is used for finishing accumulation of pixel-level costs in the block; the summation is divided into horizontal and vertical pixel cost values C input first p Entering a transverse accumulation column, and transversely adding the latest input cost value and subtracting the oldest cost value to complete the transverse addition; when the transverse ending signal is effective, storing the transverse addition result into a line buffer, and then reading out the transverse addition result to a longitudinal accumulation column; when the block end signal is effective, longitudinal accumulation is carried out, and the accumulated result is the output block cost value C b
The invention provides a PatchMatch-based design method for a high-performance, energy-saving binocular ranging processor. An FPGA-based proof-of-concept binocular ranging processor has been developed, achieving a peak performance of 1920×1080 @ 165.7 FPS at 128 disparity levels with a power consumption of 3.35 W. The proposed design surpasses state-of-the-art FPGA-based stereo matching processors in energy and resource efficiency. When the disparity level is increased to 256, the hardware resource increment of the proposed design is much lower than that of existing WTA-based designs. In addition, unlike existing dedicated stereo matching processors that output only disparity information, the proposed design can also derive plane tilt, which may benefit subsequent tasks such as 3D reconstruction. The invention reduces the energy consumption and area of the detection system without affecting detection accuracy.
Compared with the existing detection method and detection circuit, the invention has the following differences:
1) A binocular ranging algorithm based on PatchMatch is provided, adopting a random search strategy to avoid estimating all disparity levels. In the proposed algorithm, rectangular superpixels, called blocks, are used as the basic computational elements, saving a large amount of computation compared with pixel-level PatchMatch. In addition, to exploit different block sizes, the invention further provides a coarse-to-fine multi-scale propagation (MSP) scheme for label updating.
2) Based on the proposed algorithm, the invention proposes a dedicated hardware architecture to effectively exploit the advantages of the algorithm. In the proposed architecture, memory requirements are minimized to improve real-time performance and energy resource efficiency.
Drawings
FIG. 1 is a schematic diagram of the calculation flow of the proposed binocular ranging algorithm;
FIG. 2 is a schematic diagram of the feature extraction computation method;
FIG. 3 is a schematic diagram of the pixel-level estimation calculation;
FIG. 4 is a hardware block diagram of the binocular vision system;
FIG. 5 is a functional block diagram of the pixel-level disparity initialization circuit;
FIG. 6 is a circuit schematic of the pixel-level cost calculation unit;
FIG. 7 is a circuit schematic of the block-level disparity calculation module;
FIG. 8 is a circuit schematic of the block cost calculation module;
FIG. 9 is a circuit schematic of the tilt label initialization circuit;
FIG. 10 is a circuit schematic of the block propagation module;
FIG. 11 is a circuit schematic of the pixel-level result estimation;
FIG. 12 is a circuit schematic of the aggregation cost calculation.
Detailed Description
The invention will be further illustrated with reference to the following specific examples. It should be understood that these examples are for illustrative purposes only and are not intended to limit the scope of the present invention. Further, it should be understood that various changes or modifications of the present invention can be made by those skilled in the art after reading the teaching of the present invention, and these equivalents also fall within the scope of the claims appended to the present application. The illustrative embodiments provided herein focus on binocular images captured by binocular cameras while supporting different resolutions.
As shown in fig. 1, the algorithm of the binocular vision processor provided by the present invention is a calculation process from fine granularity to coarse granularity, and then from coarse granularity to fine granularity, and specifically includes the following steps:
step 1, extracting characteristics of image data obtained by a left camera and a right camera in a binocular ranging system. With reference to fig. 2, the image feature calculation method adopted by the present embodiment further includes the following steps:
step 101, acquiring a left image and a right image which are obtained by shooting by a left camera and a right camera in a binocular ranging system, and acquiring a gray value of each pixel point in the left image and the right image;
Step 102, comparing the gray value of the central pixel in each window of the left and right images with the gray values of several pixel points at other selected positions in the window, and encoding each comparison result as 0 or 1 to form a binary feature vector.
In this image feature calculation method, the cost is obtained by XORing the binary feature vectors of two pixels and then summing the XOR result bit by bit; the larger the cost value, the larger the difference between the two pixels.
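This cost measure is simply a Hamming distance; a minimal sketch, with binary feature vectors represented as Python integers of whatever bit width the feature extraction produced:

```python
def pixel_cost(f_a, f_b):
    """XOR the two binary feature vectors, then add the result bit by
    bit (a popcount).  A larger value means more dissimilar pixels."""
    return bin(f_a ^ f_b).count("1")
```

In hardware this is a single XOR gate array followed by an adder tree, which is why the census-style feature keeps the cost unit so small.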
Step 2, initializing, and giving an initial random label to the binary feature vectors obtained in step 1.
Step 3, continuously enlarging the block size until it reaches the maximum, estimating the disparity tilt amount, and forming the initialized block-level label from the disparity tilt amount and the disparity to obtain the tilted block-level disparity label.
Step 4, correcting erroneous label estimates from initialization through multi-scale propagation, then optimizing the labels, continuously reducing the block size in the process to obtain the optimized block-level labels.
Step 5, obtaining the final pixel-level disparity and tilt label result through pixel-by-pixel estimation. With reference to fig. 3, the pixel-by-pixel estimation algorithm comprises the following steps:
step 501, increasing the size of the block reduced in step 4 by fifty percent to make each pixel covered by four blocks, wherein each pixel corresponds to a block-level label result obtained by the multi-scale propagation optimization in step 4 of the four blocks, and the four block-level label results are four candidate labels;
step 502, comparing the four candidate labels through the aggregation cost, thereby determining the label result of the last output pixel level.
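Steps 501 and 502 can be sketched as below. Assumptions in this sketch: the aggregation cost is modelled as a windowed sum of Hamming distances over a 3×3 neighbourhood, and the four candidate labels are reduced to their integer disparities (the real unit also carries the tilt components).

```python
def agg_cost(fL, fR, x, y, d, half=1):
    """Aggregated cost of disparity d at pixel (x, y): the sum of
    XOR-popcount costs over the surrounding (2*half+1)^2 window."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += bin(fL[y + dy][x + dx] ^ fR[y + dy][x + dx - d]).count("1")
    return total

def pixel_label(candidates, fL, fR, x, y):
    """Step 502: among the candidate block-level labels covering the
    pixel, output the one with the minimum aggregation cost."""
    costs = [agg_cost(fL, fR, x, y, d) for d in candidates]
    return candidates[costs.index(min(costs))]
```

Because only four candidates are compared per pixel, this final refinement stays cheap even at high resolutions.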
The invention also provides a binocular vision hardware accelerator implemented based on the above algorithm, whose data flow is consistent with the algorithm's processing steps. It comprises a feature and pixel-level disparity generation module FPDG, a block label initialization module BLI, a multi-scale propagation module MSP and a pixel-by-pixel estimation module PPE engine, where the feature and pixel-level disparity generation module FPDG and the block label initialization module BLI implement steps 1, 2 and 3, the multi-scale propagation module MSP implements step 4, and the pixel-by-pixel estimation module PPE engine implements step 5.
Image data obtained by the left and right cameras in the binocular ranging system are input to the feature and pixel-level disparity generation module FPDG, which outputs a binary feature vector and a pixel-level initial disparity label. In the FPDG, the RGB pixel value p_L of the left image and the RGB pixel value p_R of the right image are input to two RGB2G units, which output the corresponding gray values to the 6P_CT units. Each 6P_CT unit performs the image feature calculation shown in fig. 2; the two 6P_CT units produce the left-image binary feature vector f_L and the right-image binary feature vector f_R, which are output to the pixel disparity initialization circuit PDI and the block label initialization module BLI. The Random gen unit generates four random integers with values between 0 and the maximum disparity level as the initial random pixel-level disparity labels R_d1 to R_d4, which are input to the pixel disparity initialization circuit PDI, as shown in fig. 5. Based on R_d1 to R_d4, the PDI obtains the initialized pixel-level disparity d_p_i using the pixel-level cost calculation unit and outputs it to the block label initialization module BLI.
In the pixel disparity initialization circuit PDI, the pixel-level cost calculation unit computes cost values for the four initial random pixel-level disparity labels R_d1 to R_d4; the random pixel-level disparity corresponding to the minimum cost is selected as the winner label and output from the cache to the block label initialization module BLI as the initialized pixel-level disparity d_p_i. In the figure, C_p is the pixel-level cost value computed by the pixel-level cost calculation unit, and I_c_min is the index position corresponding to the minimum cost.
As shown in fig. 6, in the pixel-level cost calculation unit, the right-image binary feature vector f_R is buffered in the right feature array. According to the input candidate disparity, a multiplexer selects the corresponding right-image binary feature vector f_R from the right feature array, which is XORed with the left-image binary feature vector f_L to compute the corresponding pixel cost value C_p.
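The XOR-and-popcount cost and the winner-take-all selection among the four random labels R_d1 to R_d4 can be sketched as a behavioral model (not the RTL; function names and the candidate-clamping to the image border are our assumptions):

```python
import random

def pixel_cost(f_left, f_right_row, x, d):
    """C_p: Hamming distance between the left feature at x and the right
    feature at x - d, via XOR and popcount."""
    return bin(f_left ^ f_right_row[x - d]).count("1")

def init_pixel_disparity(f_left, f_right_row, x, max_disp, rng=random):
    """PDI behavior: draw four random candidate disparities and keep the one
    with minimum pixel cost (the winner label, d_p_i)."""
    candidates = [rng.randint(0, min(max_disp, x)) for _ in range(4)]
    costs = [pixel_cost(f_left, f_right_row, x, d) for d in candidates]
    return candidates[costs.index(min(costs))]  # min-cost index plays I_c_min
```

In hardware the four candidates are evaluated by one shared cost unit reading from the buffered right feature array; the software loop above mimics that serially.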
After the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the block label initialization module BLI, the BLI buffers them in its internal feature vector cache (feature buffer) and then forwards f_L and f_R to the multi-scale propagation module MSP. In addition, from the input f_L, f_R and the initialized pixel-level disparity d_p_i, the BLI obtains the block-level tilted disparity label S_b_ini and outputs it to the multi-scale propagation module MSP. As shown in fig. 7, inside the BLI the left-image binary feature vector f_L and the right-image binary feature vector f_R are written to the feature vector buffer. The block size is enlarged step by step until it reaches the maximum. For each block size level, the corresponding block-level disparity calculation module BDI reads f_L and f_R from the feature vector buffer together with the initialized block-level disparity label produced by the BDI of the previous (smaller) block size, and from these computes the initialized block-level disparity label for the current block size. The initialized pixel-level disparity d_p_i is input to the BDI of the smallest block size, and the output of the BDI of the largest block size is the block-level disparity label whose maximum-size initialization is complete; in fig. 4, the numbers denote the image block size handled by each stage. As shown in fig. 9, the tilted label initialization module SI reads the left-image binary feature vector f_L and the right-image binary feature vector f_R from the feature vector buffer together with the maximum-size initialized block-level disparity label, and computes the initialized tilted block-level disparity label S_b_ini.
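The progressive fine-to-coarse enlargement of block sizes can be sketched as a chain of refinement stages (a behavioral model; the block-size schedule shown is an assumption, and `refine` stands in for one BDI stage):

```python
def block_label_pyramid(init_pixel_disp, block_sizes, refine):
    """BLI flow: labels propagate through cascaded BDI stages of growing
    block size; the smallest stage consumes the pixel-level disparity d_p_i,
    the largest stage emits the max-size initialized block labels."""
    labels = init_pixel_disp
    for size in sorted(block_sizes):   # e.g. [4, 8, 16, 32] (assumed)
        labels = refine(labels, size)  # one BDI stage per block size
    return labels
```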
The block-level disparity calculation module BDI is functionally divided into label planning and cost calculation. In the BDI, the left-image binary feature vector f_L, the right-image binary feature vector f_R and the candidate disparities d_1 to d_4 from the label cache are input to four parallel pixel-level cost calculation units (fig. 6), which produce the four pixel cost values C_p1 to C_p4 corresponding to d_1 to d_4. The four pixel cost values C_p1 to C_p4 are then input to four parallel block-level cost calculation units, which compute the four block cost values C_b1 to C_b4 corresponding to d_1 to d_4. Based on C_b1 to C_b4, the BDI uses the index value I_c_min of the minimum block cost to select the corresponding winner label D_b_o from the label cache and outputs it to the next stage. The output D_b_o of the BDI corresponding to the largest block size is the block-level disparity label whose maximum-size initialization is complete.
As shown in fig. 8, the block-level cost calculation unit accumulates the pixel-level costs within a block. The summation is split into horizontal and vertical parts: pixel cost values C_p first enter the horizontal accumulation column, where the horizontal sum is maintained by adding the newest input cost value and subtracting the oldest one. When the horizontal end signal is active, the horizontal sum is stored in a row buffer and later read out to the vertical accumulation column. When the block end signal is active, vertical accumulation is performed, and the accumulated result is the output block cost value C_b.
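The two-stage accumulation is a separable box sum: a horizontal rolling sum (add newest, drop oldest) buffered per row, then a vertical sum over the buffered rows. A minimal software sketch of that scheme, under the assumption that blocks are dense bw-by-bh windows:

```python
def block_cost_sum(costs, bw, bh):
    """Separable box sum over pixel costs, mirroring FIG. 8's pipeline.
    Returns the block cost C_b for every valid block origin."""
    h, w = len(costs), len(costs[0])
    # horizontal pass: rolling sums, one row-buffer entry per position
    row = [[0] * (w - bw + 1) for _ in range(h)]
    for y in range(h):
        s = sum(costs[y][:bw])
        row[y][0] = s
        for x in range(1, w - bw + 1):
            s += costs[y][x + bw - 1] - costs[y][x - 1]  # add newest, drop oldest
            row[y][x] = s
    # vertical pass: accumulate bh buffered row sums
    out = [[0] * (w - bw + 1) for _ in range(h - bh + 1)]
    for y in range(h - bh + 1):
        for x in range(w - bw + 1):
            out[y][x] = sum(row[y + k][x] for k in range(bh))
    return out
```

The rolling update keeps the horizontal stage at one add and one subtract per pixel regardless of block width, which is what makes the hardware column cheap.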
As shown in fig. 9, the maximum-size initialized block-level disparity label D_b_i, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the tilted label initialization module SI. For the five preset disparity tilt values (0, x+, x-, y+, y-), the SI obtains five pixel cost values through five parallel pixel-level cost calculation units (fig. 6); these are input to five parallel block-level cost calculation units (fig. 8) to compute the five block cost values corresponding to the tilt values (0, x+, x-, y+, y-). A quadratic fitting unit then fits the costs separately in the horizontal and vertical directions; the tilt value at the minimum of the fitted block cost curve is the initialized disparity tilt value, and the horizontal and vertical tilt values together with the corresponding disparity value form the output tilted disparity label S_b_ini.
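The quadratic fit per direction uses three cost samples (negative tilt, zero, positive tilt) and takes the parabola vertex as the sub-sample tilt. A hedged sketch of one such fit (the fallback for a non-convex fit is our assumption):

```python
def parabolic_min(c_minus, c_zero, c_plus, step=1.0):
    """Vertex of the parabola through the costs at tilts -step, 0, +step.
    Returns the sub-sample tilt value at the fitted minimum."""
    denom = c_minus - 2.0 * c_zero + c_plus
    if denom <= 0:          # flat or non-convex fit: keep zero tilt (assumed)
        return 0.0
    return 0.5 * step * (c_minus - c_plus) / denom
```

Running this once on the (x-, 0, x+) costs and once on the (y-, 0, y+) costs yields the horizontal and vertical tilt components of S_b_ini.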
In the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to a feature vector buffer. The f_L and f_R held in the feature vector buffer, together with the block-level label S_MSP produced by the MSP, are output to the pixel-by-pixel estimation module PPE engine. Referring to fig. 10, the MSP comprises several block propagation modules BP corresponding to different image block sizes; the image block size shrinks stage by stage, finally yielding the optimized block-level label S_MSP. The BP corresponding to the smallest image block size is iterated once, while the BPs corresponding to all other sizes are iterated multiple times: each size level corresponds to several cascaded BP modules, which together realize the multiple iterations for that size. For the BP of the current size level, the input of each iteration is the output of the previous iteration plus the left-image binary feature vector f_L and the right-image binary feature vector f_R read from the feature vector buffer. Once the BP of the previous size level has finished all its iterations, its output is input to the BP of the next size level, which begins its first iteration. After all BP modules have been cascaded, the optimization of the block-level labels and the shrinking of the corresponding image block sizes are complete.
The input of the BP corresponding to the largest image block size in its first iteration is the tilted disparity label S_b_ini output by the block label initialization module BLI.
As shown in fig. 10, the input of each block propagation module BP is the tilted disparity label from the previous stage together with the left-image binary feature vector f_L and the right-image binary feature vector f_R. The previous stage's tilted disparity label is held temporarily in the label cache. The dot product of each label in the corresponding neighborhood label set S_C with the coordinate vector (k_x, k_y, 1) yields the pixel-level disparity values d_1 to d_9, which are input to the cost calculation part as candidate labels. The cost calculation part comprises nine parallel pixel-level cost calculation units (fig. 6), nine parallel block-level cost calculation units (fig. 8) and one smoothness cost calculation unit, which compute the corresponding cost values. The label at the position I_c_min of the minimum block-level total cost is output as the winner label S_b_o to the next-stage BP. Here C_pu is the pixel-level unary cost term, C_bu is the block-level unary cost term, C_s is the smoothness cost value, and C_TB is the block-level total cost value, C_TB = C_bu + C_s.
Finally, the pixel-by-pixel estimation module PPE engine outputs the final result, the pixel-level tilted disparity label S_final, completing the conversion from block-level labels to pixel-level labels. The basic structure of the PPE engine is the same as that shown in fig. 5. The block-level label S_MSP obtained by the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the PPE engine, whose cost calculation unit is an aggregation cost calculation unit; the label at the minimum aggregation cost position I_c_min is output as the final result S_final. In the figure, C_a is the aggregation cost value. As shown in fig. 12, the candidate disparity d_c, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the aggregation cost calculation unit, wherein f_R is input to a right window array; according to the input candidate disparity, a multiplexer selects the window at the corresponding position from the right window array, which is then XORed with f_L as the left window feature vector and the costs are accumulated. The accumulated result is the output aggregation cost value C_a.
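The per-pixel selection among the four covering block labels can be sketched as a windowed XOR cost followed by an arg-min (a behavioral model; representing the right window array as a disparity-indexed mapping is our simplification):

```python
def window_cost(left_feats, right_feats, d):
    """Aggregation cost C_a: summed Hamming distance over a feature window
    at candidate disparity d."""
    return sum(bin(fl ^ fr).count("1")
               for fl, fr in zip(left_feats, right_feats[d]))

def per_pixel_label(candidates, left_feats, right_feats):
    """PPE behavior: among the block labels covering a pixel, keep the one
    with minimum aggregation cost (position I_c_min -> S_final)."""
    costs = [window_cost(left_feats, right_feats, d) for d in candidates]
    return candidates[costs.index(min(costs))]
```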

Claims (10)

1. A method for a high-speed, energy-efficient binocular vision hardware accelerator based on the image block level, characterized in that a computation flow from fine granularity to coarse granularity and then from coarse granularity back to fine granularity is adopted, specifically comprising the following steps:
step 1, performing feature extraction on image data obtained by a left camera and a right camera in a binocular ranging system, wherein the feature extraction is performed by adopting the following method:
step 101, acquiring a left image and a right image which are obtained by shooting by a left camera and a right camera in a binocular ranging system, and acquiring a gray value of each pixel point in the left image and the right image;
step 102, comparing the gray value of the central pixel of each window in the left and right images with the gray values of several pixels at other selected positions in the window, and encoding each comparison result as 0 or 1 to form a binary feature vector;
step 2, initializing, and giving an initial random label to the binary characteristic vector obtained in the step 1;
step 3, continuously expanding the size of the block until the block size is maximum, estimating the parallax inclination amount, and forming an initialized block-level label by the parallax inclination amount and the parallax to obtain a parallax block-level label with inclination;
step 4, correcting the estimation of the error label in initialization through multi-scale propagation, then optimizing the label, and continuously reducing the size of the block in the process to obtain the optimized block-level label;
and step 5, obtaining the final pixel-level tilted disparity label result through pixel-by-pixel estimation, wherein the pixel-by-pixel estimation algorithm further comprises the following steps:
step 501, enlarging the block size reduced in step 4 by fifty percent so that each pixel is covered by four blocks; each pixel then corresponds to the four block-level label results obtained for those blocks by the multi-scale propagation optimization of step 4, and these four block-level label results are the four candidate labels;
and step 502, comparing the four candidate labels through the aggregation cost, so as to determine a label result of the last output pixel level.
2. An image-block-level based high-speed, energy-efficient binocular vision hardware accelerator, adopting the method of claim 1, comprising a feature and pixel-level disparity generation module FPDG, a block label initialization module BLI, a multi-scale propagation module MSP, and a pixel-by-pixel estimation module PPE engine, wherein the feature and pixel-level disparity generation module FPDG and the block label initialization module BLI are used to implement the steps 1, 2, and 3, the multi-scale propagation module MSP is used to implement the step 4, and the pixel-by-pixel estimation module PPE engine is used to implement the step 5.
3. The binocular vision hardware accelerator of claim 1, wherein image data obtained by the left and right cameras of the binocular ranging system is input to the feature and pixel-level disparity generation module FPDG, which outputs binary feature vectors and pixel-level initial disparity labels; in the FPDG, the RGB pixel values p_L of the left image and p_R of the right image are input to two RGB2G units, each of which outputs the corresponding gray values to a 6P_CT unit; the two 6P_CT units obtain the left-image binary feature vector f_L and the right-image binary feature vector f_R and output them to the pixel disparity initialization circuit PDI and the block label initialization module BLI; the Random gen unit generates four random integers between 0 and the maximum disparity level as initial random pixel-level disparity labels R_d1 to R_d4 and inputs them to the pixel disparity initialization circuit PDI, which, based on R_d1 to R_d4, obtains the initialized pixel-level disparity d_p_i using a pixel-level cost calculation unit and outputs it to the block label initialization module BLI.
4. The binocular vision hardware accelerator of claim 3, wherein in the pixel disparity initialization circuit PDI, the pixel-level cost calculation unit computes a cost value for each of the four initial random pixel-level disparity labels R_d1 to R_d4, and the random pixel-level disparity with the minimum cost is selected as the winner label, which is output from the cache to the block label initialization module BLI as the initialized pixel-level disparity d_p_i.
5. The image-block-level based high-speed, energy-efficient binocular vision hardware accelerator of claim 4, wherein after the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the block label initialization module BLI, the BLI buffers them in its internal feature vector cache (feature buffer) and then outputs f_L and f_R to the multi-scale propagation module MSP; and, from the input f_L, f_R and the initialized pixel-level disparity d_p_i, the BLI obtains the block-level tilted disparity label S_b_ini and outputs it to the multi-scale propagation module MSP; in the BLI, f_L and f_R are input to the feature vector buffer; the block size is enlarged step by step until it reaches the maximum; for each block size level, the corresponding block-level disparity calculation module BDI reads f_L and f_R from the feature vector buffer together with the initialized block-level disparity label produced by the BDI of the previous block size, thereby computing the initialized block-level disparity label for the current block size;
the initialized pixel-level disparity d_p_i is input to the BDI of the smallest block size, and the output of the BDI of the largest block size is the block-level disparity label whose maximum-size initialization is complete; the tilted label initialization module SI reads f_L and f_R from the feature vector buffer together with the maximum-size initialized block-level disparity label, and computes the initialized tilted block-level disparity label S_b_ini.
6. The hardware accelerator of claim 5, wherein in the block-level disparity calculation module BDI, the left-image binary feature vector f_L, the right-image binary feature vector f_R and the candidate disparities d_1 to d_4 from the label cache are input to four parallel pixel-level cost calculation units, which obtain the four pixel cost values C_p1 to C_p4 corresponding to d_1 to d_4; the four pixel cost values C_p1 to C_p4 are then input to four parallel block-level cost calculation units to compute the four block cost values C_b1 to C_b4 corresponding to d_1 to d_4; based on C_b1 to C_b4, the BDI uses the index value I_c_min of the minimum block cost to select the corresponding winner label D_b_o from the label cache and outputs it to the next stage; the output D_b_o of the BDI corresponding to the largest block size is the block-level disparity label whose maximum-size initialization is complete.
7. The image-block-level based high-speed, energy-efficient binocular vision hardware accelerator of claim 6, wherein the maximum-size initialized block-level disparity label D_b_i, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the tilted label initialization module SI; for the five preset disparity tilt values (0, x+, x-, y+, y-), the SI obtains five pixel cost values through five parallel pixel-level cost calculation units; the five pixel cost values are input to five parallel block-level cost calculation units to compute the five block cost values corresponding to the tilt values (0, x+, x-, y+, y-); a quadratic fitting unit then fits the costs separately in the horizontal and vertical directions; the tilt value at the minimum of the fitted block cost curve is the initialized disparity tilt value, and the horizontal and vertical tilt values together with the corresponding disparity value form the output tilted disparity label S_b_ini.
8. The image-block-level based high-speed, energy-efficient binocular vision hardware accelerator of claim 7, wherein in the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to a feature vector buffer; the f_L and f_R held in the feature vector buffer, together with the block-level label S_MSP obtained by the MSP, are output to the pixel-by-pixel estimation module PPE engine; the MSP comprises several block propagation modules BP corresponding to different image block sizes; the image block size shrinks stage by stage, finally yielding the optimized block-level label S_MSP; the BP corresponding to the smallest image block size is iterated once, while the BPs corresponding to all other sizes are iterated multiple times; each size level corresponds to several cascaded BP modules, which together realize the multiple iterations of the BP for that size; for the BP of the current size level, the input of each iteration is the output of the previous iteration plus f_L and f_R read from the feature vector buffer;
after the BP of the previous size level has finished all its iterations, its output is input to the BP of the next size level, which begins its first iteration; after all BP modules have been cascaded, the optimization of the block-level labels and the shrinking of the corresponding image block sizes are complete; the input of the BP corresponding to the largest image block size in its first iteration is the tilted disparity label S_b_ini output by the block label initialization module BLI.
The input of each block propagation module BP is the tilted disparity label from the previous stage together with the left-image binary feature vector f_L and the right-image binary feature vector f_R; the previous stage's tilted disparity label is held temporarily in the label cache; the dot product of each label in the corresponding neighborhood label set S_C with the coordinate vector (k_x, k_y, 1) yields the pixel-level disparity values d_1 to d_9, which are input to the cost calculation part as candidate labels; the cost calculation part comprises nine parallel pixel-level cost calculation units, nine parallel block-level cost calculation units and one smoothness cost calculation unit, which compute the corresponding cost values; the label at the position I_c_min of the minimum block-level total cost is output as the winner label S_b_o to the next-stage BP.
9. The image-block-level based high-speed, energy-efficient binocular vision hardware accelerator of claim 8, wherein the pixel-by-pixel estimation module PPE engine outputs the final result, the pixel-level tilted disparity label S_final, performing the conversion from block-level labels to pixel-level labels; the block-level label S_MSP obtained by the multi-scale propagation module MSP, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the PPE engine, whose cost calculation unit is an aggregation cost calculation unit; the label at the minimum aggregation cost position I_c_min is output as the final result S_final;
the candidate disparity d_c, the left-image binary feature vector f_L and the right-image binary feature vector f_R are input to the aggregation cost calculation unit, wherein f_R is input to a right window array; according to the input candidate disparity, a multiplexer selects the window at the corresponding position from the right window array, which is then XORed with f_L as the left window feature vector and the costs are accumulated; the accumulated result is the output aggregation cost value C_a.
10. The image-block-level based high-speed, energy-efficient binocular vision hardware accelerator of claim 9, wherein in the pixel-level cost calculation unit, the right-image binary feature vector f_R is buffered in the right feature array; according to the input candidate disparity, a multiplexer selects the corresponding right-image binary feature vector f_R from the right feature array, which is XORed with the left-image binary feature vector f_L to compute the corresponding pixel cost value C_p;
the block-level cost calculation unit accumulates the pixel-level costs within a block; the summation is split into horizontal and vertical parts: pixel cost values C_p first enter the horizontal accumulation column, where the horizontal sum is maintained by adding the newest input cost value and subtracting the oldest one; when the horizontal end signal is active, the horizontal sum is stored in a row buffer and later read out to the vertical accumulation column; when the block end signal is active, vertical accumulation is performed, and the accumulated result is the output block cost value C_b.
CN202210628902.4A 2022-06-06 2022-06-06 High-speed and high-energy-efficiency binocular vision hardware accelerator and method based on image block level Pending CN115170635A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210628902.4A CN115170635A (en) 2022-06-06 2022-06-06 High-speed and high-energy-efficiency binocular vision hardware accelerator and method based on image block level


Publications (1)

Publication Number Publication Date
CN115170635A true CN115170635A (en) 2022-10-11

Family

ID=83485841


Country Status (1)

Country Link
CN (1) CN115170635A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination