WO2009014314A1 - Belief propagation based fast systolic array and method thereof - Google Patents

Belief propagation based fast systolic array and method thereof

Info

Publication number
WO2009014314A1
Authority
WO
WIPO (PCT)
Prior art keywords
buffer
messages
systolic array
belief propagation
value
Application number
PCT/KR2008/003280
Other languages
French (fr)
Inventor
Hong Jeong
Sung Chan Park
Original Assignee
Postech Academy-Industry Foundation
Application filed by Postech Academy-Industry Foundation
Publication of WO2009014314A1


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00 Stereoscopic video systems; Multi-view video systems; Details thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00 Computing arrangements based on specific mathematical models
    • G06N7/01 Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • BP belief propagation
  • MRF Markov Random Field
  • VLSI Very Large Scale Integration
  • Fig. 4 illustrates a dynamic Bayesian network having a hierarchical BP structure. As shown in Fig. 4, iteration layers are formed at each scale level, and the number of nodes increases from the coarse level to the fine level. The number of nodes at level $k$ is $(N_1/2^k) \times (N_0/2^k)$ on an $N_1 \times N_0$ MRF network.
  • $a_k$ is an offset generated by the scale difference between levels $k$ and $k-1$ with respect to $p^{k-1}$ at the coarsest level.
  • Figs. 6A to 6D respectively illustrate a sequence diagram when Figs. 5A to 5D are viewed from a different angle.
  • Fig. 9 illustrates a data cost reading process in the layer-transformed hierarchical FBP structure shown in Figs. 5A to 5D.
  • As shown in Fig. 7, all nodes on the $p_1^k(l_k)$ axis are grouped, and a processor is assigned to each node.
  • the processors within a group perform parallel processing, such that the MRF network is scanned in the $p_0^k(l_k)$ axis direction.
  • messages of nodes at the previous layer in a group are read from a local buffer of the previous layer, and messages of nodes within a group of a previous line are read from a layer buffer.
  • a message at the final iteration is calculated as the layer buffer is right-shifted, i.e., as the layer buffer moves in the positive direction of the $p_0^k(l_k)$ axis.
  • the messages of the nodes in the group are calculated in parallel and stored in the local buffer to be then used in processing a next (upper) layer. Further, the messages are also stored in the layer buffer to be then used in processing messages of a next group.
  • the same result as the hierarchical BP structure can thus be obtained using a small layer buffer and a small local buffer.
  • Fig. 10 illustrates a configuration of a BP based fast systolic array for use in stereo matching in accordance with the present invention.
  • the BP based fast systolic array includes: an image buffer 10 that receives and temporarily stores left and right image pixel data input by a raster scan method; and an FBP stereo matching module 13 that outputs a disparity image fast and in parallel by using the left and right pixel data output from the image buffer 10.
  • the FBP stereo matching module 13 includes a plurality of PE (Processing Element) groups which exchange messages and pixel data with each other.
  • the FBP stereo matching module 13 supports fast parallel processing.
  • each PE group includes: a data cost module 13a that receives pixel data and calculates data costs; a plurality of multiplexers (MUX) 13b that receive the data costs from the data cost module 13a and messages from neighboring PE groups and select a desired output; a plurality of processing elements (PEs) 13c that calculate a new message by using the outputs of the MUXs 13b; a plurality of local buffers 13d that store the result values of the PEs 13c; and a layer buffer 13e that stores the result values of the local buffers 13d again.
  • MUX multiplexers
  • Each PE 13c includes: an adder that sequentially reads and adds the data costs and the messages of the previous layer by states; a forward processor that receives the output of the adder and outputs a forward processor cost; a forward stack that receives and stores the forward processor cost; a backward processor that receives an output value of the forward stack and outputs a backward processor cost; a backward stack that stores an output value of the backward processor; a normalizer that receives an output value of the backward stack and calculates a final message; and a buffer that stores an output value of the normalizer.
  • the forward processor includes: a first forward processor that initializes a first delay buffer, reads an input cost value at each step, compares the input cost value with the previous value of the first delay buffer plus a constant value, stores the smaller of the two in the first delay buffer, and outputs it; and a second forward processor that initializes a second delay buffer, uses it to find the minimum of the input costs, and outputs that minimum plus a constant value.
  • the backward processor includes: a first backward processor that initializes a first delay buffer, reads an input cost value at each step, compares the input cost value with the value of the first delay buffer plus a constant value, stores the smaller of the two in the first delay buffer, compares the output of the first delay buffer with the output of the forward processor, and outputs the smaller of these; and a second backward processor that initializes a second delay buffer to 0, adds the output of the first backward processor to the value of the second delay buffer at each step, and shifts and outputs the value of the second delay buffer by a specific amount.
  • the normalizer outputs a value obtained by subtracting the value calculated by the second backward processor from the value calculated by the first backward processor, thereby calculating a message.
  • the PE group has $2^{K-1}$ PEs 13c in total (see Figs. 11 and 12). Accordingly, $N_1/2^{K-1}$ PE groups are needed for an $N_1 \times N_0$ image. Since the number of nodes at each level varies with the coarse-to-fine scale, only $N_1/2^k$ PEs operate in parallel when the FBP stereo matching sequence operates at level $k$. As shown in Fig. 16, each PE has a local buffer for each level and a layer buffer, and accesses these buffers through the MUX. As shown in Fig. 13, the data cost module is a logic circuit that calculates the data costs as in Equation 7.
  • a module A in the data cost module includes: registers that store the right and left pixel data $g_r(p_0, p_1+d)$ and $g_l(p_0, p_1)$; and a logic circuit that calculates the absolute difference between the register values. That is, the data cost $D'_p(d)$ is the output value of module A. The right pixel data is shifted by a shift logic to a neighboring register by the value $d$ in Equation 7, so that $D'_p(d)$ is output for each value of $d$.
  • a module B in the data cost module is a logic circuit that performs the operation of Equation 8 to calculate the data cost $D_p^k(d)$. As shown in Fig. 13, two neighboring data costs $D'_p(d)$ are added at each level $k$, and the final data cost $D_p^k(d)$ is obtained by adding $2^k$ scan lines using a register and an accumulator.
  • the data cost $D_p^k(d)$ in the accumulator is first initialized, and the left and right scan lines corresponding to each $e_0 \in [0, 2^k-1]$ are loaded. Then, the value of Equation 9 is accumulated into $D_p^k(d)$.
  • each data cost for the value d needs to be calculated.
  • the FBP stereo matching sequence in the FBP stereo matching module, using the data costs on the $N_0 \times N_1$ MRF network, is as follows.
  • the processors at all of the nodes on the axis are grouped for each $p_0^k(l_k)$ and perform the Message_update function in parallel within the group, following the depth-first tree sequence. Then, the State_estimation function is performed at the final layer $L^0$ to determine the disparity value.
  • Fig. 21 illustrates a flowchart of the FBP stereo matching sequence.
  • Fig. 22 and Fig. 23 respectively illustrate a sequential calculation procedure and a parallel calculation procedure for message update within a group in the FBP stereo matching sequence of Fig. 21.
  • Fig. 25 illustrates a local index for buffer update in Figs. 22 and 23.
  • $N_b\big(h+(a_k-1)[1\;0]^T\big) / \big(s+(a_k-1)[1\;0]^T\big)$
  • b. Buffer_update in the layer buffer, for processing the next group: a) in the case of data costs; b) in the case of messages
  • the edge cost $V_{hs}(d_h, d_s)$ can be calculated without using memory when the truncated linear function $V_{hs}(d_h, d_s) = \min(C_v|d_h - d_s|, K_v)$ with parameters $C_v$ and $K_v$ is used.
  • a node on the MRF network corresponds to nodes with an offset of $-[1\;0]^T$ between different iteration layers in the layer-transformed structure, as shown in Equation 5. Accordingly, $N_b(h)/s$ becomes $N_b(h-[1\;0]^T)/(s-[1\;0]^T)$ under the layer transformation.
  • the Message calculation function accesses the layer buffer and the local buffer as follows.
  • for a node in the current group, the data cost and the message of the node at the previous layer are read from the local buffer. If $-2 \le u_0 < 0$, the node belongs to a previously processed group, and its data cost and previous-layer message are read from the layer buffer.
  • a message that is calculated by the function is stored in the local buffer.
  • $p(l) = p(l-1) - [1\;0]^T$.
  • the State_estimation function outputs the disparity $d_h^0$ using the message $M_h^0(d_h, L^0)$ of the $L^0$ layer at level 0.
  • the messages of nodes satisfying $-2 \le u_0 < 0$ are read from the layer buffer.
  • the size of the layer buffer for all of the messages is $\sum_{k=0}^{K-1} 4SBL_k(N_1/2^k)$ bits.
  • the message memory size is
  • the layer buffer is $\sum_{k=0}^{K-1} SBL_k(N_1/2^k)$ bits, and the local buffer is $\sum_{k=0}^{K-1} SB(N_1/2^k)$ bits.
  • the total memory size of the matching module is $\sum_{k=0}^{K-1} 5SB(L_k+1)(N_1/2^k)$ bits.
  • an FBP scanning sequence may be implemented by a VLSI chip in which a plurality of processors read messages from neighboring processors to perform parallel calculation, as shown in Figs. 8A and 8B.
  • a PC may sequentially read the messages to perform calculation.
  • the PE is a logic circuit that calculates a new message $m_o(d_s)$ by using $V_{hs}(d_h, d_s)$ and $m_{sum}(d_h)$.
  • the present invention suggests a new PE architecture that, when a message has $A$ states, reduces the amount of calculation from $O(A^2)$ to $O(3A)$ by using the distance transform characteristics disclosed in Patent Reference 1.
  • the PE has the forward processor, the backward processor, and the normalizer, as shown in Fig. 17.
  • the new PE architecture is suitable for VLSI implementation because of its simple calculation procedure using only an adder, a subtracter, a shifter, and a comparator.
  • $m_b(t) = \min\big(D_3(t), m_f(-1)\big)$
  • $D_3(t) = \min\big(m_f(A-1-t), D_3(t-1) + C_v\big)$
  • the forward processor receives $m_{sum}(t)$, which is the sum of the messages and the data cost, and outputs $m_f(t)$.
  • $m_f(t)$ is stored in the stack and used by the backward processor to output $m_b(t)$.
  • the normalizer receives $m_b(t)$ and calculates $m_o(t)$.
  • Figs. 18A and 18B respectively illustrate a forward processor in the PE module shown in Fig. 17.
  • the input cost $m_{sum}(t)$ is a vector input sequentially, with $t$ ranging from 0 to $A-1$.
  • a first forward processor shown in Fig. 18A initializes a delay buffer $D_1(-1)$ to "B" and reads the input cost at each step.
  • the input cost is compared with $D_1(t-1)+C_v$ calculated at the previous step, and the minimum is stored as $D_1(t)$ and output as $m_f(t)$ at the current step.
  • a second forward processor shown in Fig. 18B calculates the minimum value of $m_{sum}(t)$ by using a delay buffer $D_2(t)$, and adds $K_v$ to the minimum value to output $m_f(-1)$.
  • Figs. 19A and 19B respectively illustrate a backward processor in the PE module shown in Fig. 17.
  • a first backward processor shown in Fig. 19A initializes a delay buffer $D_3(-1)$ to "B" and reads the forward cost $m_f(t)$ from the stack at each step. $m_f(t)$ is compared with $D_3(t-1)+C_v$ calculated at the previous step, and the minimum value is set as $D_3(t)$ at the current step. $D_3(t)$ is then compared with the input parameter $m_f(-1)$, and the smaller value is output as $m_b(t)$.
  • a delay buffer $D_4(t)$ is initialized to 0 at the beginning, and $m_b(t)$ is added to it at each step.
  • $D_4(A-1)$ is divided by $A$ with a right shift (by $\log_2 A$ bits, for $A$ a power of two) and then output.
  • the normalizer in the PE module shown in Fig. 17 subtracts the output value $m_b(-1)$ of the second backward processor from the output value $m_b(t)$ of the first backward processor, and finally outputs $m_o(t)$. For a Middlebury test image, when the total number of scale levels is 4 and $L_k$ is set to (5, 5, 10, 5) from coarse to fine, the present invention achieves a low error rate, as shown in Fig. 26. In particular, for a 436 x 383 image, the memory size is
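The PE pipeline described above (forward processor, stack, backward processor, normalizer) can be sketched in software and checked against a direct $O(A^2)$ minimization. This is a minimal sketch, not the patent's circuit: it assumes the truncated linear edge cost $\min(C_v|d_h - d_s|, K_v)$, uses a large constant `big` in place of the initial value "B", and averages with a right shift, so $A$ must be a power of two. The function names `pe_message` and `brute_force_message` are hypothetical.

```python
def pe_message(m_sum, c_v, k_v, big=1 << 30):
    """Distance-transform PE sketch: returns m_o(d) for every state d.

    m_sum[t]: data cost plus incoming messages for state t, 0 <= t < A.
    Edge cost assumed: V(d_h, d_s) = min(c_v * |d_h - d_s|, k_v).
    big stands in for the initial delay-buffer value "B".
    """
    a = len(m_sum)
    assert a > 0 and a & (a - 1) == 0, "shift-based averaging needs A = 2^n"

    # First forward processor: D1(t) = min(m_sum(t), D1(t-1) + C_v); push m_f(t) = D1(t).
    stack, d1 = [], big
    for t in range(a):
        d1 = min(m_sum[t], d1 + c_v)
        stack.append(d1)

    # Second forward processor: m_f(-1) = min_t m_sum(t) + K_v (the truncation level).
    m_f_neg1 = min(m_sum) + k_v

    # First backward processor pops states A-1 .. 0:
    #   D3(t) = min(m_f(A-1-t), D3(t-1) + C_v),  m_b(t) = min(D3(t), m_f(-1)).
    # Second backward processor: D4 accumulates every m_b(t).
    d3, d4, m_b = big, 0, []
    for _ in range(a):
        d3 = min(stack.pop(), d3 + c_v)
        m_b.append(min(d3, m_f_neg1))
        d4 += m_b[-1]
    mean = d4 >> (a.bit_length() - 1)  # divide by A with a right shift

    # Normalizer: m_o = m_b - mean, reversed back into state order 0 .. A-1.
    return [v - mean for v in reversed(m_b)]


def brute_force_message(m_sum, c_v, k_v):
    """Reference O(A^2) minimization with the same edge cost and normalization."""
    a = len(m_sum)
    raw = [min(m_sum[p] + min(c_v * abs(p - q), k_v) for p in range(a))
           for q in range(a)]
    mean = sum(raw) >> (a.bit_length() - 1)
    return [r - mean for r in raw]


print(pe_message([0, 10, 10, 10], 1, 2))  # [-1, 0, 1, 1]
```

The forward and backward passes together compute $\min_{d_h}\big(m_{sum}(d_h) + C_v|d_h - d_s|\big)$ in two linear sweeps, and the final comparison with $m_f(-1)$ applies the truncation $K_v$, which is why only adders, comparators, a subtracter, and a shifter are needed.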

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Optimization (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Analysis (AREA)
  • Algebra (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computing Systems (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Image Processing (AREA)
  • Image Analysis (AREA)

Abstract

In a belief propagation based fast systolic array, a hierarchical dynamic Bayesian network of nodes corresponding to pixels of input left and right image pixel data is generated in consideration of an iteration axis and scale levels. Further, messages on the generated dynamic Bayesian network are updated in a specific axis direction on a Markov random field.

Description

BELIEF PROPAGATION BASED FAST SYSTOLIC ARRAY AND METHOD THEREOF
Field of the Invention
The present invention relates to a belief propagation (BP) based fast systolic array and a method thereof; and, more particularly, to a systolic array that can perform parallel computation with a compact memory by using hierarchical BP based characteristics while reducing a total memory size when the number of iterations is small, and a method thereof.
Background of the Invention
Fig. 1 illustrates an MRF (Markov Random Field) network for stereo matching and a conventional BP update rule.
In the conventional BP technique, when nodes corresponding to image pixels are regularly connected with each other as shown in Fig. 1, a two-dimensional (2D) MRF network of size $N_1 \times N_0$ is defined. According to P. F. Felzenszwalb and D. R. Huttenlocher, "Efficient Belief Propagation for Early Vision," in Proc. 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, No. 1, pages 261-268, 2004 (hereinafter referred to as "Patent Reference 1"), a low error rate can be obtained with a small number of iterations when BP is performed on an MRF using hierarchical data costs. However, for a large image, processing takes a long time because of the large number of nodes, and a large message memory is also needed.
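Hierarchical data costs of this kind can be illustrated with a short sketch that mirrors the two-stage data cost module (modules A and B): a per-pixel absolute difference $D'_p(d) = |g_l(p_0, p_1) - g_r(p_0, p_1+d)|$, then level-$k$ costs obtained by summing the level-0 costs over each $2^k \times 2^k$ pixel block. Equations 7 to 9 do not appear in this section, so the block-sum form is an assumption modeled on Patent Reference 1; the images are assumed to be lists of integer rows indexed `[p0][p1]`, and the border clamping and the dropping of partial blocks are hypothetical choices.

```python
def base_data_cost(left, right, num_states):
    """Level-0 cost D'_p(d) = |g_l(p0, p1) - g_r(p0, p1 + d)|, as cost[p0][p1][d].

    left/right: lists of integer pixel rows indexed [p0][p1].
    Columns p1 + d past the right edge are clamped (hypothetical border rule).
    """
    cols = len(left[0])
    return [[[abs(row_l[p1] - row_r[min(p1 + d, cols - 1)])
              for d in range(num_states)]
             for p1 in range(cols)]
            for row_l, row_r in zip(left, right)]


def coarse_data_cost(base, k):
    """Level-k cost: sum of level-0 costs over each 2^k x 2^k pixel block.

    Partial blocks at the right/bottom edge are dropped (assumption).
    """
    block = 1 << k
    rows, cols, states = len(base), len(base[0]), len(base[0][0])
    return [[[sum(base[block * i + e0][block * j + e1][d]
                  for e0 in range(block)      # 2^k scan lines
                  for e1 in range(block))     # 2^k neighbors along a line
              for d in range(states)]
             for j in range(cols // block)]
            for i in range(rows // block)]


left = [[5, 9, 2, 7], [5, 9, 2, 7]]
right = [[0, 5, 9, 2], [0, 5, 9, 2]]
base = base_data_cost(left, right, 2)
print(coarse_data_cost(base, 1))  # [[[18, 0], [24, 10]]]
```

In the example the right image is the left image shifted by one column, so the $d = 1$ cost is zero wherever the shift is valid and the coarse cost favors $d = 1$ in both blocks.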
In the $N_1 \times N_0$ MRF network, a 2D vector is represented as $x = [x_0\;x_1]^T$ with elements $x_0$ and $x_1$, and the position of a node is represented by the 2D vector $p = [p_0\;p_1]^T$. Further, a data cost $D_p(d_p)$ is allocated to the hidden state $d_p$ of each node, and an edge cost $V(d_p, d_q)$ is allocated to the pair of hidden states $(d_p, d_q)$ of each edge on the MRF graph. Then, an approximate solution to the MAP (Maximum A Posteriori) problem, i.e., the state assignment that minimizes the sum of all of the costs on the MRF network, can be computed by using BP, as in Equation 1.
Equation 1 shows an energy cost model in the MRF network.
[Equation 1]
$$\hat{d} = \arg\min_{d} E(d), \qquad E(d) = \sum_{(p,q) \in \mathcal{N}} V(d_p, d_q) + \sum_{p} D_p(d_p)$$
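Equation 1 can be evaluated directly on a small 4-connected grid, which makes clear what BP approximates: the labeling with the lowest total of edge and data costs. This sketch assumes the truncated linear edge cost $\min(C_v|d_p - d_q|, K_v)$ that the document uses for the PE; the function names are hypothetical, and exhaustive search is only feasible for toy sizes.

```python
def truncated_linear(dp, dq, c_v, k_v):
    """Assumed edge cost V(d_p, d_q) = min(c_v * |d_p - d_q|, k_v)."""
    return min(c_v * abs(dp - dq), k_v)


def energy(labels, data_cost, c_v, k_v):
    """E(d) = sum of edge costs over 4-connected neighbor pairs + sum of data costs.

    labels[p0][p1] is the state d_p; data_cost[p0][p1][d] is D_p(d).
    """
    rows, cols = len(labels), len(labels[0])
    total = 0
    for p0 in range(rows):
        for p1 in range(cols):
            d = labels[p0][p1]
            total += data_cost[p0][p1][d]
            # Visit each edge once: only the neighbor below and the one to the right.
            if p0 + 1 < rows:
                total += truncated_linear(d, labels[p0 + 1][p1], c_v, k_v)
            if p1 + 1 < cols:
                total += truncated_linear(d, labels[p0][p1 + 1], c_v, k_v)
    return total


# Two nodes, three states: exhaustive search recovers the MAP labeling.
dc = [[[1, 5, 9], [4, 3, 0]]]
best = min(((a, b) for a in range(3) for b in range(3)),
           key=lambda ab: energy([list(ab)], dc, 2, 3))
print(best, energy([list(best)], dc, 2, 3))  # (0, 2) 4
```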
The message calculation process is given by Equation 2.
[Equation 2]
[Equation image not reproduced]
Here, $N_b$ represents the neighboring nodes, $N_b(p)/q$ represents the nodes neighboring node $p$ except node $q$, and $m^t_{pq}(d_q)$ represents the message transmitted from node $p$ to node $q$ at iteration $t$. As in the BP update rule shown in Fig. 1, the message $m^t_{pq}(d_q)$ is calculated by adding the messages transmitted to node $p$ from its neighboring nodes other than node $q$, the data cost of node $p$, and the edge cost of the edge from node $p$ to node $q$. In Equation 2, $\alpha$ is a normalization parameter corresponding to the average of all state costs of the message. The message $m^t_{pq}(d_q)$ is calculated at each iteration and transmitted from node $p$ to node $q$.
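One message update as described for Equation 2 can be sketched as follows, assuming the standard min-sum form of Patent Reference 1 (the sum is minimized over the sender's state $d_p$) and taking $\alpha$ to be the mean of the unnormalized message over the states. The function name and argument layout are hypothetical; the direct minimization costs $O(S^2)$ per message.

```python
def message_update(data_cost_p, incoming, edge_cost):
    """m_pq(d_q) = min over d_p of [V(d_p, d_q) + D_p(d_p) + sum of m_sp(d_p)] - alpha.

    data_cost_p: D_p(d) for every state d.
    incoming: messages m_sp from the neighbors s of p other than q (lists over states).
    edge_cost: callable V(d_p, d_q).
    alpha: assumed to be the mean of the unnormalized message over the states.
    """
    num_states = len(data_cost_p)
    # h(d_p): data cost of p plus every incoming message except the one from q.
    h = [data_cost_p[d] + sum(m[d] for m in incoming) for d in range(num_states)]
    raw = [min(edge_cost(dp, dq) + h[dp] for dp in range(num_states))
           for dq in range(num_states)]
    alpha = sum(raw) / num_states
    return [r - alpha for r in raw]


msg = message_update([0, 4, 1], [[1, 0, 2], [0, 1, 0]],
                     lambda a, b: min(abs(a - b), 2))
print(msg)  # [-1.0, 0.0, 1.0]
```

Subtracting $\alpha$ keeps message values centered around zero across iterations, which helps bound the word length needed in fixed-point hardware.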
As in Equation 3, the MAP state $d_p$, which is the disparity value, can be estimated by adding the messages transmitted to each node from its neighboring nodes at the final iteration $T$ and selecting, for each node, the state with the minimum cost.
[Equation 3]
[Equation image not reproduced]
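The final state selection can be sketched as an argmin over the accumulated costs of each node. Following the usual min-sum formulation of Patent Reference 1, this sketch also adds the node's own data cost to the incoming final-iteration messages; that inclusion is an assumption, as is the function name.

```python
def estimate_state(data_cost_q, final_messages):
    """Return argmin_d [D_q(d) + sum of the final-iteration messages m_pq(d) into q].

    Ties go to the smallest state index.
    """
    num_states = len(data_cost_q)
    belief = [data_cost_q[d] + sum(m[d] for m in final_messages)
              for d in range(num_states)]
    return min(range(num_states), key=lambda d: belief[d])


# Example: the accumulated costs are [4, 4, 2], so state 2 is selected.
print(estimate_state([3, 1, 2], [[0, 2, 0], [1, 1, 0]]))  # 2
```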
As described above, with the conventional BP update rule, not all messages are stored in memory; instead, the MRF network is scanned in a specific axis direction.
Fig. 2 illustrates a layer structure in which a layer corresponds to an iteration of a message at each node shown in Fig. 1. As shown in Fig. 2, the MRF network structure of Fig. 1 is constructed as a dynamic Bayesian network in which a layer is stacked each time the messages are recalculated at each node, the iteration index $t$ corresponding to the layer index $l$. When $p(l)$ represents the coordinate of a node at the $l$-th iteration layer, the layer transformation of the dynamic Bayesian network, which tilts the position of a node at each iteration along the scanning axis direction $b = [1\;0]^T$, is given by Equation 4.
[Equation 4]
[Equation image not reproduced]
$(b = [1\;0]^T)$
Figs. 3A and 3B show the result of vertically rearranging the $p_0(l)$ nodes according to Equation 4, i.e., a layer-transformed FBP (Fast Belief Propagation) structure and a message update sequence.
[Equation 5]
[Equation image not reproduced]
As shown in Equation 5, the nodes $p(l)$ and $p(l-1)$ differ from each other by an offset of $-[1\;0]^T$ on the layer structure.
In the dynamic Bayesian network structure, nodes are grouped, and nodes at the same layer in a group are then parallel-processed to obtain messages thereof. To be specific, messages of nodes at the previous layer in a group are read from a local buffer of the group, and messages of nodes within the adjacent group are read from a layer buffer in which messages of nodes in the previously processed group are stored.
As shown in Figs. 3A and 3B, the final iteration message is calculated as the layer buffer is right-shifted, i.e., as the layer buffer moves in the positive direction of the $p_0$ axis. That is, messages of nodes in a group are calculated in parallel and stored in the local buffer. The messages stored in the local buffer are used to process the nodes at the next (upper) layer in the group, and they are also stored in the layer buffer for use in processing the messages of nodes in the next group. Accordingly, the messages can be processed with a small layer buffer and a small local buffer.
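Why a small local buffer and a small layer buffer suffice can be checked with a toy schedule. This sketch assumes the tilt $q_0 = p_0 + l$ (our own sign convention, which may differ from Equation 4), assigns node $(p_0, l)$ to group $q_0$, and processes groups in increasing $q_0$ and layers upward inside each group. Each message at layer $l$ reads its 4-connected neighbors at layer $l-1$; the neighbors along $p_1$ share $p_0$, so only $p_0$ offsets of $-1$, $0$, and $+1$ matter. All helper names are hypothetical.

```python
def schedule(width, num_layers, tilt=True):
    """Processing order of nodes (p0, l).

    With tilt, node (p0, l) belongs to group q0 = p0 + l (assumed convention);
    groups are visited in increasing q0 and layers upward inside each group.
    Without tilt, all layers of column p0 are finished before column p0 + 1.
    """
    nodes = [(p0, l) for p0 in range(width) for l in range(num_layers)]
    if tilt:
        return sorted(nodes, key=lambda n: (n[0] + n[1], n[1]))
    return sorted(nodes)


def neighbors_below(p0, l, width):
    """p0 coordinates of the layer-(l-1) neighbors read by node (p0, l).

    The p1 +/- 1 neighbors share p0, so only p0 - 1, p0 and p0 + 1 appear.
    """
    return [s0 for s0 in (p0 - 1, p0, p0 + 1) if 0 <= s0 < width]


def read_offsets(width, num_layers):
    """Set of group offsets u0 = q0(source) - q0(reader) under the tilt."""
    return {(s0 + l - 1) - (p0 + l)
            for p0 in range(width)
            for l in range(1, num_layers)
            for s0 in neighbors_below(p0, l, width)}


def dependencies_respected(width, num_layers, tilt=True):
    """True if every layer-(l-1) neighbor is processed before its reader."""
    position = {n: i for i, n in enumerate(schedule(width, num_layers, tilt))}
    return all(position[(s0, l - 1)] < i
               for (p0, l), i in position.items() if l > 0
               for s0 in neighbors_below(p0, l, width))


print(sorted(read_offsets(8, 4)))                # [-2, -1, 0]
print(dependencies_respected(8, 4))              # True
print(dependencies_respected(8, 4, tilt=False))  # False
```

Under this convention every read comes either from the same group at the previous layer ($u_0 = 0$, the local buffer) or from one of the two preceding groups ($u_0 = -1$ or $-2$, the layer buffer), which is consistent with the layer-buffer condition on $u_0$ between $-2$ and $0$ given in the Definitions. Without the tilt, a node would need a message that has not yet been computed.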
However, as described above, when the conventional BP technique is applied to stereo matching, a large number of iterations is needed, so the size of the layer buffer for fast BP (FBP) grows with the number of iterations. Given the rapid development of semiconductor and information communication technologies, there is a need for a fast systolic array for BP based stereo matching, and a method thereof, that reduce the size of the layer buffer by using the characteristics of the hierarchical BP structure.
In the hierarchical BP structure, messages converge rapidly from a coarse level to a fine level within a short iteration time by using data costs at $K$ different scale levels. However, even with this rapid convergence, a large memory is still needed. For $N_0 \times N_1$ left and right images, when the number of iterations at each level is $L$, the number of states is $S$, and each state cost is $B$ bits, the message memory is $4N_1N_0SB$ bits and the data cost memory is $N_1N_0SB$ bits, so the total memory size is $5N_1N_0SB$ bits. Further, since the number of nodes at scale level $k$ is $(N_1/2^k) \times (N_0/2^k)$, the total computation amount becomes
$$\sum_{k=0}^{K-1} SL\,(N_1/2^k)(N_0/2^k).$$
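The memory figures of the conventional hierarchical approach can be checked numerically: four messages and one data cost vector of $S$ states at $B$ bits per node give $5N_1N_0SB$ bits. The sketch below uses hypothetical values (a 256 x 256 image, $S = 32$, $B = 8$, $K = 4$ levels, 5 iterations per level) and counts node updates (nodes times iterations per level), leaving out the per-node cost factor. The function names are illustrative only.

```python
def conventional_memory_bits(n1, n0, num_states, bits):
    """Message memory 4*N1*N0*S*B plus data cost memory N1*N0*S*B bits."""
    message_bits = 4 * n1 * n0 * num_states * bits    # four messages per node
    data_cost_bits = n1 * n0 * num_states * bits       # one data cost vector per node
    return message_bits + data_cost_bits


def nodes_per_level(n1, n0, num_levels):
    """(N1 / 2^k) x (N0 / 2^k) nodes at level k (integer division)."""
    return [(n1 >> k) * (n0 >> k) for k in range(num_levels)]


def node_updates(n1, n0, iterations):
    """Sum over levels of iterations[k] * nodes at level k."""
    counts = nodes_per_level(n1, n0, len(iterations))
    return sum(l_k * c for l_k, c in zip(iterations, counts))


# Hypothetical example: 256 x 256 image, S = 32, B = 8, K = 4 levels, L = 5.
print(conventional_memory_bits(256, 256, 32, 8))  # 83886080 bits (10 MiB)
print(nodes_per_level(256, 256, 4))               # [65536, 16384, 4096, 1024]
print(node_updates(256, 256, [5, 5, 5, 5]))       # 435200
```

Because the node count shrinks by a factor of four per level, the finest level dominates both memory and work, which is why keeping only a column-sized layer buffer per level pays off.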
Summary of the Invention
In view of the above, the present invention provides a BP based fast systolic array that can perform parallel computation with a compact memory by using hierarchical BP based characteristics while reducing a total memory size when the number of iterations is small, and a method thereof.
In accordance with an aspect of the present invention, there is provided a belief propagation based fast systolic array, wherein a hierarchical dynamic Bayesian network of nodes corresponding to pixels of input left and right image pixel data is generated in consideration of an iteration axis and scale levels, and messages on the generated dynamic Bayesian network are updated in a specific axis direction on a Markov random field. In accordance with another aspect of the present invention, there is provided a belief propagation based fast systolic array method, including:
(a) storing left and right image pixel data input by raster scanning; and
(b) outputting a disparity image fast and in parallel using the pixel data stored in step (a).
In accordance with the present invention, parallel calculation is performed with a compact memory by using hierarchical BP based characteristics, while the total memory size is reduced when the number of iterations is small. With this reduction in memory size, fast parallel processing can be carried out by using the compact distributed memory provided in an existing VLSI (Very Large Scale Integration) chip and a parallel processor that accesses the memory. In addition to enabling fast parallel processing with a small amount of memory, the message update of the present invention requires only simple integer calculation, so the fast systolic array of the present invention can be manufactured as a compact parallel VLSI chip with a small memory, e.g., an FPGA (Field-Programmable Gate Array) or an ASIC (Application-Specific Integrated Circuit). Therefore, a complex image processing system can be manufactured as an inexpensive, small device that performs fast real-time processing.
Brief Description of the Drawings
The above features of the present invention will become apparent from the following description of embodiments, given in conjunction with the accompanying drawings, in which:
Fig. 1 illustrates an MRF network for stereo matching and a conventional BP update rule;
Fig. 2 illustrates a layer structure in which a layer corresponds to an iteration of a message at each node shown in Fig. 1;
Figs. 3A and 3B respectively illustrate an explanatory view of a layer-transformed FBP (Fast Belief Propagation) structure and a message update sequence;
Fig. 4 illustrates a dynamic Bayesian network having a hierarchical BP structure;
Figs. 5A to 5D respectively illustrate a layer-transformed hierarchical structure;
Figs. 6A to 6D respectively illustrate a sequence diagram when Figs. 5A to 5D are viewed from a different angle;
Fig. 7 illustrates a detailed structure of a layer buffer and a local buffer;
Figs. 8A and 8B illustrate the message update processes in the structure before layer transformation and in the layer-transformed structure, respectively;
Fig. 9 illustrates a data cost reading process in the layer-transformed hierarchical FBP structure shown in Figs. 5A to 5D;
Fig. 10 illustrates a configuration of a BP based fast systolic array for use in stereo matching in accordance with the present invention;
Fig. 11 illustrates systolic array architecture of an FBP stereo matching module;
Fig. 12 illustrates a detailed view of a processing element (PE) group;
Fig. 13 illustrates the architecture of a data cost module;
Fig. 14 illustrates a detailed view of module A in the data cost module shown in Fig. 13;
Fig. 15 illustrates a detailed view of module B in the data cost module shown in Fig. 13;
Fig. 16 illustrates a buffer distribution structure of PEs in a PE group;
Fig. 17 illustrates a detailed view of a PE module;
Figs. 18A and 18B illustrate first and second forward processors in the PE module shown in Fig. 17, respectively;
Figs. 19A and 19B illustrate first and second backward processors in the PE module shown in Fig. 17, respectively;
Fig. 20 illustrates an FBP stereo matching sequence on nodes of a FBP stereo matching module;
Fig. 21 illustrates a flowchart of the FBP stereo matching sequence;
Fig. 22 illustrates a sequential calculation procedure for message update within a group in the FBP stereo matching sequence of Fig. 21;
Fig. 23 illustrates a parallel calculation procedure for message update within a group in the FBP stereo matching sequence of Fig. 21;
Fig. 24 illustrates a calculation sequence in the data cost module;
Fig. 25 illustrates a local index for buffer update in Figs. 22 and 23; and
Fig. 26 illustrates a comparison of error rates with other real-time stereo matching systems.
Detailed Description of the Invention
Embodiments of the present invention will be described in detail with reference to the accompanying drawings, which form a part hereof.
Fig. 4 illustrates a dynamic Bayesian network having a hierarchical BP structure. As shown in Fig. 4, iteration layers are formed at each level as the number of nodes increases from a coarse level to a fine level. The number of nodes at level k is N_1/2^k × N_0/2^k on an N_1 × N_0 MRF network.
If the coordinate of a node at level k and an iteration layer at level k on the dynamic Bayesian network are denoted by p^k = (p_0^k, p_1^k) and l^k ∈ [0, L^k − 1], respectively, the layer transformation that tilts the position of a node at each iteration in the scanning axis direction b = [1 0]^T is given by Equation 6, taking into account the different scale characteristics of the previous level.
[Equation 6]
$$p^k(l^k) = a^k - l^k\,b + 2\,p^{k+1}(L^{k+1}), \qquad p^{K-1}(l^{K-1}) = p^{K-1} - l^{K-1}\,b$$
Here, a^k is the offset generated by the scale difference between levels k and k − 1 with respect to p^{k−1} at the coarsest level.
If the nodes on p_1^k(l^k) are vertically rearranged in the layer structure according to Equation 6, the sequence of the layer-transformed hierarchical structure shown in Figs. 5A to 5D is obtained. Figs. 6A to 6D illustrate this sequence viewed from a different angle, and Fig. 9 illustrates the data cost reading process in the layer-transformed hierarchical FBP structure of Figs. 5A to 5D. Next, as shown in Fig. 7, all nodes on the p_1^k(l^k) axis are grouped, and a processor is assigned to each node. The processors within a group then operate in parallel, so that the MRF network is scanned in the p_0^k(l^k) axis direction. Specifically, the messages of nodes at the previous layer within the group are read from the local buffer of the previous layer, and the messages of nodes in the group of the previous line are read from the layer buffer. Further, as shown in Figs. 5A to 5D, the message at the final iteration is calculated as the layer buffer is right-shifted, i.e., as the layer buffer moves in the positive direction of the p_1(l^k) axis. The messages of the nodes in the group are calculated in parallel and stored in the local buffer for use in processing the next (upper) layer. They are also stored in the layer buffer for use in processing the messages of the next group. As a result, the same result as the hierarchical BP structure is obtained using only a small layer buffer and a small local buffer.
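The buffering idea behind the layer transformation can be checked on a toy problem. The Python sketch below is a simplification under stated assumptions, not the patent's architecture: a one-dimensional chain of N nodes (no p_1 axis, no hierarchy), min-sum messages with a truncated linear edge cost, and hypothetical function names (send, synchronous_bp, tilted_scan_bp). It assumes node p at iteration l sits at tilted position q = p + l, which is the inverse of the node shift p(l) = p(l − 1) − [1 0]^T used in the text. Every message a node needs from iteration l − 1 then lies either in the current group at the previous layer (the local-buffer role) or two groups back (the layer-buffer role), so only the last few groups are kept. State estimation happens one layer after the final iteration.

```python
import random

def truncated_linear(c_v, k_v):
    """Edge cost V(dh, ds) = min(c_v * |dh - ds|, k_v)."""
    return lambda dh, ds: min(c_v * abs(dh - ds), k_v)

def send(V, data, incoming):
    """Min-sum message: out(ds) = min over dh of V(dh, ds) + data(dh) + incoming(dh)."""
    S = len(data)
    return [min(V(dh, ds) + data[dh] + incoming[dh] for dh in range(S)) for ds in range(S)]

def argmin(values):
    return min(range(len(values)), key=values.__getitem__)

def synchronous_bp(D, L, V):
    """Reference: L synchronous iterations on a 1-D chain with full message arrays."""
    N, S = len(D), len(D[0])
    zero = [0] * S
    right = [zero] * N                  # right[p]: message p -> p+1
    left = [zero] * N                   # left[p]:  message p -> p-1
    for _ in range(L):
        right, left = (
            [send(V, D[p], right[p - 1] if p > 0 else zero) for p in range(N)],
            [send(V, D[p], left[p + 1] if p < N - 1 else zero) for p in range(N)],
        )
    return [argmin([D[p][d] + (right[p - 1][d] if p > 0 else 0)
                    + (left[p + 1][d] if p < N - 1 else 0) for d in range(S)])
            for p in range(N)]

def tilted_scan_bp(D, L, V):
    """One pass over tilted positions q = p + l; returns (disparities, peak buffer entries)."""
    N, S = len(D), len(D[0])
    zero = [0] * S
    buf, out, peak = {}, [None] * N, 0  # buf[(q, l)] = (right_msg, left_msg)
    for q in range(N + L + 1):          # groups in scanning order
        for l in range(L + 2):          # layer L + 1 performs state estimation
            p = q - l
            if not 0 <= p < N:
                continue
            if l == 0:                  # initial messages are zero
                buf[(q, 0)] = (zero, zero)
                continue
            # p-1 at layer l-1 lies two groups back (layer-buffer role);
            # p+1 at layer l-1 lies in this group, previous layer (local-buffer role).
            from_left = buf[(q - 2, l - 1)][0] if p > 0 else zero
            from_right = buf[(q, l - 1)][1] if p < N - 1 else zero
            if l == L + 1:
                out[p] = argmin([D[p][d] + from_left[d] + from_right[d] for d in range(S)])
            else:
                buf[(q, l)] = (send(V, D[p], from_left), send(V, D[p], from_right))
        peak = max(peak, len(buf))
        for stale in [key for key in buf if key[0] < q - 1]:  # never read again
            del buf[stale]
    return out, peak

random.seed(0)
V = truncated_linear(2, 6)
D = [[random.randint(0, 20) for _ in range(5)] for _ in range(12)]
disparities, peak = tilted_scan_bp(D, 3, V)
assert disparities == synchronous_bp(D, 3, V)
```

The peak buffer size is bounded by 3(L + 1) entries regardless of N, which is the point of the scan order: memory grows with the number of iterations, not with the image length.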
Fig. 10 illustrates a configuration of a BP based fast systolic array for use in stereo matching in accordance with the present invention.
Referring to Fig. 10, the BP based fast systolic array includes: an image buffer 10 that receives and temporarily stores left and right image pixel data input by a raster scan method; and an FBP stereo matching module 13 that outputs a disparity image fast and in parallel by using the left and right pixel data output from the image buffer 10.
As shown in Fig. 11, the FBP stereo matching module 13 includes a plurality of PE (Processing Element) groups that exchange messages and pixel data with each other, which supports fast parallel processing. As shown in Fig. 12, each PE group includes: a data cost module 13a that receives pixel data and calculates data costs; a plurality of multiplexers (MUX) 13b that receive the data costs from the data cost module 13a and messages from neighboring PE groups and select the desired outputs; a plurality of processing elements (PEs) 13c that calculate new messages using the outputs of the MUXs 13b; a plurality of local buffers 13d that store the results of the PEs 13c; and a layer buffer 13e that in turn stores the contents of the local buffers 13d. Each PE 13c includes: an adder that sequentially reads and adds, state by state, the data costs and the messages of the previous layer; a forward processor that receives the output of the adder and outputs a forward processor cost; a forward stack that receives and stores the forward processor cost; a backward processor that receives the output of the forward stack and outputs a backward processor cost; a backward stack that stores the output of the backward processor; a normalizer that receives the output of the backward stack and calculates the final message; and a buffer that stores the output of the normalizer.
The forward processor includes: a first forward processor that initializes a first delay buffer, reads an input cost value at each step, compares the input cost value with the value obtained by adding a constant to the previous value of the first delay buffer, stores the smaller of the two in the first delay buffer, and outputs it; and a second forward processor that initializes a second delay buffer, calculates the minimum of the input costs using the second delay buffer, and outputs the value obtained by adding a constant to that minimum. The backward processor includes: a first backward processor that initializes a first delay buffer, reads an input cost value at each step, compares the input cost value with the value obtained by adding a constant to the value of the first delay buffer, stores the smaller of the two in the first delay buffer, compares the output of the first delay buffer with the output of the forward processor, and outputs the smaller of the two; and a second backward processor that initializes a second delay buffer to "0", adds the output of the first backward processor to the value of the second delay buffer at each step, and right-shifts the value of the second delay buffer by a specific number of bits before outputting it.
The normalizer outputs a value obtained by subtracting the value calculated by the second backward processor from the value calculated by the first backward processor, thereby calculating a message.
Specifically, if the number of levels is K, each PE group has 2^{K−1} PEs 13c in total (see Figs. 11 and 12). Accordingly, N_1/2^{K−1} PE groups are needed for an N_1 × N_0 image. Since the number of nodes at each level varies according to the coarse-to-fine scale characteristics, only N_1/2^k PEs operate in parallel when the FBP stereo matching sequence operates at level k within the PE groups. As shown in Fig. 16, each PE has a local buffer for each level and a layer buffer, and accesses these buffers through the MUX. As shown in Fig. 13, the data cost module is the logic that calculates data costs, performing the function of Equation 7.
[Equation 7]
$$D'_p(d) = \left|\, g^l(p_0, p_1) - g^r(p_0, p_1 + d) \,\right|$$
As shown in Fig. 14, module A in the data cost module includes registers that store the right and left pixel data g^r(p_0, p_1 + d) and g^l(p_0, p_1), and logic that calculates the absolute difference between the register values. That is, the data cost D'_p(d) is the output of module A. The right pixel data are shifted by shift logic to the neighboring register for each value d in Equation 7, so that D'_p(d) is output for each value d. As shown in Fig. 15, module B in the data cost module is logic that performs the operation of Equation 8 to calculate the data cost D_p^k(d). As shown in Fig. 13, two neighboring data costs D'_p(d) are added at each level k, and the final data cost D_p^k(d) is obtained by adding 2^k scan lines using a register and an accumulator.
[Equation 8]
$$D_p^k(d) = \sum_{e_0=0}^{2^k-1} \sum_{e_1=0}^{2^k-1} D'_{(2^k p_0 + e_0,\; 2^k p_1 + e_1)}(d)$$
Specifically, as shown in Fig. 24, the data cost D_p^k(d) in the accumulator is first initialized, and the left and right scan lines corresponding to each e_0 ∈ [0, 2^k − 1] are loaded. The value of Equation 9 is then accumulated into D_p^k(d). This is done for each disparity value d.
[Equation 9]
$$\sum_{e_1=0}^{2^k-1} D'_{(2^k p_0 + e_0,\; 2^k p_1 + e_1)}(d)$$
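The data cost path can be modeled in a few lines of Python. This sketch assumes the forms described for modules A and B: D'_p(d) is the absolute difference |g^l(p_0, p_1) − g^r(p_0, p_1 + d)|, and D_p^k(d) is the sum of D'(d) over the 2^k × 2^k block of finest pixels under node p, built by repeatedly adding two neighboring costs along a scan line and then accumulating 2^k scan lines. The border clamping rule and all function names are hypothetical.

```python
import random

def raw_costs(gl_row, gr_row, d):
    """Module A (assumed form): D'_p(d) = |g_l(p0, p1) - g_r(p0, p1 + d)| along one line.
    Hypothetical border rule: right-image indices past the edge are clamped."""
    w = len(gl_row)
    return [abs(gl_row[p1] - gr_row[min(p1 + d, w - 1)]) for p1 in range(w)]

def add_neighbours(values, k):
    """Adder tree: add two neighbouring costs k times, so each output covers 2^k columns."""
    for _ in range(k):
        values = [values[i] + values[i + 1] for i in range(0, len(values) - 1, 2)]
    return values

def level_data_cost(gl, gr, k, num_disp):
    """Module B (assumed form): D^k_p(d), accumulating 2^k scan lines per level-k row."""
    n = 2 ** k
    rows, cols = len(gl) // n, len(gl[0]) // n
    cost = [[[0] * num_disp for _ in range(cols)] for _ in range(rows)]
    for p0 in range(rows):
        for e0 in range(n):                      # load left/right scan line n*p0 + e0
            line = n * p0 + e0
            for d in range(num_disp):
                line_sum = add_neighbours(raw_costs(gl[line], gr[line], d), k)
                for p1 in range(cols):           # accumulate the per-line term
                    cost[p0][p1][d] += line_sum[p1]
    return cost

random.seed(1)
gl = [[random.randint(0, 255) for _ in range(12)] for _ in range(8)]
gr = [[random.randint(0, 255) for _ in range(12)] for _ in range(8)]
coarse = level_data_cost(gl, gr, k=2, num_disp=4)   # 2 x 3 nodes, 4 disparities each
```

Adding neighbors in pairs k times keeps the per-line logic to a tree of adders, and the accumulator needs only one running sum per disparity while the 2^k scan lines stream in.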
Meanwhile, the FBP stereo matching sequence performed in the FBP stereo matching module using the data costs on the N_0 × N_1 MRF network is as follows.
<FBP Stereo Matching Sequence>
for node p_0^{K-1} from 0 to N_0^{K-1} - 1
    Message_update(p_0^{K-1}, 0, K-1)
    for a^{K-2} from 0 to 1
        p_0^{K-2}(0) = a^{K-2} + 2(p_0^{K-1} - L^{K-1})
        Message_update(p_0^{K-2}(0), a^{K-2}, K-2)
        ...
            for a^0 from 0 to 1
                p_0^0(0) = a^0 + 2(p_0^1(0) - L^1)
                Message_update(p_0^0(0), a^0, 0)
                State_estimation(p_0^0(0), L^0)
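The depth-first expansion of this sequence can be traced with a short Python sketch. It only records calls instead of updating messages, the names are hypothetical, and it generalizes the row formula shown for level K − 2 to every level as p_0^{k−1}(0) = a^{k−1} + 2(p_0^k(0) − L^k), which is the reading of Equation 6 assumed here. With the layer counts indexed finest first, each coarsest-level row expands into 2^{K−1} consecutive finest-level rows, so the finest rows come out in scanning order.

```python
def fbp_call_order(n0_coarsest, L):
    """Record the depth-first order of Message_update / State_estimation calls.

    L[k] is the number of iteration layers at level k (k = 0 finest, K-1 coarsest).
    Rows follow p_0^{k-1}(0) = a + 2 * (p_0^k(0) - L^k); they can be negative at
    the start of the scan, where a real implementation would skip the nodes."""
    K = len(L)
    calls = []

    def descend(k, row, a):
        calls.append(("Message_update", row, a, k))
        if k == 0:
            calls.append(("State_estimation", row, L[0]))
            return
        for child_a in (0, 1):                      # two finer rows per coarse row
            descend(k - 1, child_a + 2 * (row - L[k]), child_a)

    for p in range(n0_coarsest):                    # coarsest-level groups in scan order
        descend(K - 1, p, 0)
    return calls

calls = fbp_call_order(n0_coarsest=2, L=[2, 3, 1])
finest = [c[1] for c in calls if c[0] == "State_estimation"]
print(finest)  # [-10, -9, -8, -7, -6, -5, -4, -3]
```

The negative rows at the start only reflect the accumulated layer tilt before the first real image row enters the scan.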
As described above, and as in the FBP stereo matching sequence on the nodes of the FBP stereo matching module shown in Fig. 20, even the finest-level nodes above a node p_0^{K−1} can be processed in a depth-first tree order owing to the coarse-to-fine scale characteristics.
That is, the processors at all of the nodes on the p_1^k(l^k) axis are grouped for each p_0^k(l^k) and perform the Message_update function in parallel within the group in the depth-first tree order. Then, the State_estimation function is performed at the final layer L^0 to determine the disparity value.
Fig. 21 illustrates a flowchart of the FBP stereo matching sequence. Fig. 22 and Fig. 23 respectively illustrate a sequential calculation procedure and a parallel calculation procedure for message update within a group in the FBP stereo matching sequence of Fig. 21. Fig. 25 illustrates a local index for buffer update in Figs. 22 and 23.
Each function in the FBP stereo matching sequence is described below.

1. Message_update(p_0^k(0), a^k, k)
   for each layer l^k from 1 to L^k
     for each parallel processor h ∈ [(0,0), (0, N_1^k - 1)]
       a. Message_calculation in local buffer
          [equation image not reproduced]
          if l^k = 1, then
            [equation image not reproduced]
            N_h = Nb(h + (a^k - 1)[1 0]^T) \ (s + (a^k - 1)[1 0]^T)
            [equation image not reproduced]
          otherwise
            [equation image not reproduced]
            N_h = Nb(h + (a^k - 1)[1 0]^T) \ (s + (a^k - 1)[1 0]^T)
       b. Buffer_update in layer buffer, for next group processing
          a) in case of data cost
             [equation image not reproduced]
          b) in case of message
             M_c^k(d, l^k - 1) = M_b^k(d, l^k - 1)
             (1) Downward propagation message: propagation offset (equation image not reproduced), for a from 1 to 0
                 [equation image not reproduced]
             (2) Leftward and rightward propagation message: propagation offset (equation image not reproduced), a = h, a = h + n_b
                 [equation image not reproduced]

2. State_estimation(p_0^0(0), L^0)
   for each parallel processor h ∈ [(0,0), (0, N_1^0 - 1)]
     [equation image not reproduced]
In the layer structure, if a message in layer l^k with local index s within a group is denoted by M_s^k(d_s, l^k), this message corresponds to the level-k message at iteration l^k in the MRF network. To calculate the estimated MAP state d_p(L^0 + 1) and the message M_s^k(d_s, l^k), the edge cost V_hs(d_h, d_s), the data cost D_h^k(d_h), and the neighboring messages M_h^k(d_h, l^k − 1) are needed.
As described in Patent Reference 1, the edge cost V_hs(d_h, d_s) can be calculated without using memory when the truncated linear function
$$V_{hs}(d_h, d_s) = \min\left(C_v\,|d_h - d_s|,\; K_v\right)$$
with parameters C_v and K_v is used.
First, consider the case where l^k ≠ 1 at each level.
In message update and state estimation, note that a node on the MRF network corresponds to nodes offset by −[1 0]^T between successive iteration layers in the layer-transformed structure, as shown in Equation 5. Accordingly, Nb(h)\s becomes Nb(h − [1 0]^T)\(s − [1 0]^T) under the layer transformation.
The Message_calculation function accesses the layer buffer and the local buffer as follows. The local index u_0 of a node lies in the range −2 ≤ u_0 ≤ 0. If the node is within the group, i.e., u_0 = 0, the data cost and the message of the node at the previous layer are read from the local buffer. If −2 ≤ u_0 < 0, the node belongs to a previous group, and its data cost and message at the previous layer are read from the layer buffer.
Here, a message that is calculated by the function is stored in the local buffer.
D_h(d) may be read from the layer buffer of the previous layer into the local buffer using p(l) = p(l − 1) − [1 0]^T. When l^k = 1 at a level, the previous layer belongs to a different scale level and requires special handling, as follows.
That is, if k = K − 1 (the coarsest level), the messages are initialized to "0". If k is not K − 1, the message M_h^{k+1}(d_h, L^{k+1}) of the previous (coarser) level is read from the local buffer. In addition, when l^k = 1, the data cost D_h^k(d_h) is read from the data cost module. Next, the Buffer_update function updates the layer buffer by shifting the entries with local index u_0 = 0 to the next smaller index, i.e., into the layer buffer, in the same manner as the data cost module calculation sequence shown in Fig. 24. The State_estimation function outputs d_p(L^0 + 1) using the messages M_h^0(d_h, L^0) of layer L^0 at level 0. Here, d_p(L^0 + 1) becomes the disparity value at position (p_1 − (L^k + 1 + Σ_j L^j 2^{j−k}), p^0) of the output disparity image. The messages of nodes satisfying −2 ≤ u_0 < 0 are read from the layer buffer. As shown in Figs. 5A to 5D, when u_0 = −1, messages toward neighboring nodes in three directions are required for N_1 nodes in total, and when u_0 = −2, a message in one direction is required. Accordingly, 4N_1 messages in total are stored for each layer. When the number of states is S and each state cost is B bits, the size of the layer buffer for all of the messages is
$$\sum_{k=0}^{K-1} 4SB\,L^k N_1^k \text{ bits},$$
where N_1^k = N_1/2^k is the number of nodes along the p_1 axis at level k. Since the local buffer stores only the messages in all directions of the current layer, its message memory size is
$$\sum_{k=0}^{K-1} 4SB\,N_1^k \text{ bits}.$$
As for the data costs, only the case h_0 = −1 needs to be considered, as shown in Figs. 6A to 6D. Accordingly, the data cost layer buffer is Σ_{k=0}^{K−1} SB L^k N_1^k bits, and the data cost local buffer is Σ_{k=0}^{K−1} SB N_1^k bits. Therefore, the total memory size of the FBP stereo matching module is
$$\sum_{k=0}^{K-1} 5SB\,(L^k + 1)\,N_1^k \text{ bits}.$$
By comparison, the existing hierarchical BP memory size is 5N_1N_0SB bits. Accordingly, when L^k is sufficiently small, the memory size of the FBP stereo matching module is smaller by a factor of
$$\frac{N_0}{\sum_{k=0}^{K-1} (L^k + 1)/2^k}.$$
The calculation becomes N_1^k times faster with N_1^k parallel processors at level k, and thus approximately N_1 times faster in total.
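The memory formulas can be checked numerically. This Python sketch (hypothetical function names) evaluates the layer- and local-buffer sums with exact fractions and divides the existing 5N_1N_0SB bits by the FBP total. For the 436 × 383 example given later in the text, N_1 = 436 is taken from the 436 parallel processors, which leaves N_0 = 383, and L^k = (5, 5, 10, 5) from coarse to fine is listed finest first; S and B are arbitrary placeholders because they cancel in the ratio.

```python
from fractions import Fraction

def fbp_memory_bits(n1, L, S, B):
    """Layer- and local-buffer sizes in bits; L[k] is the number of layers at level k (k = 0 finest)."""
    layer_buf = local_buf = Fraction(0)
    for k, Lk in enumerate(L):
        n1k = Fraction(n1, 2 ** k)               # N_1^k = N_1 / 2^k nodes at level k
        layer_buf += (4 + 1) * S * B * Lk * n1k  # 4 message directions + 1 data cost, L^k layers
        local_buf += (4 + 1) * S * B * n1k       # current layer only
    return layer_buf, local_buf

def reduction_factor(n0, n1, L, S, B):
    """Existing hierarchical BP memory (5 * N1 * N0 * S * B bits) divided by the FBP total."""
    layer_buf, local_buf = fbp_memory_bits(n1, L, S, B)
    return Fraction(5 * n1 * n0 * S * B) / (layer_buf + local_buf)

# 436 x 383 example: N_1 = 436, N_0 = 383, L^k = (5, 5, 10, 5) coarse-to-fine
r = reduction_factor(n0=383, n1=436, L=[5, 10, 5, 5], S=64, B=8)
print(float(r))  # about 27.85
```

The denominator is Σ(L^k + 1)/2^k = 6 + 11/2 + 6/4 + 6/8 = 55/4, so the factor is 383 · 4/55 ≈ 27.85, which rounds to the 28 quoted in the text.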
Meanwhile, an FBP scanning sequence may be implemented by a VLSI chip in which a plurality of processors read messages from neighboring processors to perform parallel calculation, as shown in Figs. 8A and 8B. Alternatively, a PC may sequentially read the messages to perform calculation.
Below, a PE calculation architecture will be described.
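[Equation 10]
$$m_o(d_s) = \min_{d_h}\left(V_{hs}(d_h, d_s) + m_{sum}(d_h)\right)$$
where m_sum(d_h) is the sum of the data cost and the incoming messages (defining equation image not reproduced).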
As shown in Equation 10, the PE is the logic that calculates a new message m_o(d_s) using V_hs(d_h, d_s) and m_sum(d_h). The present invention proposes a new PE architecture that, for messages with A states, reduces the amount of calculation from O(A^2) to O(3A), i.e., three linear passes, by using the distance transform characteristics disclosed in Patent Reference 1. The PE consists of the forward processor, the backward processor, and the normalizer, as shown in Fig. 17. The new PE architecture is suitable for VLSI implementation because its simple calculation procedure uses only an adder, a subtracter, a shifter, and a comparator.
Below, "B" represents the allowable maximum value.
Forward Processor: D_1(-1) = B, D_2(-1) = B
  For t from 0 to A-1:
    D_1(t) = min(m_sum(t), D_1(t-1) + C_v),  m_f(t) = D_1(t)
    D_2(t) = min(m_sum(t), D_2(t-1))
  m_f(-1) = D_2(A-1) + K_v
Backward Processor: D_3(-1) = B, D_4(-1) = 0
  For t from 0 to A-1:
    D_3(t) = min(m_f(A-1-t), D_3(t-1) + C_v),  m_b(t) = min(D_3(t), m_f(-1))
    D_4(t) = m_b(t) + D_4(t-1)
  m_b(-1) = D_4(A-1)/A
Normalizer:
  For t from 0 to A-1:
    m_o(t) = m_b(t) - m_b(-1)
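The PE datapath above can be checked in software. The Python sketch below is an illustration under stated assumptions, not the patent's circuit: the function names are hypothetical, a large integer stands in for the saturation value "B", the backward stack's reversal is folded into list indexing, and the mean m_b(−1) = D_4(A−1)/A is computed with a right shift, which assumes A is a power of two. It runs the forward pass (D_1, D_2), the backward pass (D_3) and the normalizer, and compares the unnormalized result against a direct O(A^2) evaluation of Equation 10 with the truncated linear edge cost min(C_v|d_h − d_s|, K_v).

```python
def distance_transform_message(m_sum, c_v, k_v, big=1 << 30):
    """Forward/backward passes for V(dh, ds) = min(c_v * |dh - ds|, k_v).
    Returns m_b indexed by state (the backward stack reversal is folded in)."""
    A = len(m_sum)
    d1 = d2 = big                          # forward delay buffers start at "B"
    m_f = []
    for t in range(A):
        d1 = min(m_sum[t], d1 + c_v)       # linear lower envelope, left to right
        d2 = min(m_sum[t], d2)             # running minimum
        m_f.append(d1)
    m_f_trunc = d2 + k_v                   # m_f(-1): truncation level
    d3 = big
    m_b = [0] * A
    for t in range(A):                     # backward pass: last state first
        s = A - 1 - t
        d3 = min(m_f[s], d3 + c_v)
        m_b[s] = min(d3, m_f_trunc)
    return m_b

def pe_message(m_sum, c_v, k_v):
    """Full PE: distance transform followed by mean normalization."""
    m_b = distance_transform_message(m_sum, c_v, k_v)
    A = len(m_b)
    shift = A.bit_length() - 1
    if A != 1 << shift:
        raise ValueError("this sketch assumes A is a power of two")
    mean = sum(m_b) >> shift               # D_4(A-1)/A via right shift
    return [x - mean for x in m_b]

def brute_force_message(m_sum, c_v, k_v):
    """Direct O(A^2) evaluation of Equation 10, for comparison."""
    A = len(m_sum)
    return [min(min(c_v * abs(dh - ds), k_v) + m_sum[dh] for dh in range(A))
            for ds in range(A)]

print(distance_transform_message([5, 0, 7, 3], c_v=2, k_v=4))  # [2, 0, 2, 3]
print(pe_message([5, 0, 7, 3], c_v=2, k_v=4))                  # [1, -1, 1, 2]
```

The truncation enters only through the final minimum with m_f(−1) = min_t m_sum(t) + K_v, because the minimum over d_h of min(C_v|d_h − d_s|, K_v) + m_sum(d_h) splits into the linear distance transform and the constant min_t m_sum(t) + K_v.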
As described above, in the PE module shown in Fig. 17, the forward processor receives m_sum(t), the sum of the messages and the data cost, and outputs m_f(t). m_f(t) is stored in the stack and used by the backward processor to output m_b(t). The normalizer receives m_b(t) and calculates m_o(t).
Figs. 18A and 18B illustrate the first and second forward processors in the PE module shown in Fig. 17, respectively. In Figs. 18A and 18B, the input cost m_sum(t) is a vector input sequentially for t from 0 to A−1. The first forward processor shown in Fig. 18A initializes the delay buffer D_1(−1) to "B" and reads the input cost at each step. The input cost m_sum(t) is compared with D_1(t−1) + C_v calculated at the previous step, and the minimum is stored as D_1(t) and output as m_f(t) at the current step. The second forward processor shown in Fig. 18B calculates the minimum value of m_sum(t) using the delay buffer D_2(t), and adds K_v to the minimum to output m_f(−1). Figs. 19A and 19B illustrate the first and second backward processors in the PE module shown in Fig. 17, respectively.
The first backward processor shown in Fig. 19A initializes the delay buffer D_3(−1) to "B" and reads the forward cost m_f at each step in reverse state order, i.e., m_f(A−1−t) at step t. This value is compared with D_3(t−1) + C_v calculated at the previous step, and the minimum is set as D_3(t) at the current step. D_3(t) is then compared with the input parameter m_f(−1), and the smaller value is output as m_b(t). In the second backward processor shown in Fig. 19B, the delay buffer D_4(t) is initialized to "0", and m_b(t) is added to it at each step. At the final step, D_4(A−1) is right-shifted so as to divide it by A, and the result is output as m_b(−1). The normalizer in the PE module shown in Fig. 17 subtracts the output m_b(−1) of the second backward processor from the output m_b(t) of the first backward processor, and finally outputs m_o(t). For the Middlebury test images, when the total number of scale levels is 4 and L^k is allocated as (5, 5, 10, 5) from coarse to fine, the present invention achieves a low error rate compared with other real-time stereo matching systems, as shown in Fig. 26. In particular, for a 436 × 383 image, the memory size is reduced by a factor of N_0/Σ_{k=0}^{K−1}((L^k + 1)/2^k) = 28, and the calculation becomes 436 times faster by using 436 parallel processors.
While the invention has been shown and described with respect to the embodiments, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the scope of the invention as defined in the following claims.

Claims

What is claimed is:
1. A belief propagation based fast systolic array, wherein a hierarchical dynamic Bayesian network of nodes corresponding to pixels of input left and right image pixel data is generated in consideration of an iteration axis and scale levels, and messages on the generated dynamic Bayesian network are updated in a specific axis direction on a Markov random field.
2. The belief propagation based fast systolic array of claim 1, comprising: an image buffer that stores the left and right image pixel data input by raster scanning; and a fast belief propagation (FBP) stereo matching module that outputs a disparity image fast and in parallel by using the pixel data output from the image buffer.
3. The belief propagation based fast systolic array of claim 2, wherein the FBP stereo matching module has a systolic array architecture including a plurality of parallel processing element groups, each processing element group calculating the messages and disparity values in parallel while transmitting the messages and the pixel data to a neighboring processing element group.
4. The belief propagation based fast systolic array of claim 3, wherein each of the processing element groups includes: a data cost module that receives the pixel data and calculates data costs; a plurality of multiplexers that receives the data costs and the messages from the data cost module and from a neighboring processing element group, respectively, and selects desired messages; a plurality of processing elements that calculates new messages by using the desired messages selected by the multiplexers; a plurality of local buffers that stores the new messages calculated by the processing elements; and a plurality of layer buffers that stores the new messages stored in the local buffers.
5. The belief propagation based fast systolic array of claim 4, wherein the data cost module includes: a plurality of first modules, each first module calculating and outputting an absolute difference between the left and the right pixel data corresponding to each disparity value; and a plurality of second modules, each second module calculating final data costs at each scale level by using outputs of the first modules.
6. The belief propagation based fast systolic array of claim 5, wherein each of the first modules includes: a series of left registers that stores the left pixel data; a series of right registers that stores the right pixel data; and a logic that calculates the absolute difference by using outputs of the left and the right registers, wherein the right registers are shifted to produce the output for each disparity value.
7. The belief propagation based fast systolic array of claim 5, wherein each of the second modules includes: an adder that adds two data costs; a register that stores the addition result of the adder; and an accumulator that accumulates an output of the register.
8. The belief propagation based fast systolic array of claim 4, wherein the data cost module includes: a plurality of first modules, each calculating and outputting an absolute difference between the left and the right pixel data; and a plurality of second modules, each calculating final data costs for each scale level, wherein each first module calculates the absolute difference by sequentially reading left and right scan lines required to calculate a data cost of a specific node at a specific scale level and storing a series of left and right pixel data of the left and right scan lines in registers of the first module, and wherein each second module calculates the final data costs by adding outputs of neighboring first modules according to the specific scale level and accumulating the addition result for each scan line.
9. The belief propagation based fast systolic array of claim 2, wherein the FBP stereo matching module has an FBP stereo matching sequence in which, in a layer-transformed hierarchical dynamic Bayesian network which is obtained by tilting in a scanning axis direction the positions of the nodes at each iteration on the dynamic Bayesian network, nodes on a line corresponding to the same coordinate on an axis in the Markov random field are processed in parallel and sequentially processed in the axis direction.
10. The belief propagation based fast systolic array of claim 9, wherein, in the FBP stereo matching sequence, for memory scanning of the layer-transformed hierarchical dynamic Bayesian network, upper layer messages are processed by a message update function in a depth-first tree order while nodes on the same coordinate of the scanning axis are processed in parallel at the coarsest level, and a disparity value is calculated by a state estimation function.
11. The belief propagation based fast systolic array of claim 10, wherein the message update function is performed once for each iteration layer at each level, and calls a message calculation function, which calculates messages and stores them in a local buffer, and a buffer update function, which stores the messages in the local buffer in a layer buffer for processing a group of a next line.
12. The belief propagation based fast systolic array of claim 11, wherein data costs and messages of a previous layer read by the message calculation function are those processed in the previous layer or the group of the previous line.
13. The belief propagation based fast systolic array of claim 11, wherein the message calculation function reads data costs or messages of the previous layer in a group from the local buffer, and reads those out of the group from the layer buffer.
14. The belief propagation based fast systolic array of claim 11, wherein, when a message of a first layer at each level is calculated, the message calculation function sets messages of the coarsest level to zero, reads messages at a previous coarser level from the local buffer for other levels, and reads data costs from data cost modules that receive the pixel data and calculate the data cost.
15. The belief propagation based fast systolic array of claim 11, wherein the buffer update function stores messages and data costs in the local buffer in the layer buffer such that message and data costs of a current group are read from the layer buffer when the message calculation function is performed for a group of the next line on the network.
16. The belief propagation based fast systolic array of claim 10, wherein the state estimation function reads messages and data costs from the local buffer and the layer buffer after a final iteration at the finest level, adds them for each state, and estimates a state corresponding to the minimum cost as the disparity value.
17. The belief propagation based fast systolic array of claim 3, wherein, when the number of levels is K, the processing element group has 2^{K−1} processing elements in total, N_1/2^k processing elements operate in parallel in an FBP stereo matching sequence at level k, and wherein each processing element has a local buffer and a layer buffer at each level and accesses the buffers via a multiplexer.
18. The belief propagation based fast systolic array of claim 4, wherein the local buffer stores currently calculated messages in a group to allow the messages of the previous layer to be accessed for message calculation of a next layer.
19. The belief propagation based fast systolic array of claim 4, wherein the layer buffer stores, on a layer basis, messages of a current group required for message calculation of a group of a next line on the network.
20. The belief propagation based fast systolic array of claim 2, wherein the FBP stereo matching module performs an FBP stereo matching sequence by sequentially accessing buffers with a single processor.
21. The belief propagation based fast systolic array of claim 4, wherein each processing element includes: an adder that sequentially reads and adds the data costs and the messages of the previous layer on a state basis; a forward processor that receives an output of the adder to output a forward processor cost; a forward stack that receives and stores the forward processor cost; a backward processor that receives an output of the forward stack to output a backward processor cost; a backward stack that stores an output of the backward processor; a normalizer that receives an output of the backward stack and calculates a final message; and a buffer that stores an output of the normalizer.
22. The belief propagation based fast systolic array of claim 21, wherein the forward processor includes: a first forward processor that initializes a first delay buffer, reads an input cost at each step, compares the input cost with a value obtained by adding a constant value to a previous value of the first delay buffer, stores the minimum value in the first delay buffer, and outputs the minimum value; and a second forward processor that initializes a second delay buffer to calculate a minimum value of an input cost, and outputs a value obtained by adding a constant value to the minimum value.
23. The belief propagation based fast systolic array of claim 21, wherein the backward processor includes: a first backward processor that initializes a first delay buffer, reads an input cost at each step, compares the input cost with a value obtained by adding a constant value to the value of the first delay buffer, stores a minimum value in the first delay buffer, compares an output of the first delay buffer with an output of the forward processor, and outputs a minimum value; and a second backward processor that initializes a second delay buffer to zero, stores in the second delay buffer a value obtained by adding the output of the first delay buffer at each step, shifts the value of the second delay buffer by a specific number of bits, and outputs the shifted value.
24. The belief propagation based fast systolic array of claim 21, wherein the normalizer calculates the message by outputting a value obtained by subtracting the value calculated by the second backward processor from the value calculated by the first backward processor.
25. The belief propagation based fast systolic array of claim 23, wherein the normalizer calculates the message by outputting a value obtained by subtracting the value calculated by the second backward processor from the value calculated by the first backward processor.
26. The belief propagation based fast systolic array of claim 2, wherein the FBP stereo matching module is a VLSI chip that operates only with multiplexers, adders and subtracters for integer operation, comparators, and shifters by using systolic array architecture.
27. A belief propagation based fast systolic array method, comprising:
(a) storing left and right image pixel data input by raster scanning; and
(b) outputting a disparity image fast and in parallel using the pixel data stored at the step (a) .
28. The belief propagation based fast systolic array method of 27, wherein the step (b) is performed by a plurality of parallel processing element groups, each processing element group calculating the messages and disparity values in parallel while transmitting the messages and the pixel data to a neighboring processing element group.
29. The belief propagation based fast systolic array method of claim 28, wherein the step (b) includes: (al) receiving the pixel data and calculating data costs,- (bl) receiving the data costs calculated at the step (al) and receiving messages from neighboring processing element groups to select desired messages;
(cl) calculating new messages by using the message selected at the step (bl) ;
(dl) storing calculation result at the step (cl) in a local buffer; and
(el) storing the result stored at the step (dl) in a layer buffer.
30. The belief propagation based fast systolic array method of claim 29, wherein the step (c1) includes:
(c11) sequentially reading and adding the data costs and the messages of a previous layer on a per-state basis;
(c12) receiving the addition result of step (c11) and outputting a forward processor cost;
(c13) receiving the forward processor cost output in step (c12) and storing the received forward processor cost in a forward stack;
(c14) receiving an output of the forward stack and outputting a backward processor cost;
(c15) storing an output of step (c14) in a backward stack;
(c16) receiving an output of the backward stack and calculating a final message; and
(c17) storing the output calculated in step (c16) in a buffer.
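Steps (c11) to (c16) can be sketched as follows, assuming a truncated linear smoothness cost min(c*|d - e|, k); the claims do not name the cost function, and `compute_message` is a hypothetical name. The forward stack is what lets the backward pass visit the states in reverse order, and popping the backward stack restores the original order. Normalization and the final buffering of step (c17) are omitted.

```python
def compute_message(data_cost: list[int], prev_messages: list[list[int]],
                    c: int, k: int) -> list[int]:
    """Sketch of steps (c11)-(c16) for an assumed cost min(c*|d-e|, k)."""
    num_states = len(data_cost)
    # (c11) add the data cost and the previous-layer messages state by state
    h = [data_cost[d] + sum(m[d] for m in prev_messages) for d in range(num_states)]

    # (c12)/(c13) forward pass f(d) = min(h(d), f(d-1) + c), pushed onto a stack
    forward_stack: list[int] = []
    previous = minimum = None
    for value in h:
        previous = value if previous is None else min(value, previous + c)
        minimum = value if minimum is None else min(minimum, value)
        forward_stack.append(previous)
    threshold = minimum + k  # truncation level: minimum cost plus k

    # (c14)/(c15) popping the forward stack visits states D-1 .. 0
    backward_stack: list[int] = []
    previous = None
    while forward_stack:
        value = forward_stack.pop()
        previous = value if previous is None else min(value, previous + c)
        backward_stack.append(min(previous, threshold))

    # (c16) popping the backward stack restores state order 0 .. D-1
    message = []
    while backward_stack:
        message.append(backward_stack.pop())
    return message


print(compute_message([5, 1, 4, 8], [], c=2, k=3))  # → [3, 1, 3, 4]
```

The result equals the direct minimization over all state pairs, min over e of h(e) + min(c*|d - e|, k), but needs only a number of comparisons linear in the number of states instead of quadratic.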
31. The belief propagation based fast systolic array method of claim 30, wherein the step (c12) includes: initializing a first delay buffer; reading an input cost at each step; comparing the input cost with a value obtained by adding a constant value to the previous value of the first delay buffer; storing the minimum value in the first delay buffer; outputting the minimum value; initializing a second delay buffer; calculating the minimum value of the input costs by using the second delay buffer; and outputting a value obtained by adding a constant value to that minimum value.
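The forward processor of claim 31 can be modeled one input cost per clock, as a minimal sketch. It assumes that the two "constant values" are distinct parameters, a per-step smoothness cost and a truncation constant, and that both delay buffers are initialized to a value larger than any cost that can occur. The class, method, and parameter names are hypothetical.

```python
LARGE = 1 << 30  # assumed initial value, larger than any cost that can occur


class ForwardProcessor:
    """Cycle-by-cycle model of the forward processor of claim 31 (hypothetical names)."""

    def __init__(self, step_cost: int, truncation: int) -> None:
        self.step_cost = step_cost    # constant added to the first delay buffer
        self.truncation = truncation  # constant added to the minimum input cost
        self.reset()

    def reset(self) -> None:
        self.delay1 = LARGE  # first delay buffer: previous forward cost
        self.delay2 = LARGE  # second delay buffer: running minimum of the inputs

    def step(self, cost: int) -> int:
        """Read one input cost, update both delay buffers, and output the forward cost."""
        self.delay1 = min(cost, self.delay1 + self.step_cost)
        self.delay2 = min(self.delay2, cost)
        return self.delay1

    def threshold(self) -> int:
        """Minimum input cost seen so far plus the truncation constant."""
        return self.delay2 + self.truncation


fp = ForwardProcessor(step_cost=2, truncation=3)
print([fp.step(cost) for cost in [5, 1, 4, 8]], fp.threshold())  # → [5, 1, 3, 5] 4
```

Each clock needs only one adder and two comparators, which is consistent with the integer-only datapath described in claim 26.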
32. The belief propagation based fast systolic array method of claim 30, wherein the step (c14) includes: initializing a first delay buffer; reading an input cost at each step; comparing the input cost with a value obtained by adding a constant value to the value of the first delay buffer; storing the minimum value in the first delay buffer; comparing an output of the first delay buffer with an output of the forward processor; outputting the minimum value; initializing a second delay buffer to zero; storing in the second delay buffer, at each step, a value obtained by adding the output of the first delay buffer; shifting the value of the second delay buffer by a specific number of bits; and outputting the shifted value.
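A matching cycle-level sketch of the backward processor of claim 32 follows. It assumes that the input costs arrive in reverse state order (as they would when popped from a stack), that the "output of the forward processor" is a truncation threshold passed in as a plain integer, that the number of states is a power of two so the bit shift yields the mean, and, reading the claim literally, that the second delay buffer accumulates the first delay buffer's value before the comparison with the threshold. All names are hypothetical.

```python
LARGE = 1 << 30  # assumed initial value, larger than any cost that can occur


class BackwardProcessor:
    """Cycle-by-cycle model of the backward processor of claim 32 (hypothetical names)."""

    def __init__(self, step_cost: int, num_states: int) -> None:
        shift = num_states.bit_length() - 1
        if num_states <= 0 or num_states != 1 << shift:
            raise ValueError("num_states must be a power of two")
        self.step_cost = step_cost  # constant added to the first delay buffer
        self.shift = shift          # log2(num_states): the shift replaces a division
        self.reset()

    def reset(self) -> None:
        self.delay1 = LARGE  # first delay buffer
        self.delay2 = 0      # second delay buffer, initialized to zero (accumulator)

    def step(self, cost: int, forward_threshold: int) -> int:
        """Read one cost, update the buffers, and output min(first buffer, threshold)."""
        self.delay1 = min(cost, self.delay1 + self.step_cost)
        self.delay2 += self.delay1
        return min(self.delay1, forward_threshold)

    def offset(self) -> int:
        """Accumulated sum shifted right by log2(num_states) bits."""
        return self.delay2 >> self.shift


bp = BackwardProcessor(step_cost=2, num_states=4)
reversed_forward = [5, 3, 1, 5]  # forward costs for states 3, 2, 1, 0
out = [bp.step(cost, forward_threshold=4) for cost in reversed_forward]
print(out[::-1], bp.offset())  # → [3, 1, 3, 4] 3
```

Whether the accumulated value is taken before or after the threshold comparison does not affect which state has the minimum message, because the shifted value is a single offset subtracted from every state.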

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020070065082A KR100920227B1 (en) 2007-06-29 2007-06-29 Belief propagation based fast systolic array apparatus and its method
KR10-2007-0065082 2007-06-29

Publications (1)

Publication Number Publication Date
WO2009014314A1 (en) 2009-01-29

Family

ID=40281534

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/003280 WO2009014314A1 (en) 2007-06-29 2008-06-12 Belief propagation based fast systolic array and method thereof

Country Status (2)

Country Link
KR (1) KR100920227B1 (en)
WO (1) WO2009014314A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101094896B1 (en) 2010-02-16 2011-12-15 한국과학기술원 Apparatus and Method for realizing multimedia

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1175104A2 (en) * 2000-07-19 2002-01-23 Pohang University of Science and Technology Foundation Stereoscopic image disparity measuring system
KR20020032954A (en) * 2000-10-28 2002-05-04 김춘호 3D Stereoscopic Multiview Video System and Manufacturing Method
KR20060023714A (en) * 2004-09-10 2006-03-15 학교법인 포항공과대학교 System and method for matching stereo image
KR20070063063A (en) * 2005-12-14 2007-06-19 주식회사 제이앤에이치테크놀러지 Stereo vision system

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101877129A (en) * 2010-06-08 2010-11-03 浙江工业大学 Minimal sum cache acceleration strategy based binocular stereo vision matching method for generalized confidence spread
WO2013077522A1 (en) * 2011-11-23 2013-05-30 Lg Innotek Co., Ltd. Apparatus and method for hierarchical stereo matching
US9390507B2 (en) 2011-11-23 2016-07-12 Lg Innotek Co., Ltd. Apparatus and method for hierarchical stereo matching
TWI576790B (en) * 2011-11-23 2017-04-01 Lg伊諾特股份有限公司 Apparatus and method for hierarchical stereo matching
CN104966303A (en) * 2015-07-21 2015-10-07 兰州理工大学 Disparity map refinement method based on Markov random field
CN104966303B (en) * 2015-07-21 2018-02-06 兰州理工大学 A kind of disparity map refined method based on Markov random field

Also Published As

Publication number Publication date
KR100920227B1 (en) 2009-10-05
KR20090001026A (en) 2009-01-08

Similar Documents

Publication Publication Date Title
US6862035B2 (en) System for matching stereo image in real time
US6456660B1 (en) Device and method of detecting motion vectors
US20010028681A1 (en) Motion estimator
EP3293700B1 (en) 3d reconstruction for vehicle
WO2009014314A1 (en) Belief propagation based fast systolic array and method thereof
US20090315976A1 (en) Message propagation- based stereo image matching system
Li et al. High throughput hardware architecture for accurate semi-global matching
US7545974B2 (en) Multi-layered real-time stereo matching method and system
JP6567381B2 (en) Arithmetic apparatus, method and program
Strand et al. Distance transforms for three-dimensional grids with non-cubic voxels
WO2009002031A2 (en) Belief propagation based fast systolic array system and message processing method using the same
US5859672A (en) Image motion detection device
Urhan et al. Single sub-image matching based low complexity motion estimation for digital image stabilization using constrained one-bit transform
Lee et al. MAP-based stochastic diffusion for stereo matching and line fields estimation
Park et al. VLSI architecture for MRF based stereo matching
Jeong et al. Fast stereo matching using constraints in discrete space
CN112508996A (en) Target tracking method and device for anchor-free twin network corner generation
CN111626368B (en) Image similarity recognition method, device and equipment based on quantum algorithm
Hariyama et al. VLSI processor for reliable stereo matching based on window-parallel logic-in-memory architecture
Randall et al. Investigations of the self organising tree map
Geiselmann et al. Hardware to solve sparse systems of linear equations over GF (2)
Lee et al. Hierarchical stochastic diffusion for disparity estimation
Park et al. A high-speed parallel architecture for stereo matching
Huq et al. Efficient BP stereo with automatic paramemeter estimation
KR20000032857A (en) Device for motion estimation

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 08766242; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 08766242; Country of ref document: EP; Kind code of ref document: A1