WO2009054683A1 - System and method for real-time stereo image matching - Google Patents

System and method for real-time stereo image matching

Info

Publication number
WO2009054683A1
WO2009054683A1 (WO 2009/054683 A1) · PCT/KR2008/006267
Authority
WO
WIPO (PCT)
Prior art keywords
value
decision value
processing elements
pixel
disparity
Prior art date
Application number
PCT/KR2008/006267
Other languages
French (fr)
Inventor
Hong Jeong
Sung Chan Park
Young Su Kim
Original Assignee
Postech Academy-Industry Foundation
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Postech Academy-Industry Foundation filed Critical Postech Academy-Industry Foundation
Publication of WO2009054683A1 publication Critical patent/WO2009054683A1/en

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N13/00: Stereoscopic video systems; Multi-view video systems; Details thereof
    • H04N13/20: Image signal generators
    • H04N13/204: Image signal generators using stereoscopic image cameras
    • H04N13/239: Image signal generators using stereoscopic image cameras using two 2D image sensors having a relative position equal to or related to the interocular distance
    • H04N13/243: Image signal generators using stereoscopic image cameras using three or more 2D image sensors
    • H04N13/296: Synchronisation thereof; Control thereof
    • H04N2213/00: Details of stereoscopic systems
    • H04N2213/001: Constructional or mechanical details


Abstract

A system for real-time stereo image matching includes: an image processor for converting image signals taken by each of a plurality of cameras into digital images; a plurality of input buffers for extracting interpixel data from the digital images provided from the image processor and reordering the interpixel data; a processing element array having a plurality of processing elements to produce an optimal decision value or a disparity value; and an encoder for encoding the disparity value or the optimal decision value. Each of the processing elements sequentially receives the reordered interpixel data and performs image matching by using a disparity value outputted from the processing element of the upper stage at a previous clock.

Description

Description
SYSTEM AND METHOD FOR REAL-TIME STEREO IMAGE
MATCHING
Technical Field
[1] The present invention relates to a system and method for real-time stereo image matching using multiple cameras; and, more particularly, to a system and method which can accurately extract 3-dimensional (3D) distance information on a thin object spaced apart from a background by finding corresponding points in multiple scan lines of multiple images in real time.
Background Art
[2] As well-known in the art, stereo matching is a core process of a real-time stereo image processing system. The stereo matching refers to a re-creating process of 3D spatial information from a pair of 2D images by using geometric relations therebetween as shown in Fig. 1.
[3] In Fig. 1, reference symbols $F$, $B$ and $Z$ represent a focal length, a baseline and a depth, respectively.
[4] Referring to Fig. 1, the stereo matching employs a method for finding a pixel on an image line corresponding to an epipolar line in each of a left and a right image, respectively, the pixels thus found corresponding to an identical point (X, Y, Z) in a 3D space. In this instance, a disparity d for the conjugate pixel pair is calculated as in Equation 1:
[5]
[6] MathFigure 1
[Math.1] $d = x^{l} - x^{r}$
[7]
[8] The depth Z is a geometric characteristic calculated from the disparity. That is, the disparity has distance information. Hence, 3D distance and shape information on an object in an observation space can be measured by calculating the disparity in real time from left and right images.
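To make the geometry of Fig. 1 concrete, here is a minimal Python sketch that evaluates the disparity of Equation 1 and the depth it implies, assuming the standard pinhole-stereo relation Z = FB/d; the text above only states that Z is calculated from the disparity, so the function names and example numbers are purely illustrative.

```python
# Hedged sketch of the Fig. 1 geometry: disparity d = x^l - x^r (Equation 1)
# and the usual pinhole-stereo depth relation Z = F * B / d.

def disparity(x_left: float, x_right: float) -> float:
    """Disparity of a conjugate pixel pair on one epipolar line (Equation 1)."""
    return x_left - x_right

def depth(focal_length: float, baseline: float, d: float) -> float:
    """Depth Z from disparity d, assuming Z = F * B / d."""
    if d == 0:
        return float("inf")  # zero disparity corresponds to a point at infinity
    return focal_length * baseline / d

# Example with illustrative numbers: F = 8 mm, B = 120 mm, conjugate pixels at
# x_l = 412 and x_r = 404 (in practice pixel coordinates must be converted to
# metric units before applying the relation).
d = disparity(412.0, 404.0)
print(depth(8.0, 120.0, d))
```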
[9] The real-time stereo image matching system serves, for example, as a vision device for industrial robots, a road recognition device for autonomous vehicles, and a vision device for toy robots and similar home-electronics products. Further, the stereo image matching system can be used together with satellites for producing 3D maps.
[10] However, when a thin object is positioned nearer to the cameras than a background containing a great deal of texture information, the above-described prior art system may extract only a 3D distance value for the background, failing to extract a 3D distance value for the object, because the background information dominates the matching.
[11] Further, since only two cameras are used for the matching, the prior art system suffers from horizontal noise and does not achieve high matching reliability.
[12]
Disclosure of Invention
Technical Problem
[13] In view of the above, the present invention provides a system and method for realtime stereo image matching using multiple cameras. By using multiple cameras, 3D distance information on a thin object spaced apart from a background can be extracted accurately, and also, horizontal noises can be reduced. Accordingly, 3D distance and shape information on an object in an observation space can be measured more accurately, thereby improving matching reliability.
[14]
Technical Solution
[15] In accordance with an aspect of the present invention, there is provided a system for real-time stereo image matching, including: an image processor for converting image signals taken by each of a plurality of cameras into digital images; a plurality of input buffers for extracting interpixel data from the digital images provided from the image processor and reordering the interpixel data; a processing element array having a plurality of processing elements, wherein each of the processing elements sequentially receives the reordered interpixel data and performs image matching by using a disparity value outputted from the processing element of the upper line at a previous clock to produce an optimal decision value or a disparity value; and an encoder for encoding the disparity value or the optimal decision value.
[16] In accordance with another aspect of the present invention, there is provided a method for real-time stereo image matching in a system including a processing element array having a plurality of processing elements, the method including: converting image signals taken by each of a plurality of cameras into digital images; extracting interpixel data from the digital images and reordering the interpixel data; producing an optimal decision value or a disparity value by sequentially providing to each of the processing elements the reordered interpixel data to perform image matching using a disparity value provided from the processing element of the upper stage at a previous clock; and encoding the disparity value or the optimal decision value by using differential coding.
Advantageous Effects
[17] In accordance with the present invention, by using multiple cameras, 3D distance information on a thin object spaced apart from a background can be extracted accurately, and also, horizontal noises can be reduced. Accordingly, 3D distance and shape information on an object in an observation space can be measured more accurately, thereby improving matching reliability.
[18] Further, since the system in accordance with the present invention employs a high-speed parallel-processing matching technique and a high-compression encoding technique, it can be implemented as a small device and applied competitively to various applications.
[19]
Brief Description of the Drawings
[20] The above and other objects and features of the present invention will become apparent from the following description of embodiments, given in conjunction with the accompanying drawings, in which:
[21] Fig. 1 illustrates an explanatory view of a stereo matching using two cameras;
[22] Fig. 2 illustrates a block diagram of a systolic architecture for image matching using multiple cameras in accordance with an embodiment of the present invention;
[23] Fig. 3 illustrates a detailed block diagram of the image matching unit in Fig. 2;
[24] Fig. 4 illustrates a detailed block diagram of the input buffers in Fig. 3;
[25] Fig. 5 illustrates a detailed block diagram of the forward processor in Fig. 3;
[26] Fig. 6 illustrates a detailed block diagram of the backward processor in Fig. 3;
[27] Fig. 7 illustrates a table for explaining the operation of the encoder in Fig. 3;
[28] Fig. 8 illustrates an exemplary view of the input buffer in a case where three cameras are employed;
[29] Fig. 9 illustrates a structure of a processing element array in a case where three cameras are employed;
[30] Fig. 10 illustrates an exemplary view of the forward processor in Fig. 9 in a case where three cameras are employed;
[31] Fig. 11 illustrates a structure of a processing element array in a case where an N-number of cameras are employed; and
[32] Fig. 12 illustrates a flowchart for explaining parallel processing performed by an M-number of processing elements as clock t increments from 1 to 2(M-1).
[33]
Best Mode for Carrying Out the Invention
[34] Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings, which form a part hereof.
[35] Fig. 2 illustrates a block diagram of a systolic architecture for image matching using multiple cameras in accordance with an embodiment of the present invention. An image matching system of the present invention includes an N-number of cameras 10-1 to 10-N, an image processor 20, an image matching unit 30 and a user system 40.
[36] Each of the N-number of cameras 10-1 to 10-N takes an image of an object, and provides an image signal to the image processor 20.
[37] The image processor 20 converts image signals of objects from the respective N- number of cameras 10-1 to 10-N into digital signals to provide an N-number of image data to the image matching unit 30. Under the control of the image processor 20, the N-number of image data are also stored in a storage unit 50.
[38] The image matching unit 30 sequentially receives pixel data on image lines corresponding to an identical epipolar line in the N-number of digital images provided from the image processor 20 to calculate disparity values. The calculated disparity values are provided to the user system 40.
[39] The image matching unit 30 repeatedly performs the above-described operations to thereby provide to the user system 40 the calculated disparity values for all epipolar lines.
[40] The user system 40 is a system that uses distance data based on the disparity values provided from the image matching unit 30. The user system 40 may be, for example, a vision device for an industrial robot, a road recognition device for an autonomous vehicle, a vision device for a toy robot or other home-electronics product, or a 3D map building system working with satellites.
[41] As shown in Fig. 3, the image matching unit 30 includes an N-number of input buffers 31-1 to 31-N, a processing element unit 33 and an encoder 35.
[42] Each of the N-number of input buffers 31-1 to 31-N extracts interpixel data from each of the N-number of digital images provided from the image processor 20 and reorders the interpixel data, respectively. As shown in Fig. 4, each of the input buffers 31-1 to 31-N includes a control unit 31a, two D flip-flops 31b and 31c and a calculator 31d.
[43] The control unit 31a generates enable signals for determining whether or not the D flip-flops 31b and 31c will receive the pixel data, and provides the enable signals to the D flip-flops 31b and 31c. Further, the control unit 31a generates a pixel index and provides the pixel index to the calculator 31d.
[44] Each of the D flip-flops 31b and 31c stores one pixel value based on the enable signal from the control unit 31a, and provides the pixel value to the calculator 31d.
[45] The calculator 31d obtains a weighted sum of two pixel values (i.e., pixel 1 and pixel 2 as shown in Fig. 3) received from the D flip-flops 31b and 31c according to the pixel index from the control unit 31a, and divides the weighted sum by the sum of the weights. For example, division by two, four and eight may be implemented by removing one, two and three least significant bit(s), respectively. The calculator 31d then provides the division result to the processing element unit 33. Here, each of the D flip-flops 31b and 31c may be a register, and the calculator 31d may comprise a multiplier, an adder and a divider.
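As a software illustration of the calculator 31d just described, the following sketch forms the weighted sum of the two buffered pixel values and divides by the sum of the weights, using a right shift for power-of-two divisors as the text suggests. The source does not give the concrete weights per pixel index, so equal weights (a simple midpoint) are assumed here.

```python
# Hedged sketch of the input-buffer calculator 31d: weighted sum of the two
# pixel values held in the D flip-flops 31b/31c, divided by the sum of weights.
# The right shift stands in for the "remove least significant bits" division
# described above (divide by 2, 4, 8 = drop 1, 2, 3 low bits).

def interpixel(pixel1: int, pixel2: int, w1: int = 1, w2: int = 1) -> int:
    weighted_sum = w1 * pixel1 + w2 * pixel2
    total = w1 + w2
    shift = total.bit_length() - 1
    if (1 << shift) == total:      # power-of-two weight sum: shift out low bits
        return weighted_sum >> shift
    return weighted_sum // total   # general case, not needed in the hardware

# Midpoint between two consecutive pixels, i.e. division by two via a 1-bit shift:
print(interpixel(100, 104))  # -> 102
```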
[46] The processing element unit 33 may be implemented with a number of processing elements in the form of a processing element array, the number of processing elements corresponding to a specific maximum disparity value. Each of the processing elements, which includes a forward processor 33a, a trellis queue (e.g., a stack) 33b and a backward processor 33c, can exchange information with its adjacent processing elements. The processing element unit 33 is configured to operate at the maximum speed regardless of the number of processing elements.
[47] The processing element unit 33 sequentially receives the reordered interpixel data and performs image matching by using the disparity values of a processing element on the upper line processed at a previous clock to produce decision values (to be described later) or the disparity values of the current processing line.
[48] The forward processor 33a operates in synchronism with a clock. The forward processor 33a receives the proper pixel values from the image line corresponding to the epipolar line of each of the multiple images to calculate a decision value $v_j^*(t)$ (where $v_j^*(t)$ denotes the decision value calculated by the forward processor 33a of the j-th processing element at the t-th clock), and then stores the decision value $v_j^*(t)$ in the trellis queue 33b. As shown in Fig. 5, the forward processor 33a includes an absolute value calculator 33aa, a multiplexer 33ab, an adder 33ac and a flip-flop 33ad.
[49] The absolute value calculator 33aa calculates a matching cost using the differences between the N-number of pixel data values.
[50] The multiplexer 33ab determines, assuming that processing elements vertically adjacent in the processing element unit 33 are referred to as a top, a middle and a bottom processing element respectively, a minimum value among the accumulated cost $U_{j+1}(t-1)$ at the previous (t-1)-th clock stored in the top processing element, the accumulated cost $U_{j-1}(t-1)$ at the previous (t-1)-th clock stored in the bottom processing element, and the accumulated cost $U_j(t-1)$ at the previous (t-1)-th clock stored in the middle processing element and fed back via the flip-flop 33ad (where $U_j(t)$ denotes the accumulated cost of the j-th processing element at the t-th clock). The minimum value determined by the multiplexer 33ab is then provided to the adder 33ac.
[51] Further, the multiplexer 33ab provides, to the trellis queue 33b, the decision value $v_j^*(t)$ (where the decision values for the top, middle and bottom processing elements are 1, 0 and -1, respectively); the decision value $v_j^*(t)$ represents the originating path of the minimum value (i.e., the least cost) among the top, middle and bottom processing elements.
[52] The adder 33ac adds the minimum value from the multiplexer 33ab and the matching cost at the current clock t from the absolute value calculator 33aa to calculate the accumulated cost $U_j(t)$ of the current element. The accumulated cost $U_j(t)$ is then provided to the flip-flop 33ad.
[53] The flip-flop 33ad outputs the current accumulated cost $U_j(t)$ provided from the adder 33ac at the next clock. In other words, the output data of the forward processor 33a of the j-th processing element at clock t comprise the accumulated cost $U_j(t)$, to be transmitted to the top and bottom elements, and the decision value $v_j^*(t)$, to be stored in the trellis queue 33b and then transmitted to the backward processor 33c.
[54] The trellis queue 33b stores the decision value $v_j^*(t)$ from the forward processor 33a and then transmits it to the backward processor 33c.
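The forward recursion of paragraphs [48] to [53] can be modelled in software as follows. This is a sketch, not the hardware: it updates all M elements in one call, uses the decision values +1/0/-1 for top/middle/bottom as defined above, and assumes infinite cost outside the array boundary.

```python
# Hedged software model of one forward-processor step (Fig. 5): element j adds
# its current matching cost u[j] to the least of the accumulated costs of the
# top (j+1), middle (j) and bottom (j-1) elements at the previous clock, and
# records which neighbour won as the decision value v* in {+1, 0, -1}.

def forward_step(U_prev: list[float], u: list[float]):
    """One synchronous update of all M processing elements.

    U_prev: accumulated costs U_j(t-1); u: matching costs at clock t.
    Returns (U_next, decisions) where decisions[j] is v*_j(t).
    """
    INF = float("inf")
    M = len(U_prev)
    U_next, decisions = [0.0] * M, [0] * M
    for j in range(M):
        candidates = {
            +1: U_prev[j + 1] if j + 1 < M else INF,   # top neighbour
             0: U_prev[j],                             # middle, fed back via 33ad
            -1: U_prev[j - 1] if j - 1 >= 0 else INF,  # bottom neighbour
        }
        v_star = min(candidates, key=candidates.get)
        decisions[j] = v_star                  # pushed into the trellis queue
        U_next[j] = candidates[v_star] + u[j]  # adder 33ac output
    return U_next, decisions
```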
[55] The backward processor 33c carries out an operation on the decision value read from the trellis queue 33b to calculate an optimal disparity value. The optimal disparity value is provided to the encoder 35 in synchronism with the clock. As shown in Fig. 6, the backward processor 33c includes an OR gate 33ca, a one-bit activation D flip-flop 33cb, a demultiplexer 33cc and a tri-state buffer 33cd.
[56] The OR gate 33ca receives the activation signals $a_{j-1}(t-1)\,\delta(1 - v_{t-1,j-1}^*)$ and $a_{j+1}(t-1)\,\delta(1 + v_{t-1,j+1}^*)$ from the adjacent (j-1)-th and (j+1)-th processing elements, respectively, and the activation signal $a_j(t-1)\,\delta(v_{t-1,j}^*)$ fed back via the demultiplexer 33cc (where $a_j(t)$ indicates the activation bit value of the backward processor 33c of the j-th processing element at the t-th clock, and $\delta(\cdot)$ equals 1 when its argument is zero and 0 otherwise). The OR gate 33ca then performs a logical OR operation of these three activation signals to provide the activation bit $a_j(t)$ to the D flip-flop 33cb.
[57] The D flip-flop 33cb temporarily stores the activation bit $a_j(t)$ from the OR gate 33ca while providing the activation bit $a_j(t-1)$ of the previous clock to each of the tri-state buffer 33cd and the demultiplexer 33cc.
[58] The demultiplexer 33cc outputs the activation signals corresponding to the activation bit $a_j(t-1)$ provided from the D flip-flop 33cb to the backward processors of the adjacent processing elements, based on the decision value $v_{t-1,j}^*$ provided from the trellis queue 33b (where $v_{t-1,j}^*$ represents the decision value at the current matching point). That is, the forwarding direction of the activation bit is set using the decision value representing the path stored at the immediately previous matching point. Further, the demultiplexer 33cc outputs the activation signal $a_j(t-1)\,\delta(v_{t-1,j}^*)$ to the OR gate 33ca.
[59] The tri-state buffer 33cd provides, to the encoder 35, an optimal decision value $\hat{v}_t$ which represents the increment/decrement of the disparity, depending on the activation bit from the D flip-flop 33cb. Preferably, when the activation bit is "1" the tri-state buffer 33cd outputs the input value as it is; otherwise the tri-state buffer 33cd is in a high-impedance state and produces no output.
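The backward pass of paragraphs [55] to [59] can likewise be modelled as a single activation token walking back through the stored decisions. The starting element of the token is an assumption of this sketch; in the hardware it is set by the activation logic.

```python
# Hedged software model of the backward pass (Fig. 6): an activation bit
# travels back through the stored decision values; at each step the active
# element's decision v* in {+1, 0, -1} both selects the neighbour to activate
# next and is emitted as the increment/decrement of the disparity.

def backward_pass(trellis: list[list[int]], start: int = 0) -> list[int]:
    """trellis[t][j] = decision v*_j stored at step t; returns emitted decisions."""
    j, emitted = start, []
    for decisions in reversed(trellis):  # the trellis queue is read LIFO
        v = decisions[j]
        emitted.append(v)                # tri-state buffer output of the active PE
        j += v                           # demultiplexer routes the activation bit
    return emitted

# The disparity track is the running sum of the emitted decisions:
# d_{t+1} = d_t + v_t, read back along the image line.
```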
[60] Fig. 7 illustrates a table for explaining the operation of the encoder 35 in Fig. 3.
[61] Concatenation of the optimal decision values output from the backward processor 33c produces a disparity value. However, since the disparity value changes only gradually, depending on the variations of the optimal decision values, the encoder 35 employs differential coding to obtain its output, thereby increasing the compression rate. That is, the encoder 35 does not output the concatenation of the optimal decision values from the backward processor 33c, but produces its output from the variations of the optimal decision values, as shown in Fig. 7, for example.
[62] Basically, representation of two optimal decision values requires four bits. However, if dummy data and nearly nonexistent pairs of optimal decision values are excluded, the two optimal decision values can be represented with only three bits. Further, one of eight binary representations using three bits can be assigned as a flag. As such, the encoder 35 compresses two optimal disparity values into three bits, thereby showing high compression effect.
[63] To be specific, the decision value representing variations of the path has only three values, i.e., "01", "10" and "00", representing "upward", "downward" and "no change" respectively; thus "11" is dummy data. Further, since "0110" and "1001" are geometrically nearly nonexistent pairs of optimal decision values, they can be considered the same as "0000". In consideration of these characteristics of the output disparity value, a high compression rate can be obtained by differential coding and exclusion of the dummy data.
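The following sketch mimics this differential coding. The 2-bit variation codes and the folding rules come from the text above, but the concrete 3-bit code assignments are assumptions of this sketch, since the Fig. 7 table itself is not reproduced in this text; they merely respect the stated rules ("11" never occurs, "0110" and "1001" fold into "0000", one code remains free as a flag).

```python
# Hedged sketch of the encoder 35's differential coding: each optimal decision
# is expressed as a 2-bit variation ("01" = upward, "10" = downward,
# "00" = no change; "11" is dummy), then a pair of variations is packed into
# 3 bits. Mapping +1 to "upward" is an assumption of this sketch.

VARIATION = {+1: "01", -1: "10", 0: "00"}  # upward / downward / no change

PAIR_CODE = {
    "0000": "000", "0001": "001", "0010": "010",
    "0100": "011", "1000": "100", "0101": "101", "1010": "110",
}  # seven valid pairs after folding; "111" reserved as a flag (illustrative)

def encode_pair(v1: int, v2: int) -> str:
    pair = VARIATION[v1] + VARIATION[v2]
    if pair in ("0110", "1001"):  # geometrically (near-)impossible pairs
        pair = "0000"
    return PAIR_CODE[pair]

print(encode_pair(+1, 0))  # "0100" -> "011" under this illustrative table
```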
[64] Fig. 8 illustrates an exemplary view of the input buffer in a case where three cameras are employed. In Fig. 8, a pixel value is inputted to the input buffer once every two clocks. The D flip-flop 31c is always in an enable state, and the D flip-flop 31b is enabled one clock after the pixel value has changed.
[65] Fig. 9 illustrates a structure of a processing element array in a case where three cameras are employed. As shown in Fig. 9, images of first to third cameras are inputted to each element. Specifically, the images are sequentially inputted in a manner that the image of the first camera is inputted toward a top processing element producing a low disparity value, and the image of the third camera is inputted toward a bottom processing element producing a high disparity value.
[66] Fig. 10 illustrates an exemplary view of the forward processor fp in Fig. 9 in a case where three cameras are employed. Pixel values such as $x_1^t$, $x_2^t$ and $x_3^t$ from the three cameras are inputted to the absolute value calculator 33aa, and the sum of the differences of these pixel values is outputted as a matching cost.
[67] Fig. 11 illustrates a structure of a processing element array in a case where an N-number of cameras are employed. As shown in Fig. 11, each of the processing elements has the same structure, having a forward processor fp, a stack and a backward processor bp, as in Fig. 9.
[68] One pixel of each of the N-number of images is inputted to the forward processor in each processing element. An input pixel value $x_i$ from the i-th camera has an index of $x_c + \frac{N+1-2i}{2(N-1)}D$, where $D$ is a disparity level ($D$ equals zero for an infinitely distant point ($Z = \infty$) and equals $M$, the maximum disparity level, for the closest point), $N$ is the number of cameras (or images) and $x_c$ is the index of the camera at center. Since the index of $x_c$ is the same as the index of the resulting disparity, the index $x_c$ is used as the reference for obtaining the index of $x_i$ even if a center camera does not exist.
[69] In this structure, one processing element array performs parallel computation in synchronism with a time clock to produce a result corresponding to one disparity index for every two time clocks. That is, the index $x_c$ at time clock t becomes $t/2$. When the index $x_c$ is $t/2$, the index of the input pixel value $x_i$ from the i-th camera in the j-th processing element at time clock t becomes $\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j$ (where $x_i^t\left[\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j\right]$ is the image register value, i.e., the interpixel value, of the j-th processing element of the i-th camera at clock t).
[70] A pixel value $x_i$ having the above-described index is inputted to the forward processor fp. In operation, a total of N pixels, i.e., one pixel on the image line corresponding to an epipolar line for each of the N camera images, are inputted to each processing element. Each processing element exchanges the cost values $U$ with its adjacent processing elements. Further, in each processing element, the forward processor fp provides to the stack a direction value of -1, 0 or 1. The direction value represents which processing element among the (j-1)-th, j-th and (j+1)-th processing elements has the least cost at the (t-1)-th clock; the values -1, 0 and 1 mean that the processing element having the least cost is the (j-1)-th, j-th and (j+1)-th element, respectively. Therefore, the direction value informs from which processing element the cost is provided.
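The index arithmetic of paragraphs [68] and [69] is easy to check in software. The sketch below evaluates $\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j$; for the centre camera the offset term vanishes, reproducing the $x_c = t/2$ behaviour stated above. Rounding policy is left open, since the fractional indices are exactly what the input buffers' interpixel interpolation supplies.

```python
# Hedged sketch of the input-pixel indexing: camera i, processing element j,
# clock t. Fractional indices correspond to the interpixel (half-pixel)
# values produced by the input buffers.

def pixel_index(t: int, i: int, j: int, n_cameras: int) -> float:
    """Index of the interpixel value x_i required by PE j at clock t."""
    return t / 2 + (n_cameras + 1 - 2 * i) * j / (2 * (n_cameras - 1))

# Three cameras (N = 3), PE j = 4, clock t = 20: cameras 1, 2 and 3 read
# indices 12.0, 10.0 and 8.0, i.e. a symmetric spread around the centre
# camera's reference index t/2 = 10.
for cam in (1, 2, 3):
    print(cam, pixel_index(20, cam, 4, 3))
```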
[71] Fig. 12 illustrates a flowchart for explaining the parallel processing performed by an M-number of processing elements as the clock t increments from 1 to 2(M-1). As shown in Fig. 12, first of all, the processing elements are initialized in step S1201. To be specific, all cost registers and image registers in the processing element array are initialized, and then the clock is set to 1. Thereafter, an image pixel from each of the N cameras is inputted to each processing element in step S1203. The M-number of processing elements operate in parallel in step S1205 (e.g., the operation of the j-th processing element is shown in Fig. 12). In each processing element, the forward processor 33a processes the interpixel data of the current image line provided thereto, and the backward processor 33c reads from the trellis queue 33b a decision value of the previous image line processed by the forward processor 33a and processes it. It is judged, in step S1207, whether the parallel processing of step S1205 has been performed until clock 2(M-1). If not, the clock is incremented by one in step S1209 and the process returns to step S1203. If it has been performed until clock 2(M-1), the processed result is provided to the encoder 35 in step S1211.
[72] A multi-camera cost $u_j(t)$ of the forward processor 33a of the j-th processing element at the t-th clock is calculated as in Equation 2, so that the effect of all the cameras is exerted uniformly on the cost:
[73]
[74] MathFigure 2
[Math.2] (the cost equation is given only as an image in the source text; per the surrounding description, it combines, with uniform weight per camera, the N interpixel values $x_i^t\left[\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j\right]$, $i = 1, \ldots, N$, through the absolute value calculator)
[75]
[76] In Fig. 12, $U_j(t)$ is the cost register value of the forward processor 33a of the j-th processing element at the t-th clock, and $x_i^t\left[\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j\right]$ indicates the image register value of the j-th processing element of the i-th camera at the t-th clock (i.e., the interpixel value having an index of $\frac{t}{2} + \frac{N+1-2i}{2(N-1)}j$). In addition, $v_j^*(t)$ denotes the decision value which is provided from the forward processor 33a in the j-th processing element and stored in the trellis queue 33b at the t-th clock, and $a_j(t)$ represents the activation bit value of the backward processor 33c in the j-th processing element at the t-th clock. Further, $a_j(t-1)\,\delta(p + v_{t-1,j}^*)$ denotes an activation signal of the backward processor to be transmitted to the adjacent processing elements, where $p = 1$ and $p = -1$ when the activation signal is transmitted to an upper and a lower processing element, respectively. Furthermore, $\hat{v}_t$ indicates the disparity value at the t-th clock, M denotes the number of pixels on an image line (e.g., M = 1024 in a 1024x768 image) and $\operatorname{argmin} func(x)$ produces the parameter x that minimizes the function $func(x)$.
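Putting the pieces together, a compact software emulation of the Fig. 12 loop might look as follows. It reuses forward_step and backward_pass from the sketches above, abstracts the clocking to one forward step per pixel position, and substitutes uniform random costs for the multi-camera cost of Equation 2, whose exact form is given only as an image in the source.

```python
# Hedged end-to-end emulation: forward pass over the line (S1203-S1209),
# trellis stacking, backward pass and disparity integration (S1211).
# Requires forward_step() and backward_pass() defined in the earlier sketches.

import random

def match_line(n_pe: int, n_steps: int, seed: int = 0) -> list[int]:
    random.seed(seed)
    U = [0.0] * n_pe                   # cost registers after initialisation (S1201)
    trellis = []                       # trellis queues, one column per step
    for _ in range(n_steps):           # forward pass along the image line
        costs = [random.random() for _ in range(n_pe)]  # stand-in for Eq. 2
        U, decisions = forward_step(U, costs)
        trellis.append(decisions)
    emitted = backward_pass(trellis)   # backward pass feeding the encoder
    profile, d = [], 0
    for v in emitted:                  # disparity = running sum of decisions
        d += v
        profile.append(d)
    return profile

print(match_line(n_pe=16, n_steps=30))
```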
[77] While the invention has been shown and described with respect to the embodiments, it will be understood by those skilled in the art that various changes and modification may be made without departing from the scope of the invention as defined in the following claims.

Claims

Claims
[1] A system for real-time stereo image matching, comprising: an image processor for converting image signals taken by each of a plurality of cameras into digital images; a plurality of input buffers for extracting interpixel data from the digital images provided from the image processor and reordering the interpixel data; a processing element array having a plurality of processing elements, wherein each of the processing elements sequentially receives the reordered interpixel data and performs image matching by using a disparity value outputted from the processing element of the upper stage at a previous clock to produce an optimal decision value or a disparity value; and an encoder for encoding the disparity value or the optimal decision value.
[2] The system of claim 1, wherein each of the input buffers includes: a control unit for generating two enable signals and a pixel index; two D flip-flops for storing one pixel value, respectively, in response to the enable signals; and a calculator for calculating a pixel value to be provided to the processing elements by using the pixel index and two pixel values outputted from the two D flip-flops.
[3] The system of claim 2, wherein each of the D flip-flops includes a register.
[4] The system of claim 2, wherein the calculator includes a multiplier, an adder and a divider to obtain a weighted sum of the two pixel values outputted from the two D flip-flops according to the pixel index and divide the weighted sum by the sum of the weights.
[5] The system of claim 1, wherein each of the processing elements includes: a forward processor for receiving a pixel value from an image line corresponding to an epipolar line of each of digital images to calculate a decision value; a trellis queue for storing the calculated decision value; and a backward processor for calculating the disparity value based on the decision value stored in the trellis queue, and the calculated disparity value being provided to the encoder.
[6] The system of claim 5, wherein the forward processor includes: an absolute value calculator for calculating a matching cost by using the difference between the pixel values from the input buffers; a multiplexer for determining a minimum value between accumulated matching costs of adjacent processing elements from the absolute value calculator and a previous accumulated matching cost to produce the decision value representing a path of the minimum value; an adder for adding the decision value from the multiplexer and the matching cost calculated by the absolute value calculator to calculate an accumulated matching cost; and a flip-flop for temporarily storing the calculated accumulated matching cost, wherein the accumulated matching cost is provided to the multiplexer as the previous accumulated matching cost at a next clock.
[7] The system of claim 5, wherein the backward processor includes: an OR gate for calculating logical sum of activation signals from adjacent processing elements and a feed-backed activation signal to produce an activation bit; a flip-flop for temporarily storing the activation bit from the OR gate until a next clock; a demultiplexer for outputting an activation signal corresponding to the activation bit from the flip-flop to backward processors of the adjacent processing elements according to the decision value from the trellis queue and to the OR gate as the feed-backed activation signal; and a buffer for outputting the optimal decision value representing increment/ decrement of the disparity value depending on the activation bit from the flip- flop.
[8] The system of claim 7, wherein the buffer outputs the input value as it is when the input value is "1", and becomes a high impedance state and outputs no value when the input value is not "1".
[9] The system of claim 1, wherein the encoder encodes the disparity value or the optimal decision value by using differential coding.
[10] A method for real-time stereo image matching in a system including a processing element array having a plurality of processing elements, the method comprising: converting image signals taken by each of a plurality of cameras into digital images; extracting interpixel data from the digital images and reordering the interpixel data; producing an optimal decision value or a disparity value by sequentially providing to each of the processing elements the reordered interpixel data to perform image matching using a disparity value produced from the processing element of the upper stage at a previous clock; and encoding the disparity value or the optimal decision value by using differential coding.
[11] The method of claim 10, wherein producing the optimal decision value or the disparity value includes: calculating a decision value by receiving a pixel value from an image line corresponding to an epipolar line of each of digital images; temporarily storing the calculated decision value; and calculating the disparity value based on the stored decision value.
[12] The method of claim 11, wherein calculating the decision value includes: calculating a matching cost by using the difference between the pixel values; determining a minimum value between accumulated matching costs of adjacent processing elements and a previous accumulated matching cost; producing the decision value, wherein the decision value represents a path of the minimum value; adding the decision value and the matching cost to calculate an accumulated matching cost; and temporarily storing the calculated accumulated matching cost until providing the accumulated matching cost as the previous accumulated matching cost at a next clock.
[13] The method of claim 11, wherein calculating the disparity value includes: calculating a logical sum of activation signals from adjacent processing elements and a fed-back activation signal to produce an activation bit; temporarily storing the activation bit; outputting an activation signal corresponding to the activation bit stored at a previous clock to the adjacent processing elements according to the decision value and feeding back the activation signal; and outputting the optimal decision value representing increment/decrement of the disparity value depending on the activation bit.
[14] The method of claim 10, wherein extracting interpixel data from the digital images and reordering the interpixel data includes: generating a pixel index; storing two pixel values; and calculating a pixel value to be provided to the processing elements by using the pixel index and the stored pixel values.
[15] The method of claim 14, wherein calculating the pixel value includes: obtaining a weighted sum of the two pixel values according to the pixel index; and dividing the weighted sum by the sum of the weights.
PCT/KR2008/006267 2007-10-25 2008-10-23 System and method for real-time stereo image matching WO2009054683A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020070107551A KR100926127B1 (en) 2007-10-25 2007-10-25 Real-time stereo matching system by using multi-camera and its method
KR10-2007-0107551 2007-10-25

Publications (1)

Publication Number Publication Date
WO2009054683A1 true WO2009054683A1 (en) 2009-04-30

Family

ID=40579721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2008/006267 WO2009054683A1 (en) 2007-10-25 2008-10-23 System and method for real-time stereo image matching

Country Status (2)

Country Link
KR (1) KR100926127B1 (en)
WO (1) WO2009054683A1 (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103238339B (en) 2010-12-02 2015-12-09 尤特瑞登特生产公司 Check and follow the tracks of the system and method for stereoscopic video images
AU2013202775B2 (en) * 2012-06-01 2015-09-17 Ultradent Products, Inc. Stereoscopic video imaging

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20040069623A (en) * 2003-01-30 2004-08-06 학교법인 포항공과대학교 A multilayered real-time stereo matching system using the systolic array and method thereof
JP2005009883A (en) * 2003-06-16 2005-01-13 Calsonic Kansei Corp Stereoscopic camera apparatus
KR20060023714A (en) * 2004-09-10 2006-03-15 학교법인 포항공과대학교 System and method for matching stereo image
KR20060041060A (en) * 2004-11-08 2006-05-11 한국전자통신연구원 Apparatus and method for production multi-view contents

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9232123B2 (en) 2010-10-06 2016-01-05 Hewlett-Packard Development Company, L.P. Systems and methods for acquiring and processing image data produced by camera arrays
WO2012056437A1 (en) 2010-10-29 2012-05-03 École Polytechnique Fédérale De Lausanne (Epfl) Omnidirectional sensor array system
US10362225B2 (en) 2010-10-29 2019-07-23 Ecole Polytechnique Federale De Lausanne (Epfl) Omnidirectional sensor array system

Also Published As

Publication number Publication date
KR20090041843A (en) 2009-04-29
KR100926127B1 (en) 2009-11-11

Similar Documents

Publication Publication Date Title
CN109598754B (en) Binocular depth estimation method based on depth convolution network
EP0918439B1 (en) Device for converting two-dimensional video into three-dimensional video
US9661227B2 (en) Method, circuit and system for stabilizing digital image
WO2009101798A1 (en) Compound eye imaging device, distance measurement device, parallax calculation method and distance measurement method
WO1999053681A2 (en) Method and apparatus for measuring similarity using matching pixel count
JP5197683B2 (en) Depth signal generation apparatus and method
US20020012459A1 (en) Method and apparatus for detecting stereo disparity in sequential parallel processing mode
US20060055703A1 (en) Stereo image matching method and system using image multiple lines
Ttofis et al. High-quality real-time hardware stereo matching based on guided image filtering
US8427524B2 (en) Message propagation-based stereo image matching system
CN112070821A (en) Low-power-consumption stereo matching system and method for acquiring depth information
US5214751A (en) Method for the temporal interpolation of images and device for implementing this method
WO2009054683A1 (en) System and method for real-time stereo image matching
EP1445964A2 (en) Multi-layered real-time stereo matching method and system
CN111028273B (en) Light field depth estimation method based on multi-stream convolution neural network and implementation system thereof
US20220148143A1 (en) Image fusion method based on gradient domain mapping
Ustukov et al. Implementing one of stereovision algorithms on FPGA
CN112785634A (en) Computer device and synthetic depth map generation method
US20040228521A1 (en) Real-time three-dimensional image processing system for non-parallel optical axis and method thereof
JP4862004B2 (en) Depth data generation apparatus, depth data generation method, and program thereof
EP1830562A1 (en) Learning device, learning method, and learning program
CN108401125B (en) Video data processing method, device and storage medium
KR100795974B1 (en) Apparatus for realtime-generating a depth-map by processing streaming stereo images
CN107194334B (en) Video satellite image dense Stereo Matching method and system based on optical flow estimation
CN107590857A (en) For generating the apparatus and method of virtual visual point image

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 08840970

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 08840970

Country of ref document: EP

Kind code of ref document: A1