Background
Stereo matching is a classical problem in computer vision. The stereo matching technique is used to obtain a three-dimensional (3D) image from a stereo image. A stereo image is a pair of two-dimensional (2D) images of the same object taken from different positions on the same straight line.
For example, if the stereo image is a pair of two-dimensional images, a left image and a right image, of the same scene captured by two cameras in front of the object, the difference between the coordinates of the same spatial point of the object in the left image and in the right image is called the parallax. A three-dimensional image is then constructed by the stereo matching technique according to the parallax acquired from the stereo image.
The stereo matching technique of a first prior art will be described below. As shown in fig. 1, the technical solution adopted by the stereo matching technique of the first prior art mainly includes the following steps: step 101, dividing an image into a plurality of regions; step 102, calculating an initial parallax value for each image pixel in each region; step 103, performing parallax plane fitting on each region to obtain the initial parallax parameters of each region; step 104, optimizing the parallax parameters of each region; and step 105, obtaining the parallax of each region according to the optimized parallax parameters. Steps 102, 103 and 104 of the first prior art are mainly described below.
Step 102, calculating an initial parallax value for each image pixel in each region.
An initial parallax range [d_min, d_max] and a parallax step d' are preset. The image region currently being processed, i.e. the current region, is selected, the first image pixel in this region is selected, the parallax d of this pixel is set to d_min, and the matching residual c(x, y, d) is calculated according to formula (r1):
c(x, y, d) = \frac{1}{9} \sum_{i=-1}^{1} \sum_{j=-1}^{1} \left| I(x+i,\, y+j) - J(x+i+d,\, y+j) \right| \qquad (r1)
where (x, y) are the image coordinates, d is the parallax, and I and J denote the two images of the stereo pair.
The parallax d is then increased step by step by the parallax step d', c(x, y, d) is calculated each time, and the minimum c(x, y, d) and the corresponding d are stored, until d ≥ d_max. The d corresponding to the smallest c(x, y, d) is the initial parallax of the pixel.
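To make the search concrete, the following is a minimal illustrative sketch in Python (not part of the prior art description), assuming grayscale images stored as numpy arrays, a non-negative integer parallax range, and simple boundary handling; the function and variable names are assumptions:

import numpy as np

def initial_disparity(I, J, d_min=0, d_max=15, d_step=1):
    """Per-pixel initial parallax by minimizing the 3x3 matching residual of formula (r1).
    Assumes integer, non-negative parallax values so that array indexing stays valid."""
    h, w = I.shape
    disp = np.zeros((h, w), dtype=np.float32)
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            best_c, best_d = np.inf, d_min
            d = d_min
            while d <= d_max:
                if x + d + 1 < w:                          # stay inside the second image
                    patch_I = I[y - 1:y + 2, x - 1:x + 2].astype(np.float32)
                    patch_J = J[y - 1:y + 2, x + d - 1:x + d + 2].astype(np.float32)
                    c = np.abs(patch_I - patch_J).mean()   # (1/9) * sum of absolute differences
                    if c < best_c:
                        best_c, best_d = c, d
                d += d_step
            disp[y, x] = best_d                            # d with the smallest c(x, y, d)
    return disp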
Step 103, performing parallax plane fitting on each region to obtain the initial parallax parameters of each region.
First, an initial parallax plane of each region is fitted according to the initial parallax values of the region. The prior-art method for determining the initial parallax plane of each region is as follows. First, whether each pixel point is an outlier is judged in the following way: according to the initial parallax \hat{d}(x, y) of the reference image pixel (x, y), the corresponding image pixel (x', y') is obtained, and the initial parallax \hat{d}(x', y') of (x', y') relative to the reference image is calculated.
If
\hat{d}(x, y) \neq \hat{d}(x', y'),
then pixel (x, y) is an outlier. In the process of determining the initial parallax plane, outliers are excluded to enhance the robustness of the algorithm.
Then, the image region currently being processed is selected, and it is determined whether the number of pixels contained in the region is greater than a set threshold; in the prior art, only image regions whose number of pixels is greater than the threshold are processed. If the number of pixels in the currently processed image region is not greater than the threshold, the next image region is selected and judged in the same way; if the number of pixels in the currently processed image region is greater than the threshold, the first image pixel point in the region is selected and initial parallax plane fitting is started.
The parallax plane of each region is modeled according to the following parallax plane model formula (r2):
d = c_1 x + c_2 y + c_3 \qquad (r2)
where (x, y) are the coordinates of an image pixel, d is the initial parallax corresponding to the image pixel, and (c_1, c_2, c_3) is the parallax plane coefficient vector, i.e. the parallax parameters.
If the pixel is an outlier, the next pixel is selected. If the pixel is not an outlier, the pixel information is added to the linear system matrix of formula (r3).
When all points in the region have been processed, [c_1, c_2, c_3]^T can be obtained according to formula (r3), giving the initial parallax plane of the region:
\begin{pmatrix} \sum_i x_i x_i & \sum_i x_i y_i & \sum_i x_i \\ \sum_i x_i y_i & \sum_i y_i y_i & \sum_i y_i \\ \sum_i x_i & \sum_i y_i & \sum_i 1 \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} \sum_i x_i d_i \\ \sum_i y_i d_i \\ \sum_i d_i \end{pmatrix} \qquad (r3)
The initial parallax planes of the other regions are fitted in the same way; the set of initial parallax planes of all regions forms a parallax plane set, and the parallax parameters of each region are obtained from the parallax plane set.
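For illustration, a compact sketch of the plane fit of formula (r3), assuming the non-outlier pixel coordinates and initial parallaxes of a region are already collected in arrays; the use of numpy's linear solver and the function name are assumptions:

import numpy as np

def fit_disparity_plane(xs, ys, ds):
    """Solve the 3x3 normal equations of formula (r3) for the plane d = c1*x + c2*y + c3."""
    xs, ys, ds = map(np.asarray, (xs, ys, ds))
    A = np.array([
        [np.sum(xs * xs), np.sum(xs * ys), np.sum(xs)],
        [np.sum(xs * ys), np.sum(ys * ys), np.sum(ys)],
        [np.sum(xs),      np.sum(ys),      len(xs)   ],
    ], dtype=np.float64)
    b = np.array([np.sum(xs * ds), np.sum(ys * ds), np.sum(ds)], dtype=np.float64)
    c1, c2, c3 = np.linalg.solve(A, b)
    return c1, c2, c3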
Step 104, optimizing the parallax parameters of each region.
A method for optimizing parallax parameters of each region in the prior art mainly includes three steps.
In the first step, the parallax plane is optimized in a loop according to the obtained initial parallax plane.
The method for optimizing the parallax plane in a loop is as follows: set both the new plane and the old plane to the initial parallax plane, set the loop count to 1, select the first pixel in the current processing region, and judge whether the pixel is an occlusion pixel. If the pixel is an occlusion pixel, select the next pixel; if the pixel is not an occlusion pixel, in order to enhance the robustness of the algorithm, a weighted least squares technique is adopted: a weighting coefficient is calculated according to formula (r4), and the weighted information of the pixel is added to the linear system matrix of formula (r3):
w(\beta_i) = e^{-2 \beta_i},
where
\beta_i = \left| c_1 x_i + c_2 y_i + c_3 - \hat{d}(x_i, y_i) \right| \qquad (r4)
When all points in the region have been processed, [c_1, c_2, c_3]^T can be obtained according to formula (r5), giving a new parallax plane:
\begin{pmatrix} \sum_i w_i x_i x_i & \sum_i w_i x_i y_i & \sum_i w_i x_i \\ \sum_i w_i x_i y_i & \sum_i w_i y_i y_i & \sum_i w_i y_i \\ \sum_i w_i x_i & \sum_i w_i y_i & \sum_i w_i \end{pmatrix} \begin{pmatrix} c_1 \\ c_2 \\ c_3 \end{pmatrix} = \begin{pmatrix} \sum_i w_i x_i d_i \\ \sum_i w_i y_i d_i \\ \sum_i w_i d_i \end{pmatrix} \qquad (r5)
where w_i is the weighting coefficient w(\beta_i) of pixel (x_i, y_i).
It is then judged whether the magnitude of the difference between the new parallax plane and the old parallax plane is smaller than a preset threshold. If not, the loop count is increased, the old parallax plane coefficients are updated to the new parallax plane coefficients, the first pixel in the region is selected again, and the above processing is repeated until the difference between the new and old parallax planes is smaller than the threshold. Once it is smaller than the threshold, it is judged whether the new parallax plane is already in the parallax plane set, and if not, the new parallax plane is added to the parallax plane set. The process is repeated until all regions have been processed, and a new parallax plane set is obtained.
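The loop of this first step can be sketched as iterated weighted plane fitting; the convergence test on the plane coefficients, the iteration cap and the numeric tolerance below are assumptions, not values given in the description:

import numpy as np

def refine_plane(xs, ys, d_hat, plane, max_iter=20, tol=1e-4):
    """Weighted least squares of formulas (r4)/(r5): weights w_i = exp(-2*beta_i) down-weight
    pixels far from the current plane, and the plane is re-solved until it stops changing."""
    xs, ys, d_hat = map(np.asarray, (xs, ys, d_hat))
    c = np.asarray(plane, dtype=np.float64)                    # old plane (c1, c2, c3)
    for _ in range(max_iter):
        beta = np.abs(c[0] * xs + c[1] * ys + c[2] - d_hat)    # residual beta_i of (r4)
        w = np.exp(-2.0 * beta)                                # weighting coefficient of (r4)
        A = np.array([
            [np.sum(w * xs * xs), np.sum(w * xs * ys), np.sum(w * xs)],
            [np.sum(w * xs * ys), np.sum(w * ys * ys), np.sum(w * ys)],
            [np.sum(w * xs),      np.sum(w * ys),      np.sum(w)     ],
        ])
        b = np.array([np.sum(w * xs * d_hat), np.sum(w * ys * d_hat), np.sum(w * d_hat)])
        c_new = np.linalg.solve(A, b)                          # formula (r5)
        if np.max(np.abs(c_new - c)) < tol:                    # new vs. old plane difference
            return c_new
        c = c_new
    return c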
In the second step, the obtained new parallax planes of the regions are clustered and layered.
The first plane is selected from the plane set, and the first region associated with the plane is selected. From the non-occluded pixels of the region, the number of supported pixels of the plane is determined as the number of pixels satisfying formula (r6):
\beta_i < \text{threshold} \qquad (r6)
It is then determined whether the number of pixels in the region is greater than a given threshold, e.g. 40. If not, the matching cost is calculated according to formula (r7):
C(S, P) = \sum_{(x, y) \in S} c(x, y, d) \qquad (r7)
where S is an image region, P is a parallax plane with parameters c_1^P, c_2^P, c_3^P, and d = c_1^P x + c_2^P y + c_3^P.
If the number of pixels is greater than the threshold, the matching cost is calculated according to formula (r8):
C(S, P) = \sum_{(x, y) \in S - O} c(x, y, d) \, e^{1 - \frac{s}{n}} \qquad (r8)
where O is the occluded portion of region S, n is the number of non-occluded pixels in region S, and s is the number of supported pixels of plane P in region S.
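For illustration, a sketch of the region-to-plane matching cost of formulas (r7) and (r8); the data layout, the support threshold used for (r6) and the helper names are assumptions, while the pixel-count threshold of 40 is the example value given above:

import numpy as np

def region_plane_cost(pixels, occluded, plane, pixel_cost,
                      support_threshold=1.0, min_pixels=40):
    """Matching cost C(S, P) of a region S against a parallax plane P = (c1, c2, c3).

    pixels    : list of (x, y, d_hat) for every pixel of S (d_hat = initial parallax)
    occluded  : set of (x, y) coordinates marked as occluded in S
    pixel_cost: function c(x, y, d), e.g. the 3x3 residual of formula (r1)
    """
    c1, c2, c3 = plane
    plane_d = lambda x, y: c1 * x + c2 * y + c3
    non_occ = [(x, y, dh) for (x, y, dh) in pixels if (x, y) not in occluded]
    # supported pixels of the plane: non-occluded pixels satisfying (r6)
    s = sum(1 for (x, y, dh) in non_occ if abs(plane_d(x, y) - dh) < support_threshold)
    n = len(non_occ)
    if len(pixels) <= min_pixels:                        # small region: formula (r7), all pixels
        return sum(pixel_cost(x, y, plane_d(x, y)) for (x, y, _) in pixels)
    scale = np.exp(1.0 - s / n) if n > 0 else 1.0        # large region: formula (r8), S - O only
    return scale * sum(pixel_cost(x, y, plane_d(x, y)) for (x, y, _) in non_occ)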
The minimum matching cost is initialized to a large number; it is judged whether the calculated matching cost is smaller than the minimum matching cost, and if so, the calculated matching cost becomes the minimum matching cost. The next parallax plane is selected and the above process is repeated until all parallax planes have been processed. The next region is then selected and the process is repeated until all regions have been processed.
It is then judged whether adjacent regions share the same parallax plane and can form a joint region; if so, the joint region is treated as the last region, and the initial parallax plane fitting and subsequent processing of each region are repeated until no joint region exists.
In the third step, the parallax plane of each region is further optimized by labeling, and the parallax parameters are obtained from the parallax plane.
The prior-art method for obtaining the label of each region, i.e. its optimized parallax plane, is as follows: the regions and the corresponding parallax planes obtained by the above calculation are taken as the initial labels of the regions, the optimal parallax plane is selected, and the label efficiency function is calculated according to formula (r9):
E(f) = E_{data}(f) + E_{smooth}(f) \qquad (r9)
where f is a labeling that assigns each region S ∈ R to a corresponding parallax plane f(S) ∈ D, R is the region set of the reference image, and D is the set of estimated parallax planes. E_{data} is calculated according to formula (r10):
E_{data}(f) = \sum_{S \in R} C(S, f(S)) \qquad (r10)
E_{smooth} is calculated according to formula (r11):
E_{smooth}(f) = \sum_{S, S'} u_{S, S'} \, \delta(f(S) \neq f(S')) \qquad (r11)
where S and S' are adjacent regions and u_{S,S'} is proportional to the length of the common boundary between regions S and S'. δ(f(S) ≠ f(S')) is 1 when f(S) ≠ f(S') and 0 when f(S) = f(S').
The label f' with the smallest label efficiency function E(f') is found, and it is judged whether E(f') ≤ E(f). If so, f is set to f', the step of selecting the optimal parallax plane is returned to, and the loop continues; if not, the optimal label is set to f and the process ends. The parallax parameters are then obtained from the optimal label.
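For illustration, a sketch of the label efficiency function of formulas (r9) to (r11), assuming the regions, their candidate planes, a region-to-plane cost function and the common-boundary lengths are already available; the data-structure layout is an assumption:

def label_energy(labels, regions, region_cost, neighbors, boundary_length, penalty=1.0):
    """E(f) = E_data(f) + E_smooth(f) of (r9), for a labeling f: region -> plane index."""
    # E_data, formula (r10): sum of C(S, f(S)) over all regions of the reference image
    e_data = sum(region_cost(s, labels[s]) for s in regions)
    # E_smooth, formula (r11): penalize adjacent regions assigned different planes,
    # weighted by u_{S,S'}, taken proportional to the length of their common boundary
    e_smooth = 0.0
    for s in regions:
        for t in neighbors[s]:
            if s < t and labels[s] != labels[t]:   # count each adjacent pair once (integer ids assumed)
                e_smooth += penalty * boundary_length[(s, t)]
    return e_data + e_smooth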
In the process of implementing the present invention, the inventor has found that the prior art has at least the following problems. When the parallax parameters of each region are optimized, the parallax parameters of a region are optimized only according to the energy information of the region being processed; however, when the parallax parameters of one region change, the energy of its adjacent regions is also affected, and because the prior art considers the parallax optimization of each region in isolation, a more accurate parallax cannot be obtained. Moreover, once the selection of occlusion pixel points is inaccurate, the clustering and layering are inaccurate, so the obtained optimal label is inaccurate and the obtained parallax error is large.
Detailed Description
In order to solve the problem in the prior art that, when acquiring the parallax, the parallax obtained after optimization has a large error because only the isolated energy information of the region currently being processed is used, embodiments of the present invention provide a parallax acquisition method and apparatus that can acquire a more accurate parallax.
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the embodiments are described in detail below with reference to the accompanying drawings. The following description covers only some embodiments of the present invention, and other embodiments can be obtained by those skilled in the art from these embodiments without creative effort.
As shown in fig. 2, a parallax obtaining method includes:
step 201, acquiring matching energy of each region in a matched image according to an initial parallax parameter, wherein the matched image comprises at least two regions;
step 202, in each area, determining at least one operation area for a current area, where the current area is an area being processed in each area;
step 203, obtaining an optimized parallax parameter of the current area according to the matching energy of the current area and each operation area;
step 204, acquiring the parallax of the current area according to the optimized parallax parameter of the current area.
In the embodiment of the present invention, when the parallax of each region in the matched image is acquired, the matching energy of each region is used as the criterion function for optimizing the initial parallax of the current region, and the optimized parallax parameter of the current region is obtained from the matching energy of the current region together with the matching energy of each operation region. Because the matching energies of different regions depend on and affect each other, the optimized parallax parameter obtained in this way is more accurate, so the embodiment of the present invention can acquire a more accurate parallax from the optimized parallax parameter.
As shown in fig. 3, in the technical solution provided by the embodiment of the present invention, in step 201, the obtaining, according to an initial parallax parameter, matching energy of each region in a matched image, where the matched image includes at least two regions, further includes:
step 201a, obtaining an initial parallax of pixel points in each region;
step 201b, obtaining the initial parallax parameter of each region according to the initial parallax. The steps are explained in detail below.
The stereo image pair adopted in the embodiment of the present invention consists of a left image and a right image of the same object in the same scene, shot by two cameras in a standard configuration.
The embodiment of the present invention adopts the Mean-Shift algorithm to divide the matched image into at least two regions.
Step 201a, obtaining the initial parallax of the pixel points in each area.
There are many methods for obtaining the initial parallax of the pixel points in each region. The embodiment of the present invention may obtain the initial parallax of the pixel points in each region by the method described in the Background, or by an adaptive correlation window algorithm, which is not limited here. The adaptive correlation window algorithm is prior art, and those skilled in the art can obtain the result of step 201a from the related contents disclosed in the prior art.
The embodiment of the present invention is described with respect to left and right perspective views taken by two cameras in a standard arrangement. As shown in fig. 4, fig. 4 shows the result of processing the Tsukuba stereo image pair according to the embodiment of the present invention, wherein fig. 4(a), (b) are the segmentation image and the initial disparity map of the corresponding left image, respectively.
Step 201b, obtaining the initial parallax parameter of each region according to the initial parallax.
The embodiment of the invention adopts a robust voting-based parallax plane fitting method to obtain the initial parallax parameters of each region. The obtained initial parallax often has a certain error, the voting-based parallax plane fitting method can effectively remove error points, namely outlier data, so that the fitting result is more accurate, and further more accurate parallax can be obtained. As shown in fig. 6, the disparity map is obtained by fitting a disparity plane based on voting. The experimental result shows that the noise in the initial disparity map can be effectively suppressed. Step 201b is described in detail below.
The parallax plane of each region is modeled according to the parallax plane model formula d = c_1 x + c_2 y + c_3, where c_1, c_2, c_3 are the parallax plane parameters: c_1 is the first parameter, c_2 the second parameter, and c_3 the third parameter.
The step of obtaining the initial parallax parameters of each region comprises the following steps:
S1, acquiring the first parameter c_1, which specifically comprises the following steps:
S11, in each region, at least one row pixel point pair is selected in each pixel row;
The row pixel point pair may be selected as an adjacent pair of points or as a spaced pair of points.
S12, obtaining the candidate first parameters according to the coordinate values and the initial parallaxes of the row pixel point pairs in each region;
Correlation calculation is performed on the selected row pixel point pairs. For any row pixel point pair, the ratio Δd/Δx of the parallax difference to the coordinate difference of the two pixels is calculated, giving a candidate value of the plane parameter c_1; the same calculation is performed on all the selected row pixel point pairs, giving the candidate first parameters of the first parameter c_1.
S13, obtaining the first parameter of each region by voting over the candidate first parameters;
The candidate first parameters are voted in the one-dimensional parameter space of the first parameter c_1 to generate a frequency histogram of c_1, and the histogram is filtered. The embodiment of the present invention adopts Gaussian smoothing filtering, and the c_1 corresponding to the peak point after Gaussian smoothing, i.e. the point with the most votes, is taken as the first parameter c_1.
S2, acquiring the second parameter c_2, which specifically comprises the following steps:
S21, in each region, at least one column pixel point pair is selected in each pixel column;
The column pixel point pair may be selected as an adjacent pair of points or as a spaced pair of points.
S22, obtaining the candidate second parameters according to the coordinate values and the initial parallaxes of the column pixel point pairs in each region;
Correlation calculation is performed on the selected column pixel point pairs. For any column pixel point pair, the ratio Δd/Δy of the parallax difference to the coordinate difference of the two pixels is calculated, giving a candidate value of the plane parameter c_2; the same calculation is performed on all the selected column pixel point pairs, giving the candidate second parameters of the second parameter c_2.
S23, obtaining the second parameter of each region by voting over the candidate second parameters;
The candidate second parameters are voted in the one-dimensional parameter space of the second parameter c_2 to generate a frequency histogram of c_2, and the histogram is filtered. The embodiment of the present invention adopts Gaussian smoothing filtering, and the c_2 corresponding to the peak point after Gaussian smoothing, i.e. the point with the most votes, is taken as the second parameter c_2.
S3, acquiring the third parameter c_3, which specifically comprises the following steps:
S31, in each region, obtaining each corresponding candidate third parameter from each candidate first parameter and each corresponding candidate second parameter using the parallax plane model;
The candidate first parameter, the corresponding candidate second parameter, the pixel coordinates and the parallax are substituted into the parallax plane model formula d = c_1 x + c_2 y + c_3 to obtain the corresponding candidate third parameter.
S32, obtaining the third parameter of each region by voting over the candidate third parameters.
The candidate third parameters are voted in the one-dimensional parameter space of the third parameter to generate a frequency histogram of c_3, and the histogram is filtered. The embodiment of the present invention adopts Gaussian smoothing filtering, and the c_3 corresponding to the peak point after Gaussian smoothing, i.e. the point with the most votes, is taken as the third parameter c_3.
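The voting step can be sketched as follows for the first parameter c_1 (the same routine serves c_2 and c_3); the histogram bin width, the Gaussian kernel width and the use of scipy's filter are assumptions:

import numpy as np
from scipy.ndimage import gaussian_filter1d

def vote_parameter(candidates, bin_width=0.05, sigma=2.0):
    """Vote candidate values in a 1-D parameter space, smooth the frequency histogram with a
    Gaussian filter, and return the value at the histogram peak (the point with most votes)."""
    candidates = np.asarray(candidates, dtype=np.float64)
    lo, hi = candidates.min(), candidates.max()
    n_bins = max(1, int(np.ceil((hi - lo) / bin_width)) + 1)
    hist, edges = np.histogram(candidates, bins=n_bins, range=(lo, hi + bin_width))
    smoothed = gaussian_filter1d(hist.astype(np.float64), sigma=sigma)
    peak = int(np.argmax(smoothed))
    return 0.5 * (edges[peak] + edges[peak + 1])          # bin center as the voted parameter

# candidate c1 values: ratios delta_d / delta_x over selected row pixel pairs (row_pairs is
# a hypothetical list of ((x1, d1), (x2, d2)) tuples)
# c1 = vote_parameter([(d2 - d1) / (x2 - x1) for ((x1, d1), (x2, d2)) in row_pairs])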
Step 201, acquiring matching energy of each region in a matched image according to an initial parallax parameter, wherein the matched image comprises at least two regions.
The matching energy includes data energy, occlusion energy, and smoothing energy. The matching energy obtained when performing disparity optimization includes the data energy and a combination of at least one of the occlusion energy and the smoothing energy.
The matching energy obtained in the embodiment of the present invention comprises the data energy, the occlusion energy and the smoothing energy, i.e. the matching energy of any image region is E_i = E_{data} + E_{occlude} + E_{smooth}, where E_i is the matching energy of region i, the index i denotes the i-th image region, E_{data} is the data energy, E_{occlude} is the occlusion energy, and E_{smooth} is the smoothing energy.
The specific method for acquiring the matching energy of each region provided by the embodiment of the invention comprises the following steps:
and T1, acquiring data energy of each area.
For data energy EdataCalculated from the following equation (r 12):
E_{data} = \sum_{p \in V_l,\ q \in V_r} \max\big( |r(p) - r(q)|,\ |g(p) - g(q)|,\ |b(p) - b(q)| \big) \qquad (r12)
where V_l and V_r respectively denote the visible pixel sets of the current region on the left and right images, p ∈ V_l and q ∈ V_r are two matched corresponding pixels on the left and right images, and r, g, b denote the RGB color values of the respective pixels. The RGB color values of each pixel are obtained as follows: the pixel q on the reference image (right image) corresponding to the pixel p on the matched image (left image) is calculated according to the estimated parallax plane parameters of the current region, and the RGB color values of pixel q are acquired. Since the computed parallax is a floating-point value, the pixel q is typically not located exactly at an integer pixel position, and its RGB color values usually cannot be read directly from the image. In the embodiment of the present invention, they are obtained by interpolating the RGB color values of the four pixels adjacent to q on the reference image. The data energy is calculated taking the visibility criterion into account, i.e. the pixels used to calculate the data energy are visible in both images of the stereo pair.
The formula for calculating the data energy provided by the technical solution of the embodiment of the present invention is not limited to formula (r12); all formulas obtained by approximate changes based on formula (r12) belong to the technical solution provided by the embodiment of the present invention, for example using the sum of the absolute values of the color component differences of two pixels, or the sum of the squares of the color component differences of two pixels.
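For illustration, a sketch of the data-energy term of formula (r12) for one region, assuming RGB images stored as numpy arrays and a given parallax plane of the current region; the bilinear interpolation helper, the sign convention of the parallax shift and the boundary clamping are assumptions:

import numpy as np

def bilinear(img, x, y):
    """Bilinear interpolation of an H x W x 3 image at a (generally non-integer) position."""
    x = min(max(x, 0.0), img.shape[1] - 1.0)             # clamp to the image (assumed handling)
    y = min(max(y, 0.0), img.shape[0] - 1.0)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1, y1 = min(x0 + 1, img.shape[1] - 1), min(y0 + 1, img.shape[0] - 1)
    ax, ay = x - x0, y - y0
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bot = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bot

def data_energy(left, right, visible_pixels, plane):
    """E_data of (r12): for each visible pixel p of the region on the left (matched) image,
    locate its match q on the right (reference) image from the plane parallax and add the
    largest of the three RGB channel differences."""
    c1, c2, c3 = plane
    e = 0.0
    for (x, y) in visible_pixels:
        d = c1 * x + c2 * y + c3
        q = bilinear(right, x - d, y)                    # sign convention of the shift is assumed
        e += float(np.max(np.abs(left[y, x].astype(np.float64) - q)))
    return e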
T2, acquiring the occlusion energy of each region.
For ease of understanding, image occlusion is briefly described. Occlusion is divided into two cases, left occlusion and right occlusion. As shown in fig. 7, consider two adjacent regions A and B on the matched image (left image), with A to the right of B. When the parallax of region A at the shared boundary is larger than that of its left neighbor B, then when A and B are mapped to the reference image (right image) according to the current parallax calculation result, a part of B' (region D in the figure) will be occluded by A' (left occlusion); when the parallax of region A at the shared boundary is smaller than that of its left neighbor B, a gap (region C in the figure) appears between the two regions A' and B' mapped onto the right image, which is equivalent to a part of the A' region on the right image being occluded by B (right occlusion). In the embodiment of the present invention, the occlusion energy is set as follows: when left occlusion occurs, left occlusion energy is added for the corresponding left occlusion region; when right occlusion occurs, right occlusion energy is added for the corresponding right occlusion region. In practice, the detection of occluded areas can be performed as follows: first, each relevant region on the left image is mapped to the right image according to the current parallax calculation result, and the mapping on the right image is then checked; if a pixel is mapped repeatedly, the pixel is marked as left occlusion and left occlusion energy is added; if a pixel is not mapped, the pixel is marked as right occlusion and right occlusion energy is added.
In summary, the occlusion energy E_{occlude} can be obtained from the following formula (r13):
E_{occlude} = (|Occ_L| + |Occ_R|) \lambda_{occ} \qquad (r13)
where Occ_L and Occ_R denote the left and right occlusion pixel sets, |Occ_L| and |Occ_R| their numbers of pixels, and λ_{occ} is the set occlusion penalty constant.
The formula for calculating the occlusion energy provided by the technical solution of the embodiment of the present invention is not limited to formula (r13); all formulas obtained by approximate changes based on formula (r13) belong to the technical solution provided by the embodiment of the present invention, for example setting different occlusion penalty constants for left occlusion and right occlusion, or processing only left occlusion, or only right occlusion.
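For illustration, a sketch of the occlusion detection and the occlusion energy of formula (r13); mapping each region onto the right image and counting repeatedly mapped pixels as left occlusion and unmapped pixels as right occlusion follows the description above, while the rounding, the whole-image bookkeeping and the penalty value are assumptions:

import numpy as np

def occlusion_energy(width, height, region_pixels, planes, lambda_occ=20.0):
    """E_occlude of (r13): map every region of the left image onto the right image with its
    current plane, count repeatedly mapped pixels (left occlusion) and unmapped pixels
    (right occlusion), and apply the occlusion penalty constant."""
    hits = np.zeros((height, width), dtype=np.int32)
    for region, pixels in region_pixels.items():
        c1, c2, c3 = planes[region]
        for (x, y) in pixels:
            xr = int(round(x - (c1 * x + c2 * y + c3)))   # mapped column (sign convention assumed)
            if 0 <= xr < width:
                hits[y, xr] += 1
    occ_left = int(np.sum(hits > 1))                      # repeatedly mapped -> left occlusion
    occ_right = int(np.sum(hits == 0))                    # never mapped -> right occlusion
    return (occ_left + occ_right) * lambda_occ            # formula (r13), single penalty constant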
T3, acquiring the smoothing energy of each region.
The smoothing energy E_{smooth} can be obtained according to the following formula (r14):
E_{smooth} = \sum_{p \in B_C,\ q \in N,\ |d(p) - d(q)| \geq \text{threshold1}} \lambda_S \qquad (r14)
Here, B_C denotes the set of boundary points of the current region on the matched image, and N the set of boundary points on the other regions adjacent to B_C; p ∈ B_C and q ∈ N are two adjacent pixel points in the four-connected sense, d(p) and d(q) are the parallaxes of pixels p and q, and λ_S is the smoothing penalty, which may be a constant or a function of the difference between the two pixel colors, for example twice the penalty when the color difference is small and once the penalty when it is large. The color difference may be computed as the maximum of the absolute values of the color component differences of the two pixels, or as the sum of the squares of the color component differences. The condition |d(p) - d(q)| ≥ threshold1 determines whether a boundary point of the current region is a point of discontinuous parallax; threshold1 may take a number greater than 0, such as 0.5 or 1. As an additional condition, p is required not to be an occlusion boundary point. Formula (r14) therefore gives the sum of the smoothing penalty energies applied to boundary points of the current region that have discontinuous parallax and do not belong to occlusion boundaries.
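For illustration, a sketch of the smoothing energy following the description of formula (r14); the data layout and the constant penalty are assumptions:

def smoothing_energy(boundary_pixels, neighbor_of, disparity, occlusion_boundary,
                     lambda_s=1.0, threshold1=0.5):
    """Sum the smoothing penalty over boundary points of the current region with discontinuous
    parallax that do not belong to occlusion boundaries."""
    e = 0.0
    for p in boundary_pixels:                  # p in B_C, boundary of the current region
        if p in occlusion_boundary:            # additional condition: skip occlusion boundary points
            continue
        for q in neighbor_of[p]:               # q: four-connected neighbor in an adjacent region
            if abs(disparity[p] - disparity[q]) >= threshold1:
                e += lambda_s                  # smoothing penalty for a parallax discontinuity
    return e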
Step 202, in each area, determining at least one operation area for a current area, where the current area is an area being processed in each area.
The operation area of the current area may be an image area adjacent to the current area, or an image area not adjacent to the current area; the number of selected operation areas is chosen as a trade-off between the amount of calculation and the optimization effect.
Step 203, obtaining the optimized parallax parameter of the current region according to the matching energy of the current region and the matching energy of each operation region, wherein the step specifically comprises:
U1, taking the parallax parameter of each operation region, or a parallax parameter generated by a weighted combination of the parallax parameter of each operation region and the parallax parameter of the current region, as a new parallax parameter of the current region, and acquiring each candidate matching energy of the current region according to the new parallax parameter;
The embodiment of the present invention takes the regions adjacent to the current region (whose matching energy is E_i(x), where i denotes the i-th image region) as the operation regions. U1 is described below by taking the processing of each adjacent region together with the current region as an example.
The parallax parameter of each adjacent region of the current region, or a new parallax parameter generated by the weighted combination of the parallax parameter of each adjacent region and the parallax parameter of the current region, is taken as the parallax parameter of the current region. According to the new parallax parameter, the data point, occlusion point and smooth point pixels of the current region and of each operation region are re-acquired, the matching energy of the current region is updated according to these pixel points, and the updated matching energies E'_i(x) are the candidate matching energies of the current region.
U2, acquiring the updated matching energy of each operation area according to the new parallax parameter;
According to the new parallax parameter, the data point, occlusion point and smooth point pixels of the current region and of each operation region are re-acquired, the matching energy of each operation region is updated according to these pixel points, and the updated matching energy E'_j(x) is obtained, where j is the index of the selected adjacent region.
U3, obtaining the collaborative optimization energies from each candidate matching energy and the corresponding updated matching energies of the operation regions;
The embodiment of the present invention sets weighting coefficients λ_k and w_ij, with 0 ≤ λ_k ≤ 1 and 0 ≤ w_ij ≤ 1, and uses them to construct the collaborative optimization energy from the candidate matching energy of the current region and the updated matching energies of the operation regions.
and U4, acquiring the optimized parallax parameters of the current area according to the collaborative optimization energies.
The smaller the collaborative optimization energy is, the more reasonable the parallax parameter corresponding to the collaborative optimization energy is. In the embodiment of the invention, the minimum value is selected from the collaborative optimization energies, and the parallax parameter adopted by the current area when the minimum collaborative optimization energy is obtained is used as the optimized parallax parameter of the current area.
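For illustration, a sketch of steps U1 to U4 for one current region, assuming a routine that evaluates the matching energy of a region under a given set of plane parameters; the candidate-plane generation and the weighted form of the collaborative optimization energy (λ_k times the candidate matching energy plus w times the sum of the updated neighbor energies) are assumptions consistent with, but not spelled out in, the description above:

import numpy as np

def optimize_region(i, neighbors, planes, region_energy, lambda_k=0.5, w=0.5):
    """Steps U1-U4 for the current region i: try the plane of each operation region (and a
    weighted mix with the current plane) and keep the candidate whose collaborative
    optimization energy is smallest."""
    best_energy, best_plane = np.inf, planes[i]
    candidates = []
    for j in neighbors[i]:
        candidates.append(np.asarray(planes[j], dtype=np.float64))                    # U1: neighbor plane
        candidates.append(0.5 * np.asarray(planes[i]) + 0.5 * np.asarray(planes[j]))  # U1: weighted combination
    for plane in candidates:
        trial = dict(planes)
        trial[i] = plane                                    # current region takes the candidate plane
        e_i = region_energy(i, trial)                       # U1: candidate matching energy E'_i(x)
        e_j = sum(region_energy(j, trial) for j in neighbors[i])   # U2: updated energies E'_j(x)
        e_co = lambda_k * e_i + w * e_j                     # U3: collaborative optimization energy (assumed form)
        if e_co < best_energy:                              # U4: smaller energy -> more reasonable plane
            best_energy, best_plane = e_co, plane
    return best_plane, best_energy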
Step 204, acquiring the parallax of the current region according to the optimized parallax parameter of the current region.
The parallax of the current region is obtained from the optimized parallax parameters of the current region through the parallax plane model formula.
Step 203 introduced the processing of one current region. In the embodiment of the present invention, all regions in the image are iteratively processed in turn by the collaborative optimization method in the same way, so that the processing of the whole image is completed, and the iterative processing of the whole image can be repeated several times by the same method to obtain a better parallax result. In the embodiment of the present invention, four iterations are performed. As shown in fig. 8, the disparity of the Tsukuba image is corrected by four collaborative optimizations, where (a) to (d) are the disparity maps obtained from the first to the fourth iteration, and e in the figure is the sum of the collaborative energies of all regions of the whole image.
The embodiment of the present invention has been described by taking the processing of the Tsukuba image as an example, but the embodiment of the present invention is not limited to a specific image. As shown in fig. 9, which gives the processing results of the standard images Tsukuba, Venus, Teddy and Cones obtained by the method provided by the embodiment of the present invention, the results are very close to the true parallax, and an ideal parallax result can be obtained.
In the embodiment of the present invention, when the parallax of each region in the matched image is acquired, the matching energy of each region is used as the criterion function for optimizing the initial parallax of the current region, and the optimized parallax parameter of the current region is obtained from the matching energy of the current region together with the matching energy of each operation region. Because the matching energies of different regions depend on and affect each other, the optimized parallax parameter obtained in this way is more accurate, so the embodiment of the present invention can acquire a more accurate parallax from the optimized parallax parameter.
An embodiment of the present invention further provides a parallax obtaining apparatus, which can obtain a more accurate parallax, as shown in fig. 10, the apparatus includes:
the matching energy acquisition unit is used for acquiring the matching energy of each region in a matched image according to an initial parallax parameter, and the matched image comprises at least two regions;
an operation area determining unit, configured to determine at least one operation area for a current area in each of the areas, where the current area is an area being processed in each of the areas;
an optimized parallax parameter obtaining unit, configured to obtain an optimized parallax parameter according to the current region and the matching energy of each operation region determined by the operation region determining unit;
and the parallax value acquisition unit is used for acquiring the parallax of the current area according to the optimized parallax parameter acquired by the optimized parallax parameter acquisition unit.
In the embodiment of the present invention, when the parallax of each region in the matched image is acquired, the matching energy of each region is used as the criterion function for optimizing the initial parallax of the current region, and the optimized parallax parameter of the current region is obtained from the matching energy of the current region together with the matching energy of each operation region. Because the matching energies of different regions depend on and affect each other, the optimized parallax parameter obtained in this way is more accurate, so the embodiment of the present invention can acquire a more accurate parallax from the optimized parallax parameter.
As shown in fig. 11, the apparatus provided in the embodiment of the present invention further includes:
an initial parallax acquiring unit, configured to acquire an initial parallax of a pixel point in each region;
and a parallax parameter acquiring unit, configured to acquire the initial parallax parameter of each region according to the initial parallax acquired by the initial parallax acquiring unit.
The parallax parameter includes a first parameter, a second parameter, and a third parameter, and the parallax parameter acquiring unit includes:
a first parameter obtaining module, configured to select at least one row pixel point pair in each pixel row in each region, then, in each region, obtain each candidate first parameter according to the coordinate values and the initial parallaxes of the row pixel point pairs, and finally obtain the first parameter of each region by voting over the candidate first parameters;
a second parameter obtaining module, configured to select at least one column pixel point pair in each pixel column in each region, then, in each region, obtain each candidate second parameter according to the coordinate values and the initial parallaxes of the column pixel point pairs, and finally obtain the second parameter of each region by voting over the candidate second parameters;
and a third parameter obtaining module, configured to fit, according to each candidate first parameter obtained by the first parameter obtaining module and each corresponding candidate second parameter obtained by the second parameter obtaining module, each corresponding candidate third parameter by using a disparity plane, and obtain a third parameter of each region according to each candidate third parameter vote.
In the embodiment of the present invention, the initial parallax parameters of each region are obtained by the parallax parameter obtaining unit using a robust voting-based parallax plane fitting method. Because the obtained initial parallax often has certain errors, the parallax parameter obtaining unit can effectively remove erroneous points, i.e. outlier data, through the voting-based parallax plane fitting, so that the fitting result is more accurate, which in turn ensures that a more accurate parallax can be obtained.
the optimized parallax parameter acquiring unit includes:
a matching energy processing module, configured to use the parallax parameter of each operation region, or a parallax parameter generated by a weighted combination of the parallax parameter of each operation region and the parallax parameter of the current region, as a new parallax parameter of the current region, and obtain each candidate matching energy of the corresponding current region according to the new parallax parameter;
the operation area energy updating module is used for acquiring updated matching energy of each operation area according to the new parallax parameter;
a collaborative optimization energy acquisition module, configured to calculate each collaborative optimization energy from each candidate matching energy and the corresponding updated matching energy of each operation region according to preset weighting coefficients;
and the optimized parallax parameter acquisition module is used for acquiring the optimized parallax parameters of the current area according to the collaborative optimization energies.
According to the embodiment of the invention, the optimized parallax parameter acquisition unit is utilized to successively carry out iterative processing on all regions in one image by adopting a collaborative optimization method, so that the processing on the whole image is completed, and the iterative processing can be repeated for a plurality of times on the whole image according to the same method, so that a better parallax result can be obtained.
The matching energy in the embodiment of the invention comprises data energy, shielding energy and smoothing energy. The matching energy obtained when performing parallax optimization includes the data energy and a combination of at least one of occlusion energy and smoothing energy, and the matching energy obtaining unit has three conditions:
in a first case, when the matching energy includes data energy and occlusion energy, the matching energy obtaining unit includes:
the first pixel point acquisition module is used for acquiring data point pixels and shielding point pixels in each region;
the data energy acquisition module is used for acquiring the data energy according to the data point pixel acquired by the first pixel point acquisition module;
and the shielding energy acquisition module is used for acquiring shielding energy according to the shielding point pixels acquired by the first pixel point acquisition module.
In a second case, when the matching energy includes data energy and smoothing energy, the matching energy obtaining unit includes:
the second pixel point acquisition module is used for acquiring data point pixels and smooth point pixels in each region;
the data energy acquisition module is used for acquiring data energy according to the data point pixels acquired by the second pixel point acquisition module;
and the smooth energy acquisition module is used for acquiring smooth energy according to the smooth point pixels acquired by the second pixel point acquisition module.
In a third case, when the matching energy includes data energy, occlusion energy, and smoothing energy, the matching energy obtaining unit includes:
a third pixel point obtaining module, configured to obtain a data point pixel, a blocking point pixel, and a smooth point pixel in each region;
the data energy acquisition module is used for acquiring the data energy according to the data point pixel acquired by the third pixel point acquisition module;
the shielding energy acquisition module is used for acquiring the shielding energy according to the shielding point pixel acquired by the third pixel point acquisition module;
and the smooth energy acquisition module is used for acquiring the smooth energy according to the smooth point pixel acquired by the third pixel point acquisition module.
The embodiment of the present invention adopts the third case, in which the matching energy comprises the data energy, the occlusion energy and the smoothing energy, so that the parallax obtained during parallax optimization is more accurate.
The specific working methods of the units and modules in the device embodiment of the present invention may refer to the method embodiment of the present invention, and are not described herein again.
In the embodiment of the present invention, when the parallax of each region in the matched image is acquired, the matching energy of each region is used as the criterion function for optimizing the initial parallax of the current region, and the optimized parallax parameter of the current region is obtained from the matching energy of the current region together with the matching energy of each operation region. Because the matching energies of different regions depend on and affect each other, the optimized parallax parameter obtained in this way is more accurate, so the embodiment of the present invention can acquire a more accurate parallax from the optimized parallax parameter.
Those skilled in the art will appreciate that all or part of the steps in the above embodiments may be implemented by hardware executing program instructions. The software corresponding to the embodiments can be stored in a computer-readable storage medium.
There are, of course, many possible embodiments of the invention, and many modifications and variations may be made by those skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.