CN115018934A - Three-dimensional image depth detection method combining cross skeleton window and image pyramid - Google Patents


Info

Publication number
CN115018934A
Authority
CN
China
Prior art keywords: image, pixel, points, window, disparity map
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210792911.7A
Other languages: Chinese (zh)
Other versions: CN115018934B (en)
Inventor
刘之涛 (Liu Zhitao)
夏越 (Xia Yue)
苏宏业 (Su Hongye)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN202210792911.7A
Publication of CN115018934A
Application granted
Publication of CN115018934B
Legal status: Active

Classifications

    • G06T 7/80: Analysis of captured images to determine intrinsic or extrinsic camera parameters, i.e. camera calibration
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06V 10/761: Proximity, similarity or dissimilarity measures
    • G06T 2207/20016: Hierarchical, coarse-to-fine, multiscale or multiresolution image processing; pyramid transform
    • G06T 2207/20021: Dividing image into blocks, subimages or windows
    • G06T 2207/20228: Disparity calculation for image-based rendering
    • G06T 2207/30244: Camera pose

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a three-dimensional image depth detection method combining a cross-shaped skeleton window and an image pyramid. Left and right images are acquired in real time by a binocular camera, and an initial disparity map is calculated. Gradient information in the X and Y directions is computed to obtain a descriptor for each pixel. A cross skeleton window is constructed by combining color similarity and distance constraints on the grayscale image, and the shortest of its four branches is taken as the minimum arm length. An image pyramid is built by repeatedly Gaussian-downsampling the stereo image. Using the initial disparity map and the cross skeleton window, each lower layer in turn updates the adjacent higher layer; processing layer by layer upward yields the final disparity map, from which the depth of the stereo image is obtained. The cross skeleton window improves the disparity accuracy of the support points and is easy to parallelize. By combining the image pyramid and exploiting multi-scale information, more support points are obtained in weakly textured areas, which alleviates cross-edge connection. The method also narrows the disparity search range at high resolution, reduces mismatched support points, and effectively shortens the high-resolution processing time.

Description

Three-dimensional image depth detection method combining cross skeleton window and image pyramid
Technical Field
The invention relates to depth estimation and binocular stereo matching, and in particular to a stereo image depth detection method combining a cross skeleton window and an image pyramid.
Background
Depth estimation is one of the most important problems in computer vision. Depth estimation from a binocular stereo image pair is a core topic of low-level vision; its key task is to find the correspondence of spatial pixels between the two images, i.e. stereo matching, after which the three-dimensional geometry and depth of the scene are obtained using the imaging principle and triangulation. Stereo matching determines the pixel coordinates of a target point in the image pair and computes its disparity value; it is the most challenging and central research topic in binocular stereo vision systems. However, inconsistent illumination as well as texture-free, weakly textured and occluded regions greatly degrade matching accuracy, so designing a stereo matching algorithm that can effectively resist such interference remains a great challenge.
A conventional stereo matching algorithm generally consists of 4 steps: matching cost computation, cost aggregation, disparity computation and disparity refinement. Stereo matching algorithms mainly divide into global and local algorithms. Global algorithms typically solve an optimization problem by minimizing a global objective function containing a data term and a smoothing term. Many techniques have been devised to solve this NP-hard problem efficiently, but they remain computationally expensive and are rarely used in real-time systems. Local stereo matching algorithms use pixel information in a neighborhood as a constraint, so their computational cost is low and their efficiency is higher than that of global algorithms. However, local algorithms are susceptible to image noise, and matching ambiguities may occur in weakly textured or repetitively textured areas. Efficient, high-precision stereo matching plays a key role in many practical applications, such as robot navigation, autonomous driving and unmanned aerial vehicles, and efficiently obtaining high-precision disparity on large image pairs remains a challenge. ELAS is an efficient high-resolution stereo method that completes stereo matching or depth estimation in linear time. However, it is prone to cross-edge connection, is strongly affected by mismatched points, and its disparity accuracy in weakly textured regions needs improvement.
Disclosure of Invention
In view of the above, to solve the problems in the background art, the present invention provides CS-ELAS, a stereo image depth detection method combining a cross skeleton window and an image pyramid.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows:
step 1: acquiring left and right images in real time through a binocular camera, wherein the left and right images form an original stereo image, and calculating a disparity map between the left and right images according to a variation relation between a target and a background in the left and right images and taking the disparity map as an initial disparity map of the stereo image; the disparity map is a map representing the distance relationship between the object and the background in the left and right images.
Step 2: calculating gradient information of the stereoscopic image in the X direction and the Y direction, and obtaining the gradient information of each pixel point in the X direction and the Y direction, wherein the X direction is along the transverse direction of the image, and the Y direction is along the longitudinal direction of the image, so as to obtain a descriptor of the pixel point;
and step 3: constructing a cross skeleton window by combining the color similarity and the distance constraint of the gray level image, wherein the cross skeleton window is provided with four branches, the branch with the minimum length in the four branches is selected from the cross skeleton window, the length of the branch with the minimum length is used as the minimum arm length of the cross skeleton window, and the minimum arm length is used as the radius of a matching window for subsequently calculating the matching similarity of each pair of left and right image pairs;
and 4, step 4: performing multiple Gaussian downsampling on an original stereo image to construct an image pyramid;
and 5: and sequentially updating the stereo images of the adjacent higher layers of the image pyramid by using the initial disparity map and the cross skeleton window and the stereo images of the lower layers of the image pyramid, processing the image pyramid layers upwards, and obtaining a final disparity map so as to obtain the depth of the stereo images.
According to the invention, processing the image pyramid layer by layer with the disparity map and the cross skeleton window yields the depth of the stereo image quickly and accurately.
Step 2 specifically comprises the following steps:
A gradient window is established for the X direction and the Y direction, respectively:
The X-direction gradient window is 7×7; within it, X-direction gradient information of 24 pixel points is selected, comprising 22 points chosen uniformly along the eight directions around the central pixel of the window, plus the central pixel selected twice.
The Y-direction gradient window is 5×5; within it, Y-direction gradient information of 8 pixel points is selected, namely the four corner pixels and the four edge-center pixels of the window.
For each pixel, the X- and Y-direction gradient information obtained through these two windows is combined to form the descriptor of that pixel.
the processing for the stereo image mentioned in the method of the present invention is to perform the same processing for each of the left and right images in the stereo image.
In step 4, the image pyramid comprises the original stereo image and the stereo image after each Gaussian downsampling. The original stereo image has the highest resolution and forms the highest layer of the pyramid. Each Gaussian downsampling reduces the resolution and contributes one layer; the stereo image after the last downsampling has the lowest resolution and forms the lowest layer of the pyramid.
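A minimal sketch of the Gaussian downsampling of step 4, using a separable 5-tap binomial kernel and 2× decimation (the kernel choice and boundary handling are assumptions; the patent does not specify them):

```python
import numpy as np

def gaussian_downsample(img):
    """One pyramid level: blur with a separable 5-tap Gaussian approximation,
    then drop every other row and column."""
    k = np.array([1., 4., 6., 4., 1.]) / 16.0
    blurred = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    blurred = np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, blurred)
    return blurred[::2, ::2]

def build_pyramid(img, levels):
    """Layer list ordered low -> high resolution, matching the convention
    that s = 0 is the lowest-resolution layer."""
    layers = [img]
    for _ in range(levels - 1):
        layers.append(gaussian_downsample(layers[-1]))
    return layers[::-1]    # reverse so layers[0] has the lowest resolution

pyr = build_pyramid(np.random.rand(64, 64), levels=3)
```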
Step 5 specifically comprises the following steps:
Step 5.1: take the initial disparity map as the disparity map of the lowest layer of the image pyramid; process the stereo image at the lowest layer with this disparity map and the cross skeleton window to obtain the support point set of the lowest layer and the confidence of each support point, which serve as the initial support points and their confidences.
Step 5.2: for the stereo image of the current pyramid layer, perform Delaunay triangulation with the support points whose confidence exceeds a preset first confidence threshold as vertices, establishing a set of triangular disparity planes.
In the current disparity map, linearly interpolate all pixels inside each triangular disparity plane from the plane and its three vertex values to update the disparity map. Then process the updated disparity map with the cross skeleton window to obtain the matching similarity and confidence of each pixel in the stereo image; the confidences of all pixels form the confidence map of the stereo image at the current pyramid layer.
Step 5.3: from the disparity map of the current pyramid layer and the confidence map of the stereo image, obtain a higher-resolution disparity prior map and confidence prior map by nearest-neighbor interpolation. Select the pixels whose confidence in the confidence prior map exceeds a preset second confidence threshold as supplementary support points for the next higher pyramid layer, and assign their disparity values from the disparity prior map to the corresponding pixels of the updated disparity map, thereby obtaining the disparity map of the higher pyramid layer.
Step 5.4: process the stereo image of the higher pyramid layer with its disparity map and the cross skeleton window to obtain the support point set of that layer and the confidence of each support point.
Step 5.5: add the supplementary support points obtained in step 5.3 to the support point set obtained in step 5.4, forming their union.
Step 5.6: return to step 5.2 and repeat steps 5.2 to 5.5 iteratively until the highest pyramid layer, i.e. the highest resolution, is reached; its disparity map is the final disparity map, from which the depth of the stereo image is obtained.
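The coarse-to-fine loop of steps 5.2 to 5.6 can be outlined as follows; `refine` is a stand-in for the per-layer triangulation, interpolation and support-point update, which are not reproduced here, and a fixed 2× scale between consecutive layers is assumed:

```python
import numpy as np

def nn_upsample2x(a):
    """Nearest-neighbour 2x upsampling (the interpolation named in step 5.3)."""
    return np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)

def coarse_to_fine(num_layers, init_disp, refine):
    """Skeleton of steps 5.2-5.6: start from the lowest-resolution disparity
    map and lift it one pyramid layer at a time."""
    disp = init_disp
    for s in range(1, num_layers):
        prior = nn_upsample2x(disp)   # disparity prior for layer s (step 5.3)
        disp = refine(prior, s)       # steps 5.2, 5.4, 5.5 at layer s
    return disp

# Toy run with 3 layers; the stand-in refinement simply keeps the prior.
final = coarse_to_fine(3, np.zeros((4, 4)), lambda prior, s: prior)
```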
In step 5.2, the processing with the updated disparity map and the cross skeleton window to obtain the matching similarity and confidence of each pixel in the stereo image is specifically:
5.2.1: take one image of the stereo pair as the current image and the other as the reference image, and traverse each pixel of the current image as the current pixel as follows:
use the updated disparity map to find the pixel in the reference image corresponding to the current pixel's disparity; compute the matching similarity between the current pixel and that corresponding pixel from their descriptors, taking it as the matching similarity of the current pixel; then compute the confidence of the current pixel from its matching similarity.
5.2.2: interchange the current image and the reference image and repeat step 5.2.1, obtaining the matching similarity and confidence of every pixel of both images of the stereo pair.
Steps 5.1 and 5.4 both proceed as follows:
Either image of the stereo pair is taken as the current image and the other as the reference image.
Uniformly sample the current image along the horizontal and vertical coordinates with a fixed pixel step to obtain candidate support points, and traverse each candidate support point as follows:
In the current image, first establish a matching window neighborhood with the candidate support point as the central pixel and the minimum arm length of the cross skeleton window as the radius, and select 9 key points in this neighborhood: its four corner pixels, four edge-center pixels and central pixel.
Then use the disparity map to find, in the reference image, the pixel corresponding to each key point's disparity; compute the matching similarity between each key point and its corresponding pixel from their descriptors; and add up the matching similarities of all key points to obtain the matching similarity of the candidate support point.
Then compare the matching similarity of the candidate support point with a preset similarity threshold: if it is greater, retain the candidate as a support point; otherwise discard it and do not take it as a support point.
Traversing all candidate support points yields all support points, which form the support point set; the confidence of each support point is computed from its matching similarity.
The disparity correspondence is determined by the disparity relation in the disparity map.
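Steps 5.1/5.4 can be sketched as follows; `sim_at` stands in for the per-key-point descriptor matching against the disparity-shifted reference pixel, and all parameter values are illustrative:

```python
def support_points(sim_at, grid_step, shape, a, sim_threshold):
    """Sample candidate support points on a regular grid, score each by
    summing the matching similarity of the 9 key points of its matching
    window (radius a = minimum arm length), and keep the candidates whose
    score exceeds the threshold."""
    h, w = shape
    key_offsets = [(-a, -a), (-a, a), (a, -a), (a, a),   # four corners
                   (-a, 0), (a, 0), (0, -a), (0, a),     # four edge centres
                   (0, 0)]                               # centre
    kept = []
    for v in range(a, h - a, grid_step):
        for u in range(a, w - a, grid_step):
            sim = sum(sim_at(u + du, v + dv) for du, dv in key_offsets)
            if sim > sim_threshold:
                kept.append((u, v))
    return kept

# Toy run: a constant similarity of 1.0 per key point gives every candidate
# a score of 9, so all grid candidates pass a threshold of 8.
pts = support_points(lambda u, v: 1.0, grid_step=5, shape=(20, 20),
                     a=2, sim_threshold=8.0)
```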
In steps 5.1 and 5.4, when computing the matching similarity of a candidate support point centered at pixel (u_n, v_n), the matching similarities of all the key points are added: descriptors of 9 pixel points are selected in the (2a+1)×(2a+1) neighborhood of the candidate support point according to the following formula, and maximizing the similarity is equivalent to minimizing the cost function:

E(u_n, v_n, d_n) = Σ || f(u^(l), v^(l)) − f(u^(r), v^(r)) ||_1,  with u^(r) = u^(l) − d_n and v^(r) = v^(l),

where a is the minimum arm length of the cross skeleton window obtained in step 3; n is the pixel index of the image; u_n and v_n are the horizontal and vertical coordinates of the n-th pixel; d_n is the disparity value traversed for that pixel; E(u_n, v_n, d_n) is the matching cost of pixel (u_n, v_n) at disparity d_n; u^(l) and v^(l) are the horizontal and vertical coordinates of a selected key point in the current image l; u^(r) and v^(r) are the pixel coordinates of the corresponding key point in the reference image r; and ||·||_1 denotes the L1 norm.
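Assuming rectified images and dense per-pixel descriptor arrays (consistent with the cost function above, though the array layout here is an implementation choice), the L1 matching cost for one key point can be computed as:

```python
import numpy as np

def matching_cost(desc_l, desc_r, u, v, d):
    """L1 cost between key point (u, v) in the current (left) image and its
    disparity-shifted correspondence (u - d, v) in the reference (right)
    image; desc_l/desc_r are (H, W, 32) descriptor arrays."""
    return np.abs(desc_l[v, u] - desc_r[v, u - d]).sum()

H, W = 8, 16
desc_l = np.zeros((H, W, 32))
desc_r = np.zeros((H, W, 32))
desc_l[4, 10] = 1.0    # one descriptor differs by 1 in every channel
c = matching_cost(desc_l, desc_r, u=10, v=4, d=3)
```

Summing this cost over the 9 key points of a candidate's matching window gives E(u_n, v_n, d_n).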
In step 4, the image is downsampled to obtain images of different resolutions, constructing the image pyramid; the lowest-resolution image obtained by downsampling is taken as the layer s = 0.
The disparity map and confidence map of a lower-resolution pyramid layer are upsampled by nearest-neighbor interpolation and used as priors for the next higher-resolution layer.
First, the disparity values of the lower-resolution layer narrow the disparity search range when computing support points at higher resolution, reducing the computational cost of the higher-resolution layer. Second, for pixels whose confidence in the lower-resolution layer is higher than in the higher-resolution layer, the lower-resolution disparity value is taken as the disparity of that pixel in the higher-resolution layer, and the support point set is updated accordingly.
The concrete expression is as follows:
If P̂^(s)(u_m, v_m) > P^(s+1)(u_m, v_m), then D^(s+1)(u_m, v_m) = D̂^(s)(u_m, v_m) and (u_m, v_m) ∈ S,

where S denotes the support point set; m is the index of a support point in the set; (u_m, v_m) is the pixel coordinate of the m-th support point, u_m the abscissa and v_m the ordinate; s denotes the s-th layer of the image pyramid, with image resolution increasing as s increases; P̂^(s)(u_m, v_m) is the confidence value at the m-th support point pixel after the layer-s confidence map is upsampled once by nearest neighbor; P^(s+1)(u_m, v_m) is the confidence value at the m-th support point pixel of the layer-(s+1) image; D̂^(s)(u_m, v_m) is the disparity value at the m-th support point pixel after the layer-s disparity map is upsampled by nearest neighbor; and D^(s+1)(u_m, v_m) is the disparity value at the m-th support point pixel of the layer-(s+1) image.
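The update rule above can be sketched directly with NumPy; the array names are illustrative, and the 2× nearest-neighbor upsampling assumes consecutive layers differ by a factor of two:

```python
import numpy as np

def nn_up2x(a):
    return np.repeat(np.repeat(a, 2, axis=0), 2, axis=1)

def propagate_supports(conf_lo, disp_lo, conf_hi, disp_hi):
    """Wherever the upsampled layer-s confidence exceeds the layer-(s+1)
    confidence, copy the upsampled layer-s disparity into the layer-(s+1)
    disparity map and mark the pixel as a support point."""
    conf_up, disp_up = nn_up2x(conf_lo), nn_up2x(disp_lo)
    mask = conf_up > conf_hi
    disp_hi = disp_hi.copy()
    disp_hi[mask] = disp_up[mask]
    return disp_hi, np.argwhere(mask)   # updated map + new support pixels

conf_lo = np.array([[0.9]]); disp_lo = np.array([[5.0]])
conf_hi = np.full((2, 2), 0.5); disp_hi = np.zeros((2, 2))
new_disp, supports = propagate_supports(conf_lo, disp_lo, conf_hi, disp_hi)
```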
Weakly textured regions of the image have few support points and are prone to cross-edge connection, so obvious triangular regions appear in the disparity map. If a mismatched support point occurs, the error of the disparity values obtained by triangulation and linear interpolation increases markedly.
The invention adopts the cross skeleton window to improve the disparity accuracy of the support points, and the computation parallelizes easily. Combining the image pyramid and its multi-scale information yields more support points in weakly textured areas and alleviates cross-edge connection. The high-resolution disparity search range is also narrowed, effectively shortening the high-resolution processing time.
The invention has the beneficial effects that:
(1) The method constructs a cross skeleton window for each pixel and adaptively expands the window used when computing the matching similarity, which senses edge information better and improves the disparity accuracy of the support points. At the same time, the number of matching support points is reduced, and mismatched support points occur less often.
(2) The invention combines information of different resolutions coarse-to-fine through an image pyramid. First, the disparity map and confidence map obtained at low resolution are nearest-neighbor upsampled and used as the disparity-range prior and confidence prior when computing support points at high resolution, which narrows the disparity search range and reduces computation. At the same time, for pixels whose confidence is higher at lower resolution than at higher resolution, the low-resolution disparity value is used as the disparity of that pixel in the higher pyramid layer and the support point set is updated, so that more support points are obtained in weakly textured regions.
Drawings
FIG. 1 is a construction diagram of a descriptor of the present invention.
FIG. 2 is a graph comparing the numbers of support points according to the present invention.
FIG. 3 is a system flow diagram of the present invention.
Detailed Description
The invention will be further described with reference to the following figures and specific examples, but the scope of the invention is not limited thereto.
As shown in fig. 3, an embodiment of the present invention is as follows:
Step 1: acquire left and right images in real time with a binocular camera; the left and right images form the original stereo image. Calculate the disparity map between the left and right images from the displacement of the target relative to the background between the two images, and take it as the initial disparity map of the stereo image.
Step 2: filter the epipolar-rectified left and right stereo images to obtain gradient images in the X and Y directions, respectively. For each central pixel (u_n, v_n), select 24 X-direction gradient values in its 7×7 neighborhood and 8 Y-direction gradient values in its 5×5 neighborhood to construct the descriptor f(u_n, v_n) of the feature point; the selection is shown in FIG. 1. As FIG. 1 shows, gradient information is selected in every direction around the central pixel, so changes in different directions are sensed better and matching accuracy improves.
Step 3: construct a cross skeleton window for each pixel. Extend from the central pixel in the four directions up, down, left and right, with color similarity and distance constraints as the rules, obtaining the arm lengths in the four directions; take the minimum arm length as a. When computing the matching similarity of a central pixel (u_n, v_n), the descriptors of 9 pixel points selected in its (2a+1)×(2a+1) neighborhood are used to compute the similarity of the matching pair. The matching energy function is:
E(u_n, v_n, d_n) = Σ || f(u^(l), v^(l)) − f(u^(r), v^(r)) ||_1,  with u^(r) = u^(l) − d_n and v^(r) = v^(l),

where a is the minimum arm length of the cross skeleton window of the central pixel; n is the pixel index of the image, i.e. the n-th pixel; u_n and v_n are the horizontal and vertical coordinates of the n-th pixel; d_n is the disparity value traversed for that pixel; E(u_n, v_n, d_n) is the matching cost of pixel (u_n, v_n) at disparity d_n; u^(l) and v^(l) are the pixel coordinates of a key point selected near the central pixel in the current image l; u^(r) and v^(r) are the pixel coordinates of the corresponding key point in the reference image r; and ||·||_1 denotes the L1 norm.
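The arm-length extension of step 3 can be sketched as follows; the gray-level similarity threshold `tau` and the maximum arm length `max_arm` are illustrative parameters, since the text only names color-similarity and distance constraints:

```python
import numpy as np

def min_arm_length(gray, u, v, tau=10, max_arm=17):
    """Extend an arm up/down/left/right from (u, v) while the grey-level
    difference to the centre stays below tau, the arm stays shorter than
    max_arm, and the image border is not crossed; return the minimum of
    the four arm lengths (the radius a used for the matching window)."""
    h, w = gray.shape
    c = int(gray[v, u])
    arms = []
    for du, dv in [(0, -1), (0, 1), (-1, 0), (1, 0)]:
        n = 0
        while n + 1 < max_arm:
            uu, vv = u + du * (n + 1), v + dv * (n + 1)
            if not (0 <= uu < w and 0 <= vv < h):
                break
            if abs(int(gray[vv, uu]) - c) >= tau:
                break
            n += 1
        arms.append(n)
    return min(arms)

# On a uniform image the arms hit only the distance limit.
flat = np.full((40, 40), 128, dtype=np.uint8)
a = min_arm_length(flat, 20, 20)
```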
In a specific implementation, the parameters are kept consistent when computing arm lengths for images of different resolutions. Similarity maximization is converted into an energy-function minimization problem. Computing support points with the cross-skeleton adaptively expanded matching window was compared against a fixed window on all pictures of the Middlebury 2014 full-resolution training set; the numbers of support points are shown in FIG. 2, and the evaluation results are shown in Table 1:
Table 1. Comparison of support points
Algorithm               bad1.0    avgerr
Fixed window            0.1204    4.6272
Cross skeleton window   0.0916    3.7828
Here bad1.0 is the percentage of pixels, over the full image, whose experimental disparity differs from the true disparity by more than 1.0, and avgerr is the mean absolute difference between the experimental and true disparity values. In FIG. 2 the horizontal axis is the image name in the data set and the vertical axis is the number of support points. As Table 1 and FIG. 2 show, compared with a fixed-size matching window, the cross skeleton window yields slightly fewer support points, a lower error rate and higher disparity accuracy.
Step 5: upsample the low-resolution disparity map and confidence map by nearest-neighbor interpolation and use them as high-resolution priors. First, the disparity values obtained from the lower-resolution image narrow the disparity search range when computing support points on the higher-resolution image, reducing its computational cost. Second, for pixels whose confidence at lower resolution is higher than at higher resolution, the low-resolution disparity value is taken as that pixel's disparity at high resolution, and the support point set is updated.
In a specific implementation, the disparity search range R_m^{s+1} of a higher-resolution (layer s+1) candidate support point (u_m, v_m) is calculated as:

R_m^{s+1} = [max(D_min, d̂_m^s − t), min(D_max, d̂_m^s + t)] when the upsampled confidence ĉ_m^s is reliable, and [D_min, D_max] otherwise,

wherein [D_min, D_max] is the disparity search range set in advance for disparity calculation, D_min being the set minimum disparity and D_max the set maximum disparity, set to [0, 790] in the experiments; t is a set constant, set to 10 in the experiments; s denotes the s-th layer of the image pyramid, with image resolution increasing as s increases; m is the index of a support point in the support point set, and (u_m, v_m) are the pixel coordinates of the m-th support point, u_m being the abscissa and v_m the ordinate; ĉ_m^s denotes the confidence value at the m-th support point pixel after nearest-neighbor upsampling of the layer-s confidence map, and d̂_m^s denotes the disparity value at the m-th support point pixel after nearest-neighbor upsampling of the layer-s disparity map.
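The range computation described here can be sketched as follows; the confidence-validity threshold `c_valid` is a hypothetical parameter not given in the text:

```python
D_MIN, D_MAX = 0, 790   # preset disparity search range from the experiments
T = 10                  # preset constant t from the experiments

def search_range(d_up, c_up, c_valid=0.5):
    """Disparity search range for a layer-(s+1) candidate support point,
    given the nearest-neighbor-upsampled layer-s disparity d_up and
    confidence c_up at that point. A reliable low-resolution prior narrows
    the search; otherwise the full preset range is used."""
    if c_up >= c_valid:
        return (max(D_MIN, d_up - T), min(D_MAX, d_up + T))
    return (D_MIN, D_MAX)

print(search_range(120, 0.9))   # (110, 130): narrowed around the prior
print(search_range(120, 0.1))   # (0, 790): full preset range
```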
Pixel points whose confidence in the low-resolution layer exceeds their confidence at the higher resolution are taken as support points to update the support point set obtained from the high-resolution image:

d_m^{s+1} = d̂_m^s and (u_m, v_m) ∈ S,  if ĉ_m^s > c_m^{s+1}

wherein S denotes the support point set, c_m^{s+1} denotes the confidence value at the m-th support point pixel of the layer-(s+1) image, and d_m^{s+1} denotes the disparity value at the m-th support point pixel of the layer-(s+1) image.
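The confidence-guided update of the support point set can be sketched as below; the dict containers keyed by point index are an illustrative choice, not from the patent:

```python
def update_support(support, conf_hi, disp_hi, conf_up, disp_up):
    """For each candidate point m, adopt the upsampled low-resolution
    disparity whenever its confidence beats the high-resolution confidence,
    and add the point to the support set S."""
    for m in conf_up:
        if conf_up[m] > conf_hi.get(m, 0.0):
            disp_hi[m] = disp_up[m]   # take the low-resolution disparity
            support.add(m)            # (u_m, v_m) joins the support set S
    return support, disp_hi

support, disp = update_support(
    {0},                       # current support set
    {0: 0.6, 1: 0.3},          # layer-(s+1) confidence
    {0: 50, 1: 70},            # layer-(s+1) disparity
    {0: 0.4, 1: 0.9},          # upsampled layer-s confidence
    {0: 48, 1: 72})            # upsampled layer-s disparity
print(support, disp)           # {0, 1} {0: 50, 1: 72}
```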
As shown in fig. 2, after the image pyramid is combined with the cross skeleton, the number of computed support points is further reduced, and the probability of mismatched points occurring decreases.
The overall flow of the invention is shown in fig. 3, and the algorithm is evaluated according to it. Table 2 gives the evaluation and comparison results: on the Middlebury and KITTI public binocular stereo data sets, the algorithm improves disparity accuracy and precision on the one hand, and retains its high speed and efficiency on the other.
Table 2 data set evaluation results and comparisons
(Table 2 is provided as an image in the original publication.)
Wherein bad1.0 denotes the percentage of pixels whose experimentally obtained disparity differs from the ground-truth disparity by more than 1.0, and bad2.0 the percentage of pixels whose difference exceeds 2.0; averr denotes the mean absolute difference between the experimentally obtained disparity and the ground truth; time denotes the elapsed time. Out-Noc denotes the percentage of pixels in non-occluded regions whose disparity error exceeds the threshold 3, and Out-All the corresponding percentage over all regions; Avg-Noc denotes the mean absolute disparity error in non-occluded regions, Avg-All the mean absolute disparity error over all regions, and px denotes the pixel unit.
Table 3 and table 4 give the evaluation results of each algorithm for dense disparity and sparse disparity on the Middlebury 2014 data set, where H denotes evaluation on half-resolution images and F denotes evaluation on full-resolution images. The results of the proposed algorithm, shown in bold, improve on both the dense-disparity and the sparse-disparity evaluations.
Table 3 Middlebury2014 training dataset dense disparity map non-occlusion region bad1.0
(Table 3 is provided as an image in the original publication.)
Table 4 Middlebury2014 training dataset sparse disparity map non-occlusion region bad1.0
Algorithm ArtL Motor Piano Pipes Playrm PlaytP Recyc Shelvs Teddy Vintge
SGM(F) 3.59 17 8.63 7.9 13.6 7.17 8.15 15.5 3.6 7.24
SNCC(H) 8.63 20.1 8.36 10.5 17.2 10.1 12.1 19.7 4.99 10.4
CS-ELAS(F) 5.97 15.4 13.2 9.54 16.5 12.2 13.3 20.9 5.49 14.8
SGM-Forest(H) 8.23 8.4 17.2 8.23 21.7 15.1 14 39.1 6.06 26
DISCO(H) 7.36 14.5 15.8 17.4 22.3 12.3 11.9 24 9.17 29.2
LS-ELAS(F) 4.94 21.6 21.3 13.4 28.4 15.7 25.2 31.5 6.49 29.7
ELAS(F) 5.94 24.3 33.2 18.5 37.7 19.7 36.3 51.3 11.3 27.2
DecStereo(F) 27.4 25.3 24.7 26.1 30.2 22.8 30.8 45.1 13 40.9
All of the evaluation results show that the proposed cross skeleton not only reduces the number of image support points and the probability of mismatched points, but also lowers the mean absolute disparity error and improves the disparity accuracy at the support points; the proposed image pyramid combines high-resolution and low-resolution image information, increases the number of support points in weakly textured regions, improves the overall disparity, and yields better results on the public binocular stereo data sets.

Claims (7)

1. A three-dimensional image depth detection method combining a cross skeleton window and an image pyramid is characterized by comprising the following steps:
step 1: acquiring left and right images in real time through a binocular camera, wherein the left and right images form an original stereo image, and calculating a disparity map between the left and right images according to a change relation between a target and a background in the left and right images and using the disparity map as an initial disparity map of the stereo image;
step 2: calculating gradient information of the three-dimensional image in the X direction and the Y direction to obtain descriptors of pixel points;
step 3: constructing a cross skeleton window by combining the color similarity and the distance constraint of the gray-level image, selecting the branch with the minimum length among the four branches of the cross skeleton window, and taking the length of that branch as the minimum arm length of the cross skeleton window;
step 4: performing multiple Gaussian downsamplings on the original stereo image to construct an image pyramid;
step 5: sequentially updating the stereo images of adjacent higher layers of the image pyramid by using the initial disparity map, the cross skeleton window, and the stereo images of lower layers of the image pyramid, processing the image pyramid layer by layer upwards, and obtaining a final disparity map, thereby obtaining the depth of the stereo image.
2. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 1, wherein: the step 2 specifically comprises the following steps:
a gradient window is established for the X-direction and the Y-direction, respectively:
the gradient window in the X direction is 7×7; gradient information in the X direction is selected at 24 pixel points within the gradient window, the 24 pixel points comprising 22 pixel points uniformly selected along eight directions in the neighborhood of the central pixel of the gradient window, plus the central pixel point counted twice;
the gradient window in the Y direction is 5×5; gradient information in the Y direction is selected at 8 pixel points within the gradient window, the 8 pixel points comprising the four corner pixel points and the four edge-center pixel points of the gradient window;
and aiming at each pixel point, obtaining gradient information in the X direction and gradient information in the Y direction through gradient windows in the X direction and the Y direction, and forming a descriptor of the pixel point by the gradient information in the X direction and the gradient information in the Y direction of the pixel point.
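As an illustration of the descriptor construction described above, a sketch assuming Sobel gradients (the claim itself does not name a gradient operator) and showing the 8-point Y-window sampling:

```python
import numpy as np

# Sample offsets within the 5x5 Y-direction window: the four corners and
# the four edge centres, as described in the claim.
Y_OFFSETS = [(-2, -2), (-2, 2), (2, -2), (2, 2),
             (-2, 0), (2, 0), (0, -2), (0, 2)]

SOBEL_X = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], float)

def grad(img, kernel):
    """Dense gradient image by 3x3 correlation with edge padding."""
    p = np.pad(img, 1, mode='edge')
    h, w = img.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = np.sum(p[i:i + 3, j:j + 3] * kernel)
    return out

def descriptor_y(gy, u, v):
    """Y-part of the descriptor of pixel (u, v): the Y gradient sampled at
    the 8 points of the 5x5 window."""
    return np.array([gy[u + du, v + dv] for du, dv in Y_OFFSETS])

img = np.tile(np.arange(7.0), (7, 1))   # horizontal intensity ramp
gx = grad(img, SOBEL_X)
print(gx[3, 3])                          # 8.0: constant interior X gradient
```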
3. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 1, wherein:
in the step 4, the image pyramid comprises an original stereo image and a stereo image after each Gaussian downsampling; the original stereo image has the highest resolution, and the highest layer of the image pyramid is formed; the resolution of the stereo image after each Gaussian down-sampling is reduced, the stereo image after each Gaussian down-sampling forms one layer of an image pyramid, and the resolution of the stereo image after the last Gaussian down-sampling is the lowest, so that the lowest layer of the image pyramid is formed.
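The pyramid construction described above can be sketched with a separable binomial blur as the Gaussian approximation; the exact kernel and number of levels are not specified in the claim:

```python
import numpy as np

K = np.array([1., 4., 6., 4., 1.]) / 16.0   # binomial Gaussian approximation

def blur(img):
    """Separable 5-tap Gaussian-like blur with edge padding."""
    f = lambda r: np.convolve(np.pad(r, 2, mode='edge'), K, mode='valid')
    return np.apply_along_axis(f, 1, np.apply_along_axis(f, 0, img))

def build_pyramid(img, n=3):
    """Image pyramid by repeated Gaussian blur + 2x downsampling. The list
    is ordered coarsest first, so its last element (the original image) is
    the highest layer, matching the layer numbering used here."""
    levels = [img]
    for _ in range(n - 1):
        levels.append(blur(levels[-1])[::2, ::2])
    return levels[::-1]

pyr = build_pyramid(np.full((16, 16), 5.0), n=3)
print([lvl.shape for lvl in pyr])   # [(4, 4), (8, 8), (16, 16)]
```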
4. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 1, wherein: the step 5 specifically comprises the following steps:
step 5.1:
aiming at the stereo image at the lowest layer of the image pyramid, taking the initial disparity map as the disparity map of the lowest layer of the image pyramid, and processing according to that disparity map and the cross skeleton window to obtain the support point set and confidence of the lowest layer of the image pyramid;
step 5.2:
for the stereo image of the current layer of the image pyramid, performing Delaunay triangulation by taking a support point with a confidence coefficient higher than a preset first confidence coefficient threshold value as a vertex of the triangulation to establish a plurality of triangular parallax planes;
in the current disparity map, performing linear interpolation on all pixels in the triangular disparity plane through the triangular disparity plane and three vertexes thereof to update the disparity map, simultaneously performing processing according to the updated disparity map and a cross skeleton window to obtain matching similarity and confidence coefficient of each pixel in the stereo image, forming a confidence coefficient map by the confidence coefficients of all the pixels, and obtaining a confidence coefficient map of the stereo image at the current layer of the image pyramid;
step 5.3:
according to the disparity map of the current layer of the image pyramid and the confidence map of the stereo image, obtaining a higher-resolution disparity prior map and a higher-resolution confidence prior map by respective nearest-neighbor interpolation; selecting pixel points in the confidence prior map whose confidence values are higher than a preset second confidence threshold as supplementary support points of the higher layer of the image pyramid, and assigning the disparity values of the supplementary support points in the disparity prior map to the updated disparity map as the disparity values of the pixel points corresponding to the supplementary support points, thereby obtaining the disparity map of the higher layer of the image pyramid;
step 5.4:
processing the stereo image of the higher layer of the image pyramid according to the disparity map of the higher layer of the image pyramid and the cross skeleton window to obtain a support point set of the higher layer of the image pyramid and a confidence coefficient of the support point set;
step 5.5: supplementing the supplementary support points obtained in the step 5.3 into the support point set obtained in the step 5.4 to obtain a union set;
step 5.6: returning to step 5.2 and repeating steps 5.2 to 5.5 for iterative updating, each round yielding the disparity map of the next higher layer of the image pyramid, until the highest layer of the image pyramid is reached; the disparity map of the highest layer is taken as the final disparity map, and the depth of the stereoscopic image is obtained from the final disparity map.
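The per-triangle linear interpolation of step 5.2 amounts to evaluating the plane through the triangle's three support-point vertices; a sketch:

```python
import numpy as np

def plane_from_vertices(p0, p1, p2):
    """Coefficients (a, b, c) of the disparity plane d = a*u + b*v + c
    through three triangulation vertices, each given as (u, v, d)."""
    A = np.array([[p[0], p[1], 1.0] for p in (p0, p1, p2)])
    d = np.array([p[2] for p in (p0, p1, p2)])
    return np.linalg.solve(A, d)

# One Delaunay triangle whose vertices are high-confidence support points:
a, b, c = plane_from_vertices((0, 0, 10.0), (4, 0, 18.0), (0, 4, 10.0))
# Linear interpolation of any pixel inside that triangle:
u, v = 2, 1
print(a * u + b * v + c)   # 14.0 (up to floating-point rounding)
```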
5. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 4, wherein: in the step 5.2, processing is performed according to the updated disparity map and the cross skeleton window to obtain the matching similarity and the confidence of each pixel in the stereo image, specifically:
5.2.1, taking one of the stereo images as a current image and the other image as a reference image, and traversing each pixel point in the current image as a current pixel point according to the following modes:
a pixel point corresponding to the disparity of the current pixel point is first found in the reference image using the updated disparity map; the matching similarity between the current pixel point and its disparity-corresponding pixel point is calculated from their descriptors and taken as the matching similarity of the current pixel point; the confidence of the current pixel point is then calculated from its matching similarity;
and 5.2.2, interchanging the current image and the reference image, and repeating the step 5.2.1 to obtain the matching similarity and the confidence coefficient of each pixel of the two images in the stereo image.
6. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 4, wherein: step 5.1 and step 5.4 each specifically comprise the following:
any one of the stereo images is used as a current image, and the other one is used as a reference image; uniformly sampling the current image along the horizontal and vertical coordinates by a fixed step length to obtain candidate support points, and traversing each candidate support point according to the following modes: in the current image, firstly, a matching window neighborhood is established by taking a candidate support point as a central pixel and taking the minimum arm length of a cross framework window as the radius of the neighborhood, and 9 key points are selected in the matching window neighborhood; then, a disparity map is utilized to find pixel points corresponding to the disparity of the key points in the reference image, descriptors of the key points and the pixel points corresponding to the disparity of the key points are utilized to calculate to obtain matching similarity between the key points and the pixel points corresponding to the disparity of the key points, the matching similarity is used as the matching similarity of the key points, and the matching similarity of all the key points is added to obtain the matching similarity of the candidate support points;
then, judging the matching similarity of the candidate support points:
if the matching similarity is larger than a preset similarity threshold, reserving the candidate support points as support points;
if the matching similarity is not greater than a preset similarity threshold, discarding the candidate support points, and not taking the candidate support points as the support points;
and traversing each candidate support point to obtain all support points, constructing a support point set by all the support points, and calculating the confidence of the support points according to the matching similarity of the support points.
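The candidate sampling and thresholding described above can be sketched as follows; the step size and the similarity function used here are placeholders, since the claim leaves both unspecified:

```python
def candidate_grid(h, w, step):
    """Uniformly sample candidate support points along both image axes with
    a fixed step."""
    return [(u, v) for u in range(0, h, step) for v in range(0, w, step)]

def filter_supports(candidates, similarity, thresh):
    """Keep candidates whose summed keypoint matching similarity exceeds
    the preset threshold; discard the rest."""
    return [p for p in candidates if similarity(p) > thresh]

cands = candidate_grid(10, 10, step=5)
print(cands)   # [(0, 0), (0, 5), (5, 0), (5, 5)]
kept = filter_supports(cands, lambda p: p[0] + p[1], thresh=4)
print(kept)    # [(0, 5), (5, 0), (5, 5)]
```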
7. The method for detecting the depth of the stereoscopic image by combining the cross skeleton window and the image pyramid as claimed in claim 4, wherein: in said step 5.1 and step 5.4, when the matching similarity of the candidate support point with central pixel (u_n, v_n) is calculated, the matching similarities of all the keypoints are added to obtain the matching similarity of the candidate support point, specifically according to the following formula:

E(u_n, v_n, d_n) = Σ ||f(u^(l), v^(l)) − f(u^(r), v^(r))||_1

where the sum runs over the 9 keypoints with u^(l) ∈ {u_n − a, u_n, u_n + a} and v^(l) ∈ {v_n − a, v_n, v_n + a}, and the corresponding keypoint in the reference image is given by u^(r) = u^(l) − d_n, v^(r) = v^(l);

wherein a is the minimum arm length of the cross skeleton window obtained in step 3; n denotes the pixel index of the image; u_n and v_n respectively denote the abscissa and the ordinate of the n-th pixel of the image; d_n is the disparity value being traversed for that pixel; E(u_n, v_n, d_n) denotes the matching cost of pixel (u_n, v_n) at disparity d_n; u^(l) and v^(l) respectively denote the horizontal and vertical coordinates of a selected keypoint in the current image, l denoting the current image; u^(r) and v^(r) respectively denote the pixel abscissa and ordinate of the corresponding keypoint in the reference image, r denoting the reference image; f(·) denotes the descriptor of a pixel; and ||·||_1 denotes the L1 norm.
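A sketch of this matching-cost computation, with descriptors stored in a lookup table and a 3×3 keypoint layout as our reading of the formula:

```python
import numpy as np

def match_cost(f_cur, f_ref, u, v, d, a):
    """E(u_n, v_n, d_n): summed L1 difference between descriptors at the 9
    keypoints of the matching window (centre (u, v), radius a) and their
    disparity-shifted counterparts in the reference image. f_cur / f_ref
    map pixel coordinates to descriptor vectors."""
    cost = 0.0
    for du in (-a, 0, a):
        for dv in (-a, 0, a):
            ul, vl = u + du, v + dv        # keypoint in the current image
            ur, vr = ul - d, vl            # matching pixel in the reference
            cost += float(np.sum(np.abs(f_cur[(ul, vl)] - f_ref[(ur, vr)])))
    return cost

# Toy descriptors: the reference image is the current image shifted by 2,
# so the cost vanishes at the true disparity d = 2.
f_cur = {(u, v): np.array([float(u + v)]) for u in range(10) for v in range(10)}
f_ref = {(u - 2, v): desc for (u, v), desc in f_cur.items()}
print(match_cost(f_cur, f_ref, 5, 5, 2, a=1))   # 0.0
print(match_cost(f_cur, f_ref, 5, 5, 1, a=1))   # 9.0
```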
CN202210792911.7A 2022-07-05 2022-07-05 Stereoscopic image depth detection method combining cross skeleton window and image pyramid Active CN115018934B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210792911.7A CN115018934B (en) 2022-07-05 2022-07-05 Stereoscopic image depth detection method combining cross skeleton window and image pyramid


Publications (2)

Publication Number Publication Date
CN115018934A true CN115018934A (en) 2022-09-06
CN115018934B CN115018934B (en) 2024-05-31

Family

ID=83079485

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210792911.7A Active CN115018934B (en) 2022-07-05 2022-07-05 Stereoscopic image depth detection method combining cross skeleton window and image pyramid

Country Status (1)

Country Link
CN (1) CN115018934B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117132761A (en) * 2023-08-25 2023-11-28 京东方科技集团股份有限公司 Target detection method and device, storage medium and electronic equipment

Citations (9)

Publication number Priority date Publication date Assignee Title
CN106355570A (en) * 2016-10-21 2017-01-25 昆明理工大学 Binocular stereoscopic vision matching method combining depth characteristics
WO2018098891A1 (en) * 2016-11-30 2018-06-07 成都通甲优博科技有限责任公司 Stereo matching method and system
CN110070557A (en) * 2019-04-07 2019-07-30 西北工业大学 A kind of target identification and localization method based on edge feature detection
CN110378916A (en) * 2019-07-03 2019-10-25 浙江大学 A kind of TBM image based on multitask deep learning is slagged tap dividing method
CN112991420A (en) * 2021-03-16 2021-06-18 山东大学 Stereo matching feature extraction and post-processing method for disparity map
CN113362457A (en) * 2021-08-10 2021-09-07 成都信息工程大学 Stereoscopic vision measurement method and system based on speckle structured light
CN113421230A (en) * 2021-06-08 2021-09-21 浙江理工大学 Vehicle-mounted liquid crystal display light guide plate defect visual detection method based on target detection network
US20210390725A1 (en) * 2019-01-23 2021-12-16 Shanghaitech University Adaptive stereo matching optimization method and apparatus, device and storage medium
US20220198694A1 (en) * 2020-01-10 2022-06-23 Dalian University Of Technology Disparity estimation optimization method based on upsampling and exact rematching


Non-Patent Citations (4)

Title
CHENGLONG XU等: "Accurate and Efficient Stereo Matching by Log-Angle and Pyramid-Tree", 《IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS FOR VIDEO TECHNOLOGY》, 15 December 2020 (2020-12-15) *
YUE XIA等: "Efficient Large Scale Stereo Matching based on Cross-Scale", 《2022 17TH INTERNATIONAL CONFERENCE ON CONTROL, AUTOMATION, ROBOTICS AND VISION (ICARCV)》, 10 January 2023 (2023-01-10) *
唐灿; 唐亮贵; 刘波: "A survey of image feature detection and matching methods", Journal of Nanjing University of Information Science & Technology (Natural Science Edition), no. 03, 28 May 2020 (2020-05-28) *
陈卉; 胡立坤; 黄钰雯: "Stereo matching algorithm using Gaussian mixture model and tree structure", Computer Engineering and Applications, no. 20, 15 October 2017 (2017-10-15) *


Also Published As

Publication number Publication date
CN115018934B (en) 2024-05-31

Similar Documents

Publication Publication Date Title
CN108776989B (en) Low-texture planar scene reconstruction method based on sparse SLAM framework
CN110211169B (en) Reconstruction method of narrow baseline parallax based on multi-scale super-pixel and phase correlation
CN112435262A (en) Dynamic environment information detection method based on semantic segmentation network and multi-view geometry
CN108876861B (en) Stereo matching method for extraterrestrial celestial body patrolling device
CN111340922A (en) Positioning and mapping method and electronic equipment
CN113592026A (en) Binocular vision stereo matching method based on void volume and cascade cost volume
EP3293700A1 (en) 3d reconstruction for vehicle
CN113963117B (en) Multi-view three-dimensional reconstruction method and device based on variable convolution depth network
CN102903111B (en) Large area based on Iamge Segmentation low texture area Stereo Matching Algorithm
CN111553845A (en) Rapid image splicing method based on optimized three-dimensional reconstruction
CN115018934B (en) Stereoscopic image depth detection method combining cross skeleton window and image pyramid
CN113887624A (en) Improved feature stereo matching method based on binocular vision
CN113313740B (en) Disparity map and surface normal vector joint learning method based on plane continuity
CN111429571A (en) Rapid stereo matching method based on spatio-temporal image information joint correlation
CN107122782B (en) Balanced semi-dense stereo matching method
CN111179327B (en) Depth map calculation method
Le Besnerais et al. Dense height map estimation from oblique aerial image sequences
CN114998412B (en) Shadow region parallax calculation method and system based on depth network and binocular vision
Nefian et al. A Bayesian formulation for sub-pixel refinement in stereo orbital imagery
CN112348859A (en) Asymptotic global matching binocular parallax acquisition method and system
Mondal et al. Performance review of the stereo matching algorithms
Unger et al. Efficient stereo matching for moving cameras and decalibrated rigs
CN110414337B (en) Target attitude detection system and detection method thereof
Sanfourche et al. Height estimation using aerial side looking image sequences
WO2011080669A1 (en) System and method for reconstruction of range images from multiple two-dimensional images using a range based variational method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant