US20160048970A1 - Multi-resolution depth estimation using modified census transform for advanced driver assistance systems - Google Patents
- Publication number: US20160048970A1
- Application number: US 14/827,897
- Authority: US (United States)
- Prior art keywords: pixels, depth, pixel, pair, cost
- Legal status: Granted
Classifications
- G06T7/593 — Depth or shape recovery from multiple images, from stereo images
- G06T7/0075; G06T5/003; G06T7/0085; H04N13/0271
- G06T5/77 — Retouching; Inpainting; Scratch removal
- G06T2207/10004 — Still image; Photographic image
- G06T2207/10012 — Stereo images
- G06T2207/10021 — Stereoscopic video; Stereoscopic image sequence
- G06T2207/10028 — Range image; Depth image; 3D point clouds
- G06T2207/20028 — Bilateral filtering
- G06T2207/20192 — Edge enhancement; Edge preservation
- H04N2013/0081 — Depth or disparity estimation from stereoscopic image signals
Definitions
- This invention relates generally to determining or providing depth in or for three-dimensional (3D) images and, more particularly, to depth estimation for computer vision applications such as 3D scene reconstruction and/or stereo-based object detection.
- Depth is one of the most important cues in perceiving the three-dimensional characteristics of objects in a scene captured by cameras.
- The value representing the distance from each object in the scene to the focal point of the camera is called depth, and an image storing these values for all pixels is referred to as a depth map.
- Depth maps are essential in a variety of applications such as view synthesis, robot vision, 3D scene reconstruction, human-computer interaction, and advanced driver assistance systems. The performance of these applications depends heavily on the quality and accuracy of the depth map, so generating an accurate depth map is of substantial importance.
- The main objective of depth estimation methods is to generate a per-pixel depth map of a scene from two or more reference images.
- The reference images are captured by a stereo camera system in which the cameras are parallel to each other or set at a slight angle.
- Depth maps can be estimated by using either stereo matching techniques or depth sensors. With the advent of depth sensors, fusion camera systems have been developed which directly measure the depth in real-time. The measurement of depth in such sensors is usually performed by either using time-of-flight (TOF) systems or infrared pattern deformation. Depth maps acquired by the depth sensors are usually noisy and suffer from poorly generated depth boundaries.
- Stereo matching techniques can be classified into two groups, namely local and global techniques.
- Local methods generally consider a finite neighboring window to estimate the disparity.
- The window size therefore plays an important role in such methods.
- Local methods are fast and computationally simple, but they are highly error-prone and the estimated depth maps are usually inaccurate.
- In global techniques, an energy function is globally optimized to find the disparity.
- Global depth estimation techniques can generate high-quality depth maps.
- Most popular techniques in this category include belief propagation, graph cuts and dynamic programming. However, due to the computational complexity of such algorithms, it is not feasible to exploit them in real-time applications. Combining the concepts of local and global stereo matching methods was first introduced as semi-global matching (SGM).
- SGM performs pixel-wise matching based on mutual information and the approximation of a global smoothness constraint and a good trade-off between accuracy and runtime is obtained. However, it achieves limited performance under illumination changes. Despite the advantages of different depth estimation techniques, there are several problems in the generated depth maps. The existence of holes and sensitivity to noise and illumination changes are the main significant problems.
- A general object of the invention is to provide an improved method for depth estimation.
- Some embodiments of the depth estimation method of this invention are based on the non-parametric Census transform and semi-global optimization.
- The method of this invention improves the quality of estimated depth maps to fulfill the demands of real-time applications such as 3D scene reconstruction and/or stereo-based pedestrian detection.
- The method according to some embodiments of this invention can be described as a multi-resolution census-based depth estimation scheme that performs novel depth refinement techniques to enhance the quality of the output.
- The method comprises three major steps: cost calculation, optimization, and refinement.
- The general object of the invention can be attained, at least in part, through a method, implemented by a computer system, of determining depth in images.
- The method includes: generating an image mask indicating smooth areas and object edges for each of a pair of stereo images; calculating a computation cost for a plurality of pixels within the image mask, such as by using the Census transform to map a pixel block into a bit stream; and matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map.
- The image resolution is first reduced for each of the pair of stereo images.
- The invention further includes a method of determining depth in images including: reducing image resolution in each of a pair of stereo images; creating an image mask as a function of pixel curvatures for each of the pair of stereo images, such as by distinguishing between the smooth areas and the object edges as a function of pixel curvature; determining a computational cost value for a plurality of pixels using a Census transform to map a local neighborhood surrounding each of the plurality of pixels to a bit stream; aggregating cost values for each of the plurality of pixels; and matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map.
- The proposed depth estimation technique exploits the non-parametric Census transform to calculate the cost function.
- The non-parametric Census transform maps the block surrounding a pixel into a bit string.
- A simple Census window pattern is used for the smooth regions, which leads to less computation, and a more complex pattern is applied to the non-uniform regions, which usually occur along edges and object boundaries.
- Compared to existing hole filling techniques, the depth refinement algorithm according to some embodiments of this invention considers only the background pixels to fill the holes. The curvatures of the pixels are considered, and a trilateral filter enhances the quality of the estimated depth map. In order to avoid generating new depth values, the algorithm can map them to the nearest existing depth value.
- The method chooses the pixels that belong to the background by comparing depth values.
- The holes are then filled by a weighted interpolation on the selected correct pixels.
- The weights are calculated using a Gaussian distribution based on the distance to the current pixel, so farther pixels have less impact on the calculated depth value.
- The quality of the hole-filled depth map is improved by applying the trilateral filter, which makes the boundaries sharper and corrects the misaligned regions.
- The proposed filter comprises three terms: depth data, texture data, and curvature.
- The method also desirably ensures that no new depth values are introduced during the up-sampling. Therefore, when the depth map is filtered using the trilateral filter, new depth values are adjusted by mapping them to the nearest depth value that already exists in the depth map.
- The method and system of this invention are desirably automatically executed or implemented on and/or through a computing platform.
- Such computing platforms generally include one or more processors for executing the method steps stored as coded software instructions, at least one recordable medium for storing the software and/or video data received or produced by method, an input/output (I/O) device, and a network interface capable of connecting either directly or indirectly to a video camera and/or the Internet or other network.
- FIG. 1 is a block diagram of a method according to one embodiment of this invention.
- FIG. 2 shows an image and a generated mask, according to one embodiment of this invention.
- FIG. 3 is a Census window pattern P1 for uniform regions, according to one embodiment of this invention.
- FIG. 4 is a Census window pattern P2 for non-uniform regions, according to one embodiment of this invention.
- FIG. 5 illustrates cost aggregation according to one embodiment of this invention.
- FIG. 6 is an algorithm according to one embodiment of this invention.
- FIG. 7 shows images and the corresponding depth maps generated by the invention.
- FIG. 8 shows a color image, the corresponding estimated depth map, and a refined depth map.
- FIG. 9 illustrates a hole-filling comparison between filling using all surrounding correct pixels and filling using only background depth pixels.
- FIG. 10 is a histogram of an exemplary depth map before refinement and after refinement.
- The present invention provides a method for generating a depth map from a stereo image pair. Generating an accurate depth map is a vital factor in many applications.
- A novel depth estimation and refinement method is proposed.
- The depth estimation method is a multi-resolution census-based technique that benefits from a semi-global optimization technique.
- A Census transform is used that is robust against illumination changes; therefore, the algorithm is applicable under different illumination conditions.
- The proposed algorithm is performed on a multi-resolution basis in order to achieve higher frame rates.
- Pixel gradients are used to distinguish between smooth and complex parts of the image, and the cost calculation is performed separately for the different regions.
- FIG. 1 is a block diagram of a method according to embodiments of this invention. As shown in FIG. 1 , a stereo image pair is used as the reference to generate a depth map. In embodiments of this invention, the method includes steps of: down-sampling and mask generation, cost calculation and aggregation, semi-global optimization, and depth refinement.
- Step 20 includes down-sampling and mask generation.
- Pixels belonging to the same object should have the same depth value. However, this is not always the case, due to erroneous mismatches, changes of illumination, etc.
- The curvatures of the pixels in the given color images are used to distinguish between the smooth areas and sharp edges of the objects in the scene.
- A first step is to down-sample the stereo color images by a predetermined factor. Both left and right images of the stereo image pair are down-sampled. In one embodiment, the image resolution is reduced to 1/4 of the original image size: for every 4×4 window, only one pixel is used, which is desirably the average of all 16 pixels.
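The 4×4 block-averaging step can be sketched as follows. This is an illustrative reconstruction, not the patent's implementation, and it assumes a grayscale image whose dimensions are multiples of four.

```python
import numpy as np

def downsample_quarter(img):
    """Reduce each side to 1/4: one output pixel per 4x4 window,
    taken as the average of all 16 pixels in that window.

    Assumes a 2-D grayscale array whose height and width are
    multiples of 4 (borders would otherwise need padding).
    """
    h, w = img.shape
    # Split into non-overlapping 4x4 tiles and average each tile.
    return img.reshape(h // 4, 4, w // 4, 4).mean(axis=(1, 3))
```

Both stereo views would be passed through the same routine before mask generation.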
- A mask is then obtained, such as shown in FIG. 2, which indicates the smoothness of different regions in the reference image.
- The curvatures of the pixels in the reference image pair are used to create the mask. Curvature can be used to distinguish between the smooth areas and sharp edges of the objects in the scene.
- The curvature is calculated using the first- and second-order gradients of each pixel, where u_x and u_xx are the first- and second-order gradients, respectively, and the subscripts indicate the direction of the gradient.
- The Prewitt kernel is used to find the gradient.
- The values are aggregated over a 5×5 window and stored in a curvature map.
- A binary mask is generated from the curvature map using Equation (2), where M(x, y) is the mask value at location (x, y).
- The summation of curvature values over a 5×5 window is referred to as curv_agg.
- T1 is the mean value of the curvature map.
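The mask-generation steps above can be sketched as follows, under stated assumptions: the patent uses Prewitt kernels and a specific curvature formula (Equation (1)), while this sketch substitutes `np.gradient` and a simple |u_xx| + |u_yy| curvature proxy, keeping the 5×5 aggregation and the mean-of-curvature-map threshold T1.

```python
import numpy as np

def curvature_mask(img, win=5):
    """Binary smooth/edge mask from per-pixel curvature (hedged sketch).

    Gradients via np.gradient stand in for the Prewitt kernels of the
    text; curvature is approximated from second-order gradients, summed
    over a win x win window, and thresholded at the mean of the
    curvature map (T1).
    """
    ux, uy = np.gradient(img.astype(float))
    uxx, _ = np.gradient(ux)        # second-order gradient, vertical
    _, uyy = np.gradient(uy)        # second-order gradient, horizontal
    curv = np.abs(uxx) + np.abs(uyy)  # simple curvature proxy
    # Aggregate over a win x win neighbourhood (box sum via padding).
    pad = win // 2
    p = np.pad(curv, pad, mode='edge')
    agg = np.zeros_like(curv)
    for dy in range(win):
        for dx in range(win):
            agg += p[dy:dy + curv.shape[0], dx:dx + curv.shape[1]]
    # 1 marks edge/non-uniform regions, 0 marks smooth regions.
    return (agg > agg.mean()).astype(np.uint8)
```

A flat image yields an all-zero mask, while a step edge produces a band of ones along the discontinuity.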
- Step 22 of FIG. 1 includes the calculation of a computation cost for a plurality of pixels within the image mask.
- The similarity of image locations is measured by defining a matching cost. Normally, a matching cost is calculated at each pixel for all disparities under certain considerations.
- A computational cost value for pixels is determined using a Census transform to map a local neighborhood surrounding each of the plurality of pixels to a bit stream.
- The Census transform is calculated for both left and right images.
- A simple Census window pattern is used for the smooth regions to reduce the computational complexity, and a more complex pattern is used for the non-uniform regions, which usually contain edges and object boundaries.
- Adaptive Census window patterns according to one embodiment of this invention are shown in FIG. 3 (8 pixels) and FIG. 4 (20 pixels), where the selected positions are denoted by black pixels.
- The Census transform is calculated as CT(p) = ⊗_{q∈N} ξ(I(p), I(q)), where N is the neighborhood of the current pixel within the Census window, ξ is the step function, and ⊗ denotes bit-wise concatenation.
- The step function ξ(a, b) returns 1 if b < a and 0 otherwise.
- The binary mask generated in the previous step is used to decide which pattern to use. If the number of mask pixels in the neighborhood of the reference pixel is less than a predefined threshold T2, the first pattern is used; otherwise, the second pattern is used.
- The cost function is calculated by finding the Hamming distance between the obtained bit streams of the left and right reference images: C(x, y, d) = d_H(BS_l(x, y), BS_r(x − d, y)), where BS is the calculated bit stream, d is the disparity, d_H is the Hamming distance function, and subscripts l and r refer to the left and right reference images, respectively.
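The Census/Hamming matching cost can be illustrated with a minimal sketch. The `offsets` list stands in for the sparse window patterns P1 and P2 of FIGS. 3-4, whose exact layouts are not reproduced here, and the bit convention (1 when the neighbour is darker than the centre) is one common choice.

```python
import numpy as np

def census_bitstring(img, x, y, offsets):
    """Census transform at (x, y): each listed neighbour is compared
    against the centre pixel (bit = 1 when the neighbour is darker)
    and the bits are concatenated into a stream."""
    c = img[y, x]
    return tuple(1 if img[y + dy, x + dx] < c else 0 for dy, dx in offsets)

def hamming_cost(bs_left, bs_right):
    """Matching cost: Hamming distance between two census bit streams."""
    return sum(a != b for a, b in zip(bs_left, bs_right))
```

In the full method, `hamming_cost` would be evaluated for every candidate disparity d, comparing the left bit stream at (x, y) against the right bit stream at (x − d, y).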
- Each pixel's cost is aggregated over a support region in step 24 of FIG. 1.
- The main goal of cost aggregation is to reduce the matching ambiguities and noise present in the initial cost.
- A modified cross-based cost aggregation is used, based upon the effective assumption that neighboring pixels with similar colors and spatial characteristics usually belong to the same object and should have similar depth values.
- The proposed cost aggregation method includes: creating a cross shape, and aggregating cost over the created cross.
- FIG. 5 illustrates this cost aggregation.
- An adaptive cross is constructed for each pixel.
- The arm length of the cross varies for different pixels based on certain criteria.
- The endpoint of the arm is set at p_l when one of the three following rules is no longer met: (1) the color difference between p and p_l should be less than a predefined threshold; (2) the spatial distance between p and p_l should be less than a preset maximum length; and (3) the curvature values of p and p_l in the curvature map should not exceed a threshold.
- In the abovementioned criteria, τ_1 and τ_2 are predefined thresholds.
- The thresholds have the main impact on the shape of the cross. Large thresholds are usually set for textureless regions in order to include adequate intensity variation.
- The next step is aggregating the cost values over the created cross.
- The intermediate cost is obtained by summing the cost values horizontally, and the final cost is calculated by adding all the intermediate data vertically.
- This process is illustrated in FIG. 5.
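The two-pass aggregation above can be sketched as follows. This is a simplified version: arms are limited here only by a colour threshold `tau` and a maximum length, whereas the patent additionally bounds arms by a curvature threshold; both parameter values are assumptions.

```python
import numpy as np

def aggregate_cross(cost, left_img, tau=10, max_arm=5):
    """Simplified cross-based aggregation for one disparity slice.

    For each pixel, arms extend while the colour difference to the
    anchor stays below tau and the arm is shorter than max_arm. Costs
    are summed horizontally over each pixel's arms, then the
    intermediate sums are added vertically.
    """
    h, w = cost.shape

    def arm(y, x, dy, dx):
        n = 0
        while (n < max_arm and 0 <= y + (n + 1) * dy < h
               and 0 <= x + (n + 1) * dx < w
               and abs(int(left_img[y + (n + 1) * dy, x + (n + 1) * dx])
                       - int(left_img[y, x])) < tau):
            n += 1
        return n

    # Step 1: horizontal sums over each pixel's left/right arms.
    horiz = np.zeros_like(cost, dtype=float)
    for y in range(h):
        for x in range(w):
            l, r = arm(y, x, 0, -1), arm(y, x, 0, 1)
            horiz[y, x] = cost[y, x - l:x + r + 1].sum()
    # Step 2: vertical sums of the intermediate values over up/down arms.
    out = np.zeros_like(horiz)
    for y in range(h):
        for x in range(w):
            u, d = arm(y, x, -1, 0), arm(y, x, 1, 0)
            out[y, x] = horiz[y - u:y + d + 1, x].sum()
    return out
```

The horizontal-then-vertical decomposition is what makes cross-based aggregation efficient: each pixel's support region is covered without enumerating it explicitly.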
- In step 26 of FIG. 1, the method includes matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map.
- The best match for each pixel is obtained and the disparity between the matches is calculated.
- The optimum disparity is calculated by minimizing an energy function.
- The energy function consists of three terms.
- The first term is the matching cost from the previous step, which is based on the Census transform.
- The other two terms are smoothness energy terms.
- Two penalty terms are added to the matching cost function to account for slight and abrupt changes in the disparity of neighboring pixels.
- An 8-direction optimization path is used to reach the optimum value.
- d_p and d_q are the depth values for pixels p and q.
- The problem can be described as finding the disparity which minimizes the energy function obtained in the previous step.
- The disparities can be converted to depth values once the disparity for all pixels is obtained.
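A minimal single-direction sketch of the semi-global recurrence follows. P1 and P2 are assumed penalty values (small penalty for a one-level disparity change, larger penalty for bigger jumps); the full method would run eight such directional passes, sum them, and take the per-pixel winner-take-all disparity.

```python
import numpy as np

def sgm_along_row(cost, P1=1.0, P2=4.0):
    """One of the eight SGM path directions (left-to-right along rows).

    cost: array of shape (H, W, D) of per-pixel, per-disparity
    matching costs. Returns the path-accumulated cost of the same
    shape, using the usual semi-global recurrence.
    """
    h, w, D = cost.shape
    L = cost.astype(float).copy()
    for x in range(1, w):
        prev = L[:, x - 1, :]                    # (H, D)
        mprev = prev.min(axis=1, keepdims=True)  # best cost at previous pixel
        # Candidates: same disparity, +/-1 with P1, any jump with P2.
        same = prev
        up = np.pad(prev[:, 1:], ((0, 0), (0, 1)),
                    constant_values=np.inf) + P1
        down = np.pad(prev[:, :-1], ((0, 0), (1, 0)),
                      constant_values=np.inf) + P1
        jump = mprev + P2
        L[:, x, :] = (cost[:, x, :]
                      + np.minimum(np.minimum(same, up),
                                   np.minimum(down, jump))
                      - mprev)  # subtract to keep values bounded
    return L

# After summing all eight directions: disparity = L_total.argmin(axis=2)
```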
- The focus of step 28 in FIG. 1 is the refinement of the estimated depth map, which is broken down into two major steps: (1) filling the holes in the estimated depth map, and (2) sharpening the edges and object boundaries.
- The estimated depth map from the previous steps has some black regions, due to occlusions and mismatches, which need to be filled. These hole regions usually belong to the background, which cannot be seen from the other reference view; hence the algorithm fails to estimate a depth value for those regions. By considering the boundary pixels in the vicinity of the hole region which have non-zero depth values, the ones that belong to the background can be chosen by comparing the depth values. The holes are then filled by a weighted interpolation on the selected correct pixels.
- d_bg is the background depth value and w is the weighting factor based on the distance from the background depth pixel to the current hole.
- The weights are calculated using a Gaussian distribution based on the distance to the current pixel, per Equation (11); therefore, the farther pixels have less impact on the calculated depth value.
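The Gaussian-weighted interpolation over the selected background pixels can be sketched as follows; `sigma` is an assumed parameter of the Gaussian of Equation (11).

```python
import numpy as np

def fill_hole(depths, dists, sigma=2.0):
    """Weighted interpolation over selected background boundary pixels.

    depths: non-zero depth values of the background pixels around a
    hole; dists: their distances to the current hole pixel. Weights
    follow a Gaussian in distance, so farther pixels contribute less.
    """
    d = np.asarray(depths, dtype=float)
    w = np.exp(-np.asarray(dists, dtype=float) ** 2 / (2 * sigma ** 2))
    return float(np.sum(w * d) / np.sum(w))
```

With equal distances this reduces to a plain average; a much closer background pixel dominates the result.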
- The up-sampling is performed by applying a trilateral filter, which makes the boundaries sharper and corrects the misaligned regions, using Equation (10).
- The designed filter consists of three terms: depth data, texture data, and curvature.
- The trilateral filter is given by d̂(p) = (1/w) Σ_q d(q) · f_dep(|d_p − d_q|) · f_tex(|C_p − C_q|) · f_curv(|k_p − k_q|), (10)
- where, in Equation (10), d is the disparity value, C and k are the color and curvature values, respectively, and f is the Gaussian distribution with standard deviation σ defined by Equation (11).
- The new depth values can be adjusted by mapping them to the nearest depth value that already exists in the depth map.
- Without optimization, if-else conditional statements would have to be used to decide the candidate depth value while filtering.
- Instead, a Look-Up Table (LUT), as shown in FIG. 6, is an optimization according to one embodiment of this invention that maps each value in the range 0-255 to a unique depth value.
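The LUT-based snapping can be sketched as follows; the exact table of FIG. 6 is not reproduced, so the table here is simply built from whatever depth levels already exist in the map.

```python
import numpy as np

def snap_to_existing(filtered, existing_values):
    """Map each filtered depth to the nearest value already in the map,
    so the trilateral filter introduces no new depth levels.

    A 256-entry look-up table replaces per-pixel if-else chains:
    each value 0-255 indexes directly to its nearest existing depth.
    """
    existing = np.unique(np.asarray(existing_values))
    # lut[v] = existing depth level closest to intensity v.
    lut = existing[np.abs(existing[None, :]
                          - np.arange(256)[:, None]).argmin(axis=1)]
    idx = np.clip(np.round(filtered), 0, 255).astype(int)
    return lut[idx]
```

Building the table once turns the per-pixel decision into a single array lookup, which matches the motivation given for the LUT.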
- The resulting enhanced depth map is sent, in step 30, for further processing in video-based pedestrian detection systems and/or other three-dimensional video applications.
- The depth map estimation method performs stereo matching without explicit image rectification.
- The fundamental matrix is estimated using Random Sample Consensus and an 8-point algorithm. Then, an epipolar line equation obtained by projective mapping is derived, and the search for point correspondences is performed along the epipolar line. Simulation results show that the method produces accurate depth maps for uncalibrated stereo images with reduced computational complexity.
- The method computes disparities in a pair of non-rectified images without explicit image rectification.
- A Random Sample Consensus (RANSAC) algorithm (M. A. Fischler et al., "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24(6), pp. 381-395, June 1981) is used to calculate a fundamental matrix, and the Census transform is used as the matching criterion to obtain disparities.
- The method includes three steps: estimating the fundamental matrix, updating epipolar search lines, and computing disparities.
- The fundamental matrix is the algebraic representation of epipolar geometry. Given a pair of stereo images, for each point in one image there exists a corresponding epipolar line in the other image. Any point in the second image matching the point in the first image must lie on the epipolar line, and a projective mapping from a point in one image to its corresponding epipolar line in the other image is represented by the fundamental matrix.
- RANSAC is used to remove the effect of outliers before applying an 8-point algorithm.
- Matching points that have larger Euclidean distances are chosen. This modification increases the probability that coplanar matching points lying on the same object are not chosen to estimate the fundamental matrix.
- A line equation is then computed.
- The epipolar line is used as the search line.
- The rectification step, which is time consuming and in many cases causes shearing and resampling effects, can thereby be bypassed.
- An implementation optimization is applied which reduces the computational complexity in an M×N image from 9MN multiplications plus 6MN additions to only 3(M+N) additions.
- The epipolar line equation l_r can be written in terms of its coefficients, where c_1, c_2 and c_3 are the line equation coefficients for the pixel (x, y).
- The right-image epipolar line coefficient vector can be initialized by the third column vector of the fundamental matrix.
- The second column vector of the fundamental matrix is added when switching between the corresponding epipolar lines of two consecutive points. Stepping from one row to the next, a single vector addition of the fundamental matrix's first column vector to the epipolar line coefficients is applied.
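The incremental update above can be sketched as follows, assuming the pixel is written p = (x, y, 1) with x as the row index, so that l = F·p and a unit step in y adds F's second column while a unit step in x adds its first column (matching the column order stated in the text).

```python
import numpy as np

def epipolar_lines(F, rows, cols):
    """Incrementally update right-image epipolar line coefficients.

    Instead of a full matrix-vector product per pixel (9 multiplications
    + 6 additions), the coefficients start at F's third column (pixel
    (0, 0)) and are updated by one vector addition per step, giving the
    3(M+N)-addition behaviour described in the text.
    """
    lines = np.empty((rows, cols, 3))
    row_start = F[:, 2].copy()            # line for pixel (0, 0)
    for x in range(rows):
        l = row_start.copy()
        for y in range(cols):
            lines[x, y] = l
            l = l + F[:, 1]               # next point within the row
        row_start = row_start + F[:, 0]   # step to the next row
    return lines
```

Each computed vector equals the direct product F @ (x, y, 1), so the incremental scheme is exact, not an approximation.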
- The start point of the search strategy is the projection of the pixel at the same coordinates in the right-side image onto the epipolar line. This point has the least distance to the reference pixel compared to the other points on the line.
- The maximum disparity range is defined by the user and varies based on the image resolution.
- The search direction is along the epipolar line.
- The matching metric used for cost calculation is the non-parametric Census transform, due to its robustness to illumination changes. The Census transform maps a block surrounding the reference pixel into a bit string.
- The cost function is calculated by finding the Hamming distance of the obtained bit streams.
- The final matching cost function is used for optimization.
- The optimum disparity is the value which minimizes the cost function.
- Table I indicates the error statistics of the percentage of bad pixels with respect to the provided ground truth depth map.
- The percentage-of-bad-pixels evaluation criterion is defined by B = (1/N) Σ (|d_G − d_GT| > δ), where d_G and d_GT are the generated and ground truth depth values, respectively, δ is the error tolerance, and N is the total number of pixels in the image.
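The bad-pixel criterion is straightforward to compute; the sketch below expresses it as a percentage, which is one common reporting convention.

```python
import numpy as np

def bad_pixel_percentage(d_gen, d_gt, delta=1.0):
    """Percentage of bad pixels: share of pixels whose generated depth
    differs from ground truth by more than the tolerance delta."""
    bad = np.abs(d_gen.astype(float) - d_gt.astype(float)) > delta
    return 100.0 * bad.mean()
```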
- Table II shows the processing time of the proposed algorithm by running in the Middlebury dataset using C programming on CPU.
- FIG. 8 shows the result of depth estimation and refinement for a sample left side image of KITTI dataset.
- FIG. 9 compares the results of hole filling, where in FIG. 9(b) only the background depth pixels are used.
- FIG. 9 only shows the hole filling result without applying edge sharpening.
- The final result of depth refinement is shown in FIG. 8(c).
- FIG. 10 shows the histograms of the depth map of the reference color image in FIG. 8(a) before and after refinement. The unique depth values have not changed after the refinement process.
- The invention provides a novel depth estimation algorithm.
- The proposed method is based on adaptive window patterns of the Census transform, which make it robust against illumination changes and suitable for applications like advanced driver assistance systems.
- A modified cross-based cost aggregation technique is proposed that generates cross-shaped support regions for each pixel individually.
- The proposed depth refinement technique aims at filling the holes and sharpening the object boundaries.
- The background depth pixels are used to fill the holes of the estimated depth map, and the proposed trilateral filter is used to enhance the quality of the depth map. Simulation results indicate that the proposed method fulfills these aims by improving the quality of the generated depth maps and reducing the computational complexity.
Description
- This application claims the benefit of U.S. Provisional Patent Application, Ser. No. 62/037,987, filed on 15 Aug. 2014. The co-pending Provisional Application is hereby incorporated by reference herein in its entirety and is made a part hereof, including but not limited to those portions which specifically appear hereinafter.
- Over the past several years, stereo-based methods which estimate the depth map algorithmically have attracted a lot of attention in the research community. Computation of the shift between the two reference images, also known as disparity, is the key to determining the depth values in stereo matching techniques.
- Stereo matching techniques can be classified into two groups, namely local and global techniques. Local methods generally consider a finite neighboring window to estimate the disparity, so the window size plays an important role in such methods. Local methods are fast and computationally simple, but they are highly error-prone and the estimated depth maps are usually inaccurate. In global techniques, on the other hand, an energy function is globally optimized to find the disparity. Global depth estimation techniques can generate high-quality depth maps; the most popular techniques in this category include belief propagation, graph cuts, and dynamic programming. However, due to the computational complexity of such algorithms, it is not feasible to exploit them in real-time applications. Combining the concepts of local and global stereo matching was first introduced as semi-global matching (SGM). SGM performs pixel-wise matching based on mutual information and an approximation of a global smoothness constraint, obtaining a good trade-off between accuracy and runtime. However, it achieves limited performance under illumination changes. Despite the advantages of different depth estimation techniques, several problems remain in the generated depth maps; the existence of holes and sensitivity to noise and illumination changes are the most significant.
- Thus there is a continuing need for improved depth estimation techniques.
- A general object of the invention is to provide an improved method for depth estimation. Some embodiments of the depth estimation method of this invention are based on the non-parametric Census transform and semi-global optimization. The method of this invention improves the quality of estimated depth maps to fulfill the demands of real-time applications such as 3D scene reconstruction and/or stereo-based pedestrian detection.
- The method according to some embodiments of this invention can be described as a multi-resolution census-based depth estimation scheme that performs novel depth refinement techniques to enhance the quality of the output. The method comprises three major steps: cost calculation, optimization, and refinement.
- The general object of the invention can be attained, at least in part, through a method, implemented by a computer system, of determining depth in images. The method includes: generating an image mask indicating smooth areas and object edges for each of a pair of stereo images; calculating a computation cost for a plurality of pixels within the image mask, such as by using Census transform to map a pixel block into a bit stream; and matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map. In embodiments of this invention, to generate the image mask, the image resolution is first reduced for each of the pair of stereo images.
- The invention further includes a method of determining depth in images including: reducing image resolution in each of a pair of stereo images; creating an image mask as a function of pixel curvatures for each of the pair of stereo images, such as by distinguishing between the smooth areas and the object edges as a function of pixel curvature; determining a computational cost value for a plurality of pixels using a Census transform to map a local neighborhood surrounding each of the plurality of pixels to a bit stream; aggregating cost values for each of the plurality of pixels; and matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map.
- The proposed depth estimation technique according to some embodiments of this invention exploits the non-parametric Census transform to calculate the cost function. The non-parametric Census transform maps the surrounding block of a pixel into a bit string. A simple Census window pattern is used for the smooth regions which leads to less computation, and a more complex pattern is applied to the non-uniform regions which usually occur along the edges and object boundaries.
- Compared to the existing hole filling techniques, the depth refinement algorithm according to some embodiments of this invention only considers the background pixels to fill the holes. The curvatures of pixels are considered and a trilateral filter enhances the quality of the estimated depth map. In order to avoid generating new depth values, the algorithm can map them to the nearest existing depth value.
- By considering the boundary pixels in the vicinity of a hole region which have non-zero depth values, the method chooses the ones that belong to the background by comparing the depth values. The holes are then filled by a weighted interpolation on the selected correct pixels. The weights are calculated using Gaussian distribution based on the distance to the current pixel. Therefore, the farther pixels would have less impact on the calculated depth value. The quality of the hole filled depth map is improved by applying the trilateral filter which makes the boundaries sharper and corrects the misaligned regions. The proposed filter comprises three terms: depth data, texture data, and the curvature. The method also desirably ensures that no new depth values are introduced during the up-sampling. Therefore, when the depth map is filtered using the trilateral filter, the new depth values are adjusted by mapping them to the nearest depth value which already exists in the depth map.
- The method and system of this invention are desirably automatically executed or implemented on and/or through a computing platform. Such computing platforms generally include one or more processors for executing the method steps stored as coded software instructions, at least one recordable medium for storing the software and/or video data received or produced by method, an input/output (I/O) device, and a network interface capable of connecting either directly or indirectly to a video camera and/or the Internet or other network.
- Other objects and advantages will be apparent to those skilled in the art from the following detailed description taken in conjunction with the appended claims and drawings.
-
FIG. 1 is a block diagram of a method according to one embodiment of this invention. -
FIG. 2 shows an image and a generated mask, according to one embodiment of this invention. -
FIG. 3 is a census window pattern P1 for uniform regions, according to one embodiment of this invention. -
FIG. 4 is a census window patterns P2 for non-uniform regions, according to one embodiment of this invention. -
FIG. 5 illustrates cost aggregation according to one embodiment of this invention. -
FIG. 6 is an algorithm according to one embodiment of this invention. -
FIG. 7 shows images and the corresponding depth maps generated by the invention. -
FIG. 8 shows a color image, the corresponding estimated depth map, and a refined depth map. -
FIG. 9 illustrates a hole filling comparison of filling all the surrounding correct pixels, and filling only considering background depth pixels. -
FIG. 10 is a histogram of an exemplary depth map before refinement and after refinement. - The present invention provides a method for generating a depth map from a stereo image pair. Generating an accurate depth map is a vital factor in many applications. In some embodiments of this invention, a novel depth estimation and refinement method is proposed. The depth estimation method is a multi-resolution census-based technique which benefits from a semi-global optimization technique. In contrast to the original semi-global matching algorithm, a Census transform is used that is robust against illumination changes. Therefore, the algorithm is applicable under different illumination situations. The proposed algorithm operates on a multi-resolution basis in order to achieve higher frame rates. The pixel gradients are used to distinguish between smooth and complex parts of the image, and the cost calculation is performed separately for the different regions.
-
FIG. 1 is a block diagram of a method according to embodiments of this invention. As shown in FIG. 1, a stereo image pair is used as the reference to generate a depth map. In embodiments of this invention, the method includes steps of: down-sampling and mask generation, cost calculation and aggregation, semi-global optimization, and depth refinement. -
Step 20 includes down-sampling and mask generation. In an ideal case, pixels belonging to the same object should have the same depth value. However, this is not always the case, due to erroneous mismatches, changes of illumination, etc. In embodiments of this invention, the curvatures of the pixels in the given color images are used to distinguish between the smooth areas and sharp edges of the objects in the scene. In order to reduce the computational complexity of the algorithm, a first step is to down-sample the stereo color images by a predetermined factor. Both left and right images of the stereo image pair are down-sampled. In one embodiment, the image resolution is reduced to ¼ of the original in each dimension: for every 4×4 window, only one pixel is used, which is desirably the average of all 16 pixels. - A mask is then obtained, such as shown in
FIG. 2, which indicates the smoothness of different regions in the reference image. In embodiments of this invention, the curvatures of the pixels in the reference image pair are used to create the mask. Curvature can be used to distinguish between the smooth area and sharp edges of the objects in the scene. - In embodiments of this invention, the curvature is calculated using the first and second order gradients of each pixel, given by:
curv(x, y)=(uxx·uy²−2·ux·uy·uxy+uyy·ux²)/(ux²+uy²)^(3/2), (1)
- where ux and uxx are the first and second order gradients, respectively. Subscripts indicate the direction of gradient. The Prewitt kernel is used to find the gradient. After computing the curvature, the values are aggregated over a 5×5 window and stored in a curvature map. A binary mask is generated by Equations (2) using the curvature map.
M(x, y)=1 if curvagg(x, y)≥T1, and M(x, y)=0 if curvagg(x, y)<T1, (2)
- In Equations (2), M(x, y) is the mask value at location (x, y). The summation of curvature values over a 5×5 window is referred to as curvagg. T1 is the mean value of the curvature map. When the aggregated curvature of a pixel is less than a threshold, a zero value is assigned to the mask. An example of mask generation from a color image is shown in
FIG. 2 . -
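The down-sampling and curvature-mask steps above can be sketched as follows (a minimal Python sketch: function names are illustrative, the gradient kernel is NumPy's default central difference rather than the Prewitt kernel named above, and the threshold T1 is taken as the curvature-map mean per Equation (2)):

```python
import numpy as np

def downsample_4x4(img):
    """Reduce resolution by replacing each 4x4 block with its average."""
    h, w = img.shape
    h4, w4 = h - h % 4, w - w % 4  # crop so dimensions divide evenly
    blocks = img[:h4, :w4].reshape(h4 // 4, 4, w4 // 4, 4)
    return blocks.mean(axis=(1, 3))

def curvature_mask(img, win=5):
    """Binary mask: 1 where aggregated curvature >= its mean (edges), else 0."""
    # first-order gradients; axis naming is immaterial because the curvature
    # expression of Equation (1) is symmetric under swapping x and y
    ux, uy = np.gradient(img.astype(float))
    uxx, uxy = np.gradient(ux)
    _, uyy = np.gradient(uy)
    denom = (ux**2 + uy**2) ** 1.5 + 1e-9        # avoid division by zero
    curv = np.abs(uxx * uy**2 - 2 * ux * uy * uxy + uyy * ux**2) / denom
    # aggregate curvature over a win x win window (box filter via shifted sums)
    pad = win // 2
    padded = np.pad(curv, pad, mode="edge")
    agg = sum(padded[i:i + curv.shape[0], j:j + curv.shape[1]]
              for i in range(win) for j in range(win))
    return (agg >= agg.mean()).astype(np.uint8)
```

Running `downsample_4x4` on an 8×8 image yields a 2×2 image whose entries are 4×4 block means, matching the one-pixel-per-window description above.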
Step 22 of FIG. 1 includes the calculation of a computation cost for a plurality of pixels within the image mask. The similarity of image locations is measured by defining a matching cost. Normally, a matching cost is calculated at each pixel for all disparities under certain considerations. - In embodiments of this invention, a computational cost value for pixels is determined using a Census transform to map a local neighborhood surrounding each of the plurality of pixels to a bit stream. The Census transform is calculated for both left and right images. In embodiments of this invention, a simple Census window pattern is used for the smooth regions to reduce the computational complexity and a more complex pattern is used for the non-uniform regions which usually contain edges and object boundaries. Adaptive Census window patterns according to one embodiment of this invention are shown in
FIG. 3 (8 pixels) and FIG. 4 (20 pixels), where the selected positions are denoted by black pixels. - For the pixel Ic(x, y), the Census transform is calculated using:
BS(x, y)=⊗(i, j)∈P ξ(Ic(x, y), Ic(x+i, y+j)), (3)
ξ(Ic(x, y), Ic(x+i, y+j))=1 if Ic(x+i, y+j)<Ic(x, y), and 0 otherwise, (4)
where ⊗ denotes bit-wise concatenation over the positions (i, j) of the selected window pattern P.
- The binary mask generated in the previous step is used to decide which pattern to use. If the number of mask pixels in the neighborhood of the reference pixel α is less than a predefined threshold T2, the first pattern is used. The decision criterion is made as follows.
pattern=P1 if Σ(i, j)∈N(α) M(i, j)<T2, and pattern=P2 otherwise, (5)
- The cost function is calculated by finding the Hamming distance between the obtained bit streams of the left and right reference images using:
-
C((x, y), d)=dH(BSl(xi, yi), BSr(xi−d, yi)), (6) - where BS is the calculated bit stream, d is the disparity, dH is the Hamming distance function, and the subscripts l and r refer to the left and right reference images, respectively.
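The census bit stream and Hamming-distance cost can be sketched as follows (a minimal sketch; the offsets in P1 below are a generic 8-neighbour placeholder, not the patented patterns of FIGS. 3-4):

```python
import numpy as np

# Illustrative 8-neighbour pattern; the patent's P1/P2 layouts are in FIGS. 3-4.
P1 = [(-1, -1), (-1, 0), (-1, 1), (0, -1), (0, 1), (1, -1), (1, 0), (1, 1)]

def census_bits(img, x, y, pattern):
    """Map the neighbourhood of pixel (y, x) to a bit string: bit=1 when the
    neighbour is darker than the centre pixel (Equations (3)-(4))."""
    c = img[y, x]
    return "".join("1" if img[y + dy, x + dx] < c else "0" for dy, dx in pattern)

def hamming(a, b):
    """Number of differing bits between two equal-length bit strings."""
    return sum(ch1 != ch2 for ch1, ch2 in zip(a, b))

def census_cost(left, right, x, y, d, pattern=P1):
    """Matching cost C((x, y), d): Hamming distance between the census bit
    streams of the left pixel and the right pixel shifted by disparity d."""
    return hamming(census_bits(left, x, y, pattern),
                   census_bits(right, x - d, y, pattern))
```

Because only orderings of intensities enter the bit string, a monotonic illumination change leaves the cost unchanged, which is the robustness property relied on above.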
- Since the cost is calculated for each pixel, each pixel's cost over a support region is aggregated in
step 24 of FIG. 1. The main goal of cost aggregation is to reduce the matching ambiguities and noise present in the initial cost. In embodiments of this invention, a modified cross-based cost aggregation is used, based upon the effective assumption that neighboring pixels with similar colors and spatial characteristics usually belong to the same object and should have similar depth values. - In embodiments of this invention, the proposed cost aggregation method includes: creating a cross shape, and aggregating cost over the created cross.
FIG. 5 illustrates this cost aggregation. For the first step, an adaptive cross is constructed for each pixel. The arm length of the cross varies for different pixels based on the following criteria. Given a pixel p, the endpoint of the arm is defined as pl, the pixel at which one of the three following rules first fails: (1) the color difference between p and pl should be less than a predefined threshold; (2) the spatial distance between p and pl should be less than a preset maximum length; and (3) the curvature values of p and pl in the curvature map should not differ by more than a threshold. The abovementioned criteria are defined by: -
|Ic(p)−Ic(pl)|≤τ1,
dE(p, pl)≤L,
|curv(p)−curv(pl)|<τ2, (7) - where dE is the Euclidean distance, L is the maximum length, and τ1 and τ2 are predefined thresholds. The thresholds have the main impact on the shape of the cross. Large thresholds are usually set for textureless regions in order to include adequate intensity variation.
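Growing one arm of the adaptive cross under the rules of Equation (7) can be sketched as follows (an illustrative sketch; parameter names follow Equation (7), and the same routine would be called four times per pixel, once per arm direction):

```python
def arm_length(intensity, curv, y, x, step, tau1, tau2, max_len):
    """Grow one arm of the adaptive cross from pixel (y, x) along `step`
    (e.g. (0, 1) for the right arm) until one of the three rules breaks:
    colour difference (tau1), maximum span (max_len), or curvature
    difference (tau2)."""
    h, w = intensity.shape
    length = 0
    for l in range(1, max_len + 1):          # rule 2: bounded span
        ny, nx = y + step[0] * l, x + step[1] * l
        if not (0 <= ny < h and 0 <= nx < w):
            break
        if abs(intensity[ny, nx] - intensity[y, x]) > tau1:   # rule 1
            break
        if abs(curv[ny, nx] - curv[y, x]) >= tau2:            # rule 3
            break
        length = l
    return length
```

Once the four arm lengths are known, the aggregated cost is obtained exactly as described next: horizontal sums along each row of the cross, then a vertical sum of those intermediate values.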
- The next step is aggregating the cost values over the created cross. The intermediate cost is obtained by summing the cost values horizontally and the final cost is calculated by adding all the intermediate data vertically. The process is illustrated by
FIG. 5 . - After finding the final cost, the method includes, in
step 26 of FIG. 1, matching pixels between the pair of stereo images and optimizing a value for the matching pixels to obtain an estimated depth map. The best match for each pixel is obtained and the disparity between the matches is calculated. The optimum disparity is calculated by minimizing the energy function. As shown in Equation (8), the energy function consists of three terms. The first term is the matching cost from the previous step which is based on the Census transform. The other two terms are smoothness energy terms. In one embodiment of this invention, two penalty terms are added to the matching cost function to take into account slight and abrupt changes in the disparity of neighboring pixels. An 8-direction optimization path is used to reach the optimum value. -
E(d)=Σp C(p, dp)+Σq P1·F[|dq−dp|=1]+Σq P2·F[|dq−dp|>1], (8) - where dp and dq are the depth values for pixels p and q, the sums over q run over the neighbors of p, and F[·] equals 1 when its argument holds and 0 otherwise. The problem can be illustrated as finding the disparity which minimizes the energy function obtained in the previous step.
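One direction of the 8-path semi-global optimization can be sketched as follows (the recurrence is the standard SGM one; the penalty values P1=10 and P2=120 are illustrative defaults, not values from this disclosure, and a full implementation would sum eight such path volumes before the winner-takes-all step):

```python
import numpy as np

def aggregate_path(cost, P1=10, P2=120):
    """Aggregate a cost volume of shape (H, W, D) along the left-to-right
    path using the SGM recurrence:
    Lr(p, d) = C(p, d) + min(Lr(p-1, d), Lr(p-1, d +/- 1) + P1,
                             min_k Lr(p-1, k) + P2) - min_k Lr(p-1, k)."""
    h, w, D = cost.shape
    L = np.empty(cost.shape, dtype=float)
    L[:, 0] = cost[:, 0]
    for x in range(1, w):
        prev = L[:, x - 1]                                  # (H, D)
        best_prev = prev.min(axis=1, keepdims=True)
        minus = np.pad(prev[:, :-1], ((0, 0), (1, 0)),
                       constant_values=np.inf) + P1         # d-1 neighbour
        plus = np.pad(prev[:, 1:], ((0, 0), (0, 1)),
                      constant_values=np.inf) + P1          # d+1 neighbour
        L[:, x] = cost[:, x] + np.minimum(
            np.minimum(prev, np.minimum(minus, plus)),
            best_prev + P2) - best_prev
    return L

def disparity_from_costs(paths):
    """Winner-takes-all over the sum of the per-path aggregated volumes."""
    return sum(paths).argmin(axis=2)
```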
- The pixels can be converted to depth values once the disparity for all the pixels is obtained. Step 28 in FIG. 1 is the refinement of the estimated depth map, and this is broken down into two major steps: (1) filling the holes in the estimated depth map, and (2) sharpening the edges and object boundaries.
-
d(x, y)=Σi wi·dbg(i)/Σi wi, (9)
- The weights are calculated using Gaussian distribution based on the distance to the current pixel using Equation (11). Therefore, the farther pixels would have less impact on the calculated depth value.
- At this stage of the algorithm, there is a low resolution dense disparity image which needs to be up-sampled to the original size. In embodiments of this invention, the up-sampling is performed by applying a trilateral filter which makes the boundaries sharper and corrects the misaligned regions using Equation (10). The designed filter consists of three terms: depth data, texture data, and the curvature.
-
d̂(p)=Σq∈N(p) f(‖p−q‖)·f(C(p)−C(q))·f(k(p)−k(q))·d(q)/Σq∈N(p) f(‖p−q‖)·f(C(p)−C(q))·f(k(p)−k(q)), (10)
-
f(x)=e^(−(‖x‖/σ)²). (11) - To ensure that no new depth values are introduced during the up-sampling, when the depth map is filtered using the trilateral filter, the new depth values can be adjusted by mapping them to the nearest depth value which already exists in the depth map. To accomplish this using a high-level programming language, if-else conditional statements have to be used to decide the candidate for depth value while filtering using:
d(p)=argmin di∈D |d̂(p)−di|, (12)
where D is the set of depth values already present in the depth map and d̂(p) is the value produced by the trilateral filter.
- In case of N distinct depth values, the N-1 conditions need to be checked, which will increase the run-time of the algorithm if this is performed for all the pixels. The Look-Up Table (LUT) as shown in
FIG. 6 is an optimization according to one embodiment of this invention that maps each value in the range of 0-255 to a unique depth value. As shown in FIG. 1, the resulting enhanced depth map is sent in step 30 for further processing in video-based pedestrian detection systems, and/or other three-dimensional video applications. - In some embodiments according to this invention, the depth map estimation method performs stereo matching without explicit image rectification. In one embodiment of the invention, the fundamental matrix is estimated by using Random Sample Consensus and an 8-point algorithm. Then, an epipolar line equation obtained by projective mapping is derived and the search for point correspondence is performed along the epipolar line. Simulation results show that the method produces accurate depth maps for uncalibrated stereo images with reduced computational complexity.
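The LUT optimization of FIG. 6, which replaces the N−1 per-pixel comparisons of Equation (12) with a single table lookup, can be sketched as follows (assuming 8-bit depth values, hence 256 entries):

```python
import numpy as np

def build_depth_lut(valid_depths):
    """256-entry table mapping every 8-bit value to the nearest depth value
    that already exists in the map, so filtering introduces no new levels."""
    valid = np.sort(np.asarray(valid_depths))
    lut = np.empty(256, dtype=np.uint8)
    for v in range(256):
        lut[v] = valid[np.abs(valid - v).argmin()]  # nearest existing level
    return lut

# Snapping an entire filtered 8-bit depth map is then one indexing operation:
#   snapped = lut[filtered_depth_map]
```

The table is built once per frame from the depth map's existing levels, so the per-pixel cost is constant regardless of how many distinct depth values N the map contains.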
- Most stereo matching algorithms make assumptions about camera calibration and epipolar geometry. In these approaches, given a pair of stereo images, image rectification is performed so that pairs of conjugate epipolar lines become collinear and parallel to the x-axis of image. One major advantage of rectification is that point correspondence becomes much simpler because the search is performed along the horizontal lines of the rectified images. However, image rectification is computationally expensive and sometimes causes undesirable distortions. In embodiments of this invention, a depth map estimation algorithm performs stereo correspondence without explicit image rectification.
- In embodiments of this invention, the method computes disparities in a pair of non-rectified images without explicit image rectification. A Random Sample Consensus (RANSAC) algorithm (M. A. Fischler et al., “Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography,” Communications of the ACM, vol. 24 (6), pp. 381-395, June 1981) is used to calculate a fundamental matrix and the Census transform is used as the matching criterion to obtain disparities. The method includes three steps: estimating the fundamental matrix, updating epipolar search lines, and computing disparities.
- The fundamental matrix is the algebraic representation of epipolar geometry. Given a pair of stereo images, for each point in one image there exists a corresponding epipolar line in the other image. Any point in the second image matching the point in the first image must lie on the epipolar line, and a projective mapping from a point in one image to its corresponding epipolar line in the other image is represented by the fundamental matrix. For robust estimation of the fundamental matrix, RANSAC is used to remove the effect of outliers before applying an 8-point algorithm. To improve the accuracy in estimating the fundamental matrix, matching points that have larger Euclidean distances are chosen. This modification increases the probability that coplanar matching points which lie on the same object are not chosen to estimate the fundamental matrix.
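The 8-point estimation step can be sketched as follows (the unnormalized variant, shown without the surrounding RANSAC loop that would repeatedly fit random subsets and keep the matrix with the most inliers):

```python
import numpy as np

def eight_point(pts_l, pts_r):
    """Estimate the fundamental matrix F from >= 8 correspondences so that
    [x2, y2, 1] @ F @ [x1, y1, 1]^T = 0 for each pair. Each correspondence
    contributes one row of the homogeneous linear system A f = 0."""
    x1, y1 = pts_l[:, 0], pts_l[:, 1]
    x2, y2 = pts_r[:, 0], pts_r[:, 1]
    A = np.column_stack([x2 * x1, x2 * y1, x2,
                         y2 * x1, y2 * y1, y2,
                         x1, y1, np.ones_like(x1)])
    _, _, Vt = np.linalg.svd(A)
    F = Vt[-1].reshape(3, 3)        # null vector of A, row-major
    U, S, Vt = np.linalg.svd(F)     # enforce the rank-2 constraint
    S[2] = 0
    return U @ np.diag(S) @ Vt
```

In practice the coordinates would be normalized (Hartley's preconditioning) before building A; that step is omitted here for brevity.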
- Based on the relation between the epipolar line and the fundamental matrix, a line equation is computed. In some embodiments of the method, the epipolar line is used as the search line. The rectification step can be bypassed, which is time consuming and in many cases causes shearing and resampling effect. While calculating the line equation for each pixel, an implementation optimization is applied which can reduce the computational complexity in an M×N image from 9MN multiplications plus 6MN additions to only 3(M+N) additions. The epipolar line equation lr can be written as:
-
lr: c1·x+c2·y+c3=0, (13) - where c1, c2 and c3 are the line equation coefficients for the pixel (x, y). The right side image epipolar line coefficients vector can be written as:
[c1, c2, c3]^T=F·[xl, yl, 1]^T, (14)
- where xl and yl refer to the column and row coordinate of the left side image point, respectively. Starting with the (0,0) coordinate in the left side image, the right image epipolar line coefficients vector can be initialized by the third column vector of the fundamental matrix. To reduce the computational complexity significantly, only an addition of the second column vector of the fundamental matrix can be considered while switching from the corresponding epipolar lines of two consecutive points. Stepping from one row to the next one, a single vector addition of the fundamental matrix first column vector and the epipolar line coefficients are applied.
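The incremental update of the line coefficients can be sketched as follows (a sketch with the left-image point ordered as (x, y, 1); under that ordering a step in x adds the first column of F and a step in y adds the second, the same bookkeeping as above up to the choice of coordinate order):

```python
import numpy as np

def epipolar_lines(F, width, height):
    """Yield the right-image epipolar line coefficients (c1, c2, c3) for every
    left-image pixel using only vector additions: start from F's third column
    at (0, 0), then add one column of F per step, instead of computing
    F @ [x, y, 1] with 9 multiplications and 6 additions per pixel."""
    row_start = F[:, 2].copy()        # line for (0, 0): F @ [0, 0, 1]^T
    for y in range(height):
        line = row_start.copy()
        for x in range(width):
            yield y, x, line.copy()
            line += F[:, 0]           # step x -> x+1: add the first column
        row_start += F[:, 1]          # step y -> y+1: add the second column
```

This is where the stated complexity reduction comes from: per image only 3(M+N)-style additions of 3-vectors remain, with no per-pixel multiplications.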
- For a pixel in the left side image, the start point of the searching strategy is the projection of the pixel in the same coordinate in the right side image onto the epipolar line. This point has the least distance to the reference pixel compared to the other points on the line. The maximum disparity range is defined by the user and varies based on the image resolution. The search direction is on the epipolar line. The matching metric used for cost calculation is the non-parametric Census transform due to its robustness to illumination changes. Census transform maps a block surrounding the reference pixel into a bit string. The cost function is calculated by finding the Hamming distance of the obtained bit streams. The final matching cost function is used for optimization. The optimum disparity is the value which minimizes the cost function.
- The present invention is described in further detail in connection with the following examples which illustrate or simulate various aspects involved in the practice of the invention. It is to be understood that all changes that come within the spirit of the invention are desired to be protected and thus the invention is not to be construed as limited by these examples.
- Simulations were performed to show the efficiency of the method compared with those of the state-of-the-art SGM-based stereo matching methods. The performance of the depth estimation and refinement algorithms of this invention was evaluated against the Middlebury data (D. Scharstein et al., “High-accuracy stereo depth maps using structured light,” Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), vol. 1, pp. 195-202, June 2003), and the KITTI stereovision benchmark suite (A. Geiger et al., “Are we ready for Autonomous Driving? The KITTI Vision Benchmark Suite,” Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR), pp. 3354-3361, June 2012). The workstation runs the Windows 7 operating system with an Intel Xeon Quad-Core processor and 8 GB RAM. Four different image pairs from the Middlebury dataset are chosen for the evaluation.
FIG. 7 shows results of depth estimation by the proposed technique. - Table I indicates the error statistic of percentage of bad pixels with respect to the provided ground truth depth map. The percentage of bad pixels evaluation criterion is defined by:
B=(1/N)·Σ(x, y) F[|dG(x, y)−dGT(x, y)|>δ],
- where dG and dGT are the generated and ground truth depth values, respectively, δ is the error tolerance, and N is the total number of pixels of the image.
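The error statistic can be sketched as:

```python
import numpy as np

def bad_pixel_percentage(d_gen, d_gt, delta=1.0):
    """Percentage of pixels whose generated depth differs from the ground
    truth by more than the tolerance delta (Middlebury-style statistic)."""
    return 100.0 * np.mean(np.abs(d_gen.astype(float) - d_gt) > delta)
```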
-
TABLE I. Percentage of bad pixels for the Middlebury Dataset

| Stereo image | Invention method | SGM |
| --- | --- | --- |
| Tsukuba | 2.40% | 3.96% |
| Cones | 7.73% | 9.75% |
| Teddy | 11.75% | 12.2% |
| Venus | 0.82% | 1.57% |

- Table II shows the processing time of the proposed algorithm running on the Middlebury dataset, implemented in C on a CPU.
-
TABLE II. Computational Time Complexity

| Stereo image | Size (pixels) | Time (ms) |
| --- | --- | --- |
| Tsukuba | 384 × 288 | 57 |
| Cones | 450 × 375 | 64 |
| Teddy | 450 × 375 | 68 |
| Venus | 434 × 383 | 59 |

- The proposed algorithm has been tested on the KITTI dataset, which consists of 194 training image pairs and 195 test image pairs. The images have 1224×370 pixels resolution.
FIG. 8 shows the result of depth estimation and refinement for a sample left side image of the KITTI dataset.
FIG. 9 compares the result of hole filling where inFIG. 9( b) only the background depth pixels are used.FIG. 9 only shows the hole filling result without applying edge sharpening. The final result of depth refinement is shown inFIG. 8( c). -
FIG. 10 shows the histograms of the depth map of the reference color image in FIG. 8(a) before and after refinement. The unique depth values have not changed after the refinement process. - Thus, the invention provides a novel depth estimation algorithm. The proposed method is based on adaptive window patterns of the Census transform, which make it robust against illumination changes and suitable for applications like advanced driver assistance systems. By down-sampling the reference images, the computational complexity of the whole algorithm is reduced. A modified cross-based cost aggregation technique is proposed that generates cross-shaped support regions for each pixel individually. The proposed depth refinement technique aims at filling the holes and sharpening the object boundaries. The background depth pixels are used to fill the holes of the estimated depth map, and the proposed trilateral filter is used to enhance the quality of the depth map. Simulation results indicate that the proposed method fulfills these aims by improving the quality of the generated depth maps and reducing the computational complexity.
- The invention illustratively disclosed herein suitably may be practiced in the absence of any element, part, step, component, or ingredient which is not specifically disclosed herein.
- While in the foregoing detailed description this invention has been described in relation to certain preferred embodiments thereof, and many details have been set forth for purposes of illustration, it will be apparent to those skilled in the art that the invention is susceptible to additional embodiments and that certain of the details described herein can be varied considerably without departing from the basic principles of the invention.
Claims (20)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US14/827,897 US9754377B2 (en) | 2014-08-15 | 2015-08-17 | Multi-resolution depth estimation using modified census transform for advanced driver assistance systems |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201462037987P | 2014-08-15 | 2014-08-15 | |
US14/827,897 US9754377B2 (en) | 2014-08-15 | 2015-08-17 | Multi-resolution depth estimation using modified census transform for advanced driver assistance systems |
Publications (2)
Publication Number | Publication Date |
---|---|
US20160048970A1 true US20160048970A1 (en) | 2016-02-18 |
US9754377B2 US9754377B2 (en) | 2017-09-05 |
Family
ID=55302542
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6215898B1 (en) * | 1997-04-15 | 2001-04-10 | Interval Research Corporation | Data processing system and method |
US20060067583A1 (en) * | 2004-09-24 | 2006-03-30 | Fuji Photo Film Co., Ltd. | Image compression apparatus, and image compression program storage medium |
US20120327189A1 (en) * | 2010-03-12 | 2012-12-27 | Hitachi Automotive Systems, Ltd. | Stereo Camera Apparatus |
US20130033582A1 (en) * | 2011-08-04 | 2013-02-07 | Aptina Imaging Corporation | Method of depth-based imaging using an automatic trilateral filter for 3d stereo imagers |
US20130077852A1 (en) * | 2011-09-27 | 2013-03-28 | Yu-Lin Chang | Method and apparatus for generating final depth information related map that is reconstructed from coarse depth information related map through guided interpolation |
US8467596B2 (en) * | 2011-08-30 | 2013-06-18 | Seiko Epson Corporation | Method and apparatus for object pose estimation |
2015
- 2015-08-17: US application US 14/827,897 filed; granted as US9754377B2; current status: not active (Expired - Fee Related)
Non-Patent Citations (2)
Title |
---|
Binary stereo matching, Zhang et al., ICPR 2012, pages 356-359 * |
SGM-based dense disparity estimation using adaptive census transform, Loghman et al., IEEE, 978-1-4799-2491-2, 2013, pages 592-597 * |
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20160239948A1 (en) * | 2015-02-12 | 2016-08-18 | Texas Instruments Incorporated | Method and circuitry for performing census transforms |
US9905020B2 (en) * | 2015-02-12 | 2018-02-27 | Texas Instruments Incorporated | Method and circuitry for performing census transforms |
US11328446B2 (en) * | 2015-04-15 | 2022-05-10 | Google Llc | Combining light-field data with active depth data for depth map generation |
US10482586B2 (en) * | 2015-05-18 | 2019-11-19 | Nokia Technologies Oy | Filtering depth map image |
US20180137610A1 (en) * | 2015-05-18 | 2018-05-17 | Nokia Technologies Oy | Filtering depth map image |
US20170372510A1 (en) * | 2016-06-27 | 2017-12-28 | Robert Bosch Gmbh | Systems and methods for dynamic occlusion handling |
US10706613B2 (en) * | 2016-06-27 | 2020-07-07 | Robert Bosch Gmbh | Systems and methods for dynamic occlusion handling |
US10552972B2 (en) * | 2016-10-19 | 2020-02-04 | Samsung Electronics Co., Ltd. | Apparatus and method with stereo image processing |
CN106846290A (en) * | 2017-01-19 | 2017-06-13 | 西安电子科技大学 | Stereo parallax optimization method based on anti-texture cross and weighted cross |
CN106737774A (en) * | 2017-02-23 | 2017-05-31 | 天津商业大学 | Calibration-free visual servoing control device for a robotic arm |
CN107229903A (en) * | 2017-04-17 | 2017-10-03 | 深圳奥比中光科技有限公司 | Method, device and storage device for robot obstacle avoidance |
CN106950597A (en) * | 2017-04-20 | 2017-07-14 | 吉林大学 | Mixed-source data separation method based on trilateral filtering |
CN106998460A (en) * | 2017-05-16 | 2017-08-01 | 合肥工业大学 | Hole-filling algorithm based on depth transition and depth total variation |
US11380005B2 (en) * | 2017-05-19 | 2022-07-05 | Movidius Limited | Methods, systems and apparatus to optimize pipeline execution |
WO2018211127A1 (en) * | 2017-05-19 | 2018-11-22 | Movidius Ltd. | Methods, systems and apparatus to optimize pipeline execution |
CN110998660A (en) * | 2017-05-19 | 2020-04-10 | 莫维迪乌斯有限公司 | Method, system and apparatus for optimizing pipeline execution |
KR20200020705A (en) * | 2017-05-19 | 2020-02-26 | 모비디어스 리미티드 | Methods, systems, and apparatus for optimizing pipeline execution |
US11954879B2 (en) * | 2017-05-19 | 2024-04-09 | Movidius Ltd. | Methods, systems and apparatus to optimize pipeline execution |
KR102655086B1 (en) | 2017-05-19 | 2024-04-08 | 모비디어스 리미티드 | Methods, systems and apparatus for optimizing pipeline execution |
US20230084866A1 (en) * | 2017-05-19 | 2023-03-16 | Movidius Limited | Methods, systems and apparatus to optimize pipeline execution |
CN107507233A (en) * | 2017-07-14 | 2017-12-22 | 天津大学 | Stereo matching algorithm based on global image symmetric correlation |
CN109801325A (en) * | 2019-01-11 | 2019-05-24 | 深兰人工智能芯片研究院(江苏)有限公司 | Method and device for obtaining a disparity map with a binocular stereo vision system |
CN111462195A (en) * | 2020-04-09 | 2020-07-28 | 武汉大学 | Irregular-angle-direction cost aggregation path determination method based on mainline constraint |
CN111784753A (en) * | 2020-07-03 | 2020-10-16 | 江苏科技大学 | Stereo matching method for three-dimensional reconstruction of the foreground field of view in autonomous underwater vehicle recovery docking |
DE102020212285A1 (en) | 2020-09-29 | 2022-03-31 | Myestro Interactive Gmbh | Method for spatial image acquisition using a stereo camera having two cameras and method for generating a redundant image of a measurement object and device for carrying out the method |
CN113240964A (en) * | 2021-05-13 | 2021-08-10 | 广西英腾教育科技股份有限公司 | Cardiopulmonary resuscitation teaching machine |
WO2023070421A1 (en) * | 2021-10-28 | 2023-05-04 | Intel Corporation | Methods and apparatus to perform mask-based depth enhancement for multi-view systems |
WO2024123343A1 (en) * | 2022-12-09 | 2024-06-13 | Innopeak Technology, Inc. | Stereo matching for depth estimation using image pairs with arbitrary relative pose configurations |
Also Published As
Publication number | Publication date |
---|---|
US9754377B2 (en) | 2017-09-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9754377B2 (en) | Multi-resolution depth estimation using modified census transform for advanced driver assistance systems | |
CN114782691B (en) | Robot target identification and motion detection method based on deep learning, storage medium and equipment | |
US10484663B2 (en) | Information processing apparatus and information processing method | |
US10553026B2 (en) | Dense visual SLAM with probabilistic surfel map | |
US10462445B2 (en) | Systems and methods for estimating and refining depth maps | |
US8755630B2 (en) | Object pose recognition apparatus and object pose recognition method using the same | |
JP7134012B2 (en) | Parallax estimation device and method | |
US8385630B2 (en) | System and method of processing stereo images | |
US8326025B2 (en) | Method for determining a depth map from images, device for determining a depth map | |
US7755619B2 (en) | Automatic 3D face-modeling from video | |
WO2018098891A1 (en) | Stereo matching method and system | |
US7526140B2 (en) | Model-based localization and measurement of miniature surface mount components | |
Ma et al. | A modified census transform based on the neighborhood information for stereo matching algorithm | |
EP3293700A1 (en) | 3d reconstruction for vehicle | |
Loghman et al. | SGM-based dense disparity estimation using adaptive census transform | |
CN112734837B (en) | Image matching method and device, electronic equipment and vehicle | |
CN113763269A (en) | Stereo matching method for binocular images | |
Zhu et al. | Stereo matching algorithm with guided filter and modified dynamic programming | |
CN114494589A (en) | Three-dimensional reconstruction method, three-dimensional reconstruction device, electronic equipment and computer-readable storage medium | |
Lee et al. | Temporally consistent road surface profile estimation using stereo vision | |
CN110443228B (en) | Pedestrian matching method and device, electronic equipment and storage medium | |
CN113344989B (en) | Binocular stereo matching method for aerial images based on NCC and Census minimum spanning tree | |
Kim et al. | Piecewise planar scene reconstruction and optimization for multi-view stereo | |
Ershov et al. | Stereovision algorithms applicability investigation for motion parallax of monocular camera case | |
Noh et al. | Highly Dense 3D Surface Generation Using Multi‐image Matching |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
AS | Assignment |
Owner name: ILLINOIS INSTITUTE OF TECHNOLOGY, ILLINOIS Free format text: ASSIGNMENT OF ASSIGNORS INTEREST;ASSIGNORS:LOGHMAN, MAZIAR;MESMAKHOSROSHAHI, MARAL;KIM, JOOHEE;SIGNING DATES FROM 20150814 TO 20150815;REEL/FRAME:036407/0169 |
|
STCF | Information on status: patent grant |
Free format text: PATENTED CASE |
|
FEPP | Fee payment procedure |
Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
|
LAPS | Lapse for failure to pay maintenance fees |
Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: MICROENTITY |
|
STCH | Information on status: patent discontinuation |
Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
|
FP | Lapsed due to failure to pay maintenance fee |
Effective date: 20210905 |